Asked by Isti
on 21 Apr 2012

The simple case is like this:

2 1 4 6 2

9 4 6 1 2

5 3 2 8 3

7 2 1 9 3

7 1 8 2 4

From the matrix above, I want to insert 3 NaNs at random positions. My code is like this:

Data = [2,1,4,6,2;9,4,6,1,2;5,3,2,8,3;7,2,1,9,3;7,1,8,2,4];

[rows,cols] = size(Data);

p = 3; % number of NaNs that will be inserted

r = randperm(25); % random permutation of the numbers 1-25

r = r(1:3); % take 3 random numbers from the range 1-25

i = 1;a = 1; b = 1;

while i <= 3 % turn each number in vector r into a position where NaN is placed

n = r(a,b);

b = b+1;

e = 1;

if n <= cols

Data(1,n) = NaN;

else

if n > cols

while n > cols

e = e+1;

k = n - cols;

n = k;

end

Data(e,n) = NaN;

end

end

i = i+1;

end

One possible output looks like this:

2 1 4 6 2

9 NaN NaN NaN 2

5 3 2 8 3

7 2 1 9 3

7 1 8 2 4

So, I want to add some constraints:

1. Every row can have at most 2 NaNs.

2. The number of NaNs in column 1 has to be less than in column 2, the number in column 2 has to be less than in column 3, and so on. E.g. the output matrix will be like this:

2 1 4 6 2

9 4 6 1 NaN

5 3 2 8 3

7 2 1 NaN 3

7 1 8 2 NaN

For the matrix above we can see that:

number of NaNs in column 1 = 0, column 2 = 0, column 3 = 0, column 4 = 1, column 5 = 2.

Can somebody help me add these constraints to my code above? Or maybe there is another solution.

Thanks in advance :')

Answer by per isakson
on 22 Apr 2012

This is an idea that I have not tested!

jj = 0;

for ii = r

[rr,cc] = ind2sub( size(Data), ii );

if sum(isnan(Data(rr,:)))>=2 || sum(isnan(Data(:,cc)))>=2

% do nothing

else

Data(rr,cc)=nan;

jj = jj + 1;

if jj == 3, break

end

end

end

--- EDIT ---

The function below will return a result. The constraint is "no more than two NaN in any column or row". However, that was not what you asked for.

function Data = cssm

Data = [2,1,4,6,2;9,4,6,1,2;5,3,2,8,3;7,2,1,9,3;7,1,8,2,4];

p = 3; %amount of NaN that will we inserted

row_vector = randperm(numel(Data));

jj = 0;

for ii = row_vector

[rr,cc] = ind2sub( size(Data), ii );

if sum(isnan(Data(rr,:)))>=2 || sum( isnan(Data(:,cc)))>=2

% do nothing

else

Data(rr,cc)=nan;

jj = jj + 1;

if jj == p, break

end

end

end

end

With the constraint "amount NaN in column 1 have to be less than column 2, and amount NaN in column 2 have to be less than column 3, and so on", there is no solution. Do you exclude columns with zero NaN from that constraint?

Thus (according to my reading), the last column can have two or three NaNs and the second-to-last column one or zero NaNs. NaN cannot appear in the other columns.

Isti
on 22 Apr 2012

Which constraint will be broken? I don't get it. I think something is wrong in my understanding right now.

Thanks for the help anyway :)

Isti
on 22 Apr 2012

Ooh, I think your suggested code doesn't fulfill my second constraint :(

Isti
on 24 Apr 2012

Of course not, the columns with zero NaNs are also included. So when a column has zero NaNs, it will be at the far left of the matrix.

Btw, what is ind2sub used for above? I don't get it yet.
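On the ind2sub question, a minimal sketch of what it does, assuming the same 5-by-5 Data as in the question. MATLAB numbers the elements of a matrix column by column, so a single linear index identifies one element, and ind2sub converts that index back into a row and a column subscript:

```matlab
Data = [2,1,4,6,2;9,4,6,1,2;5,3,2,8,3;7,2,1,9,3;7,1,8,2,4];

% Linear index 7 counts down the columns: 5 elements in column 1,
% then the 2nd element of column 2.
[rr, cc] = ind2sub(size(Data), 7);        % rr = 2, cc = 2

% Both forms address the same element.
sameElement = (Data(7) == Data(rr, cc));  % true

% sub2ind is the inverse mapping.
idx = sub2ind(size(Data), rr, cc);        % idx = 7
```

In the answer's loop this is what turns each random number from randperm into a (row, column) pair whose row and column can then be checked for existing NaNs.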

Answer by Richard Brown
on 22 Apr 2012

This is another one of these problems where the simplest way to solve it is to randomly generate candidates until you find one that fits:

A = reshape(randperm(25), 5, 5);

done = false;

while ~done

idx = randperm(25, 3);

[I, J] = ind2sub([5 5], idx);

m = hist(I, unique(I));

n = hist(J, unique(J));

done = all(m <= 2) && all(diff(n) >= 0);

end

A(idx) = NaN;

It's trivial (but a little messier) to make it more general, so I'll leave you to do that if you need to.

EDIT changed code to use randperm instead of randi - only one call to the random number generator is necessary
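A possible sketch of the more general version mentioned above, not the original author's code. The matrix size, nNans and maxPerRow are hypothetical values chosen for illustration. It also counts NaNs over all rows and columns with accumarray, so columns with zero NaNs take part in the ordering check, which follows the asker's later clarification and differs from the hist(J, unique(J)) call above. The column ordering is still non-strict (ties are allowed), as in the original condition:

```matlab
% Hypothetical parameters (not from the original post).
A = rand(8, 6);        % any matrix
nNans = 5;             % number of NaNs to insert
maxPerRow = 2;         % constraint 1: at most this many NaNs per row

[m, n] = size(A);
done = false;
while ~done
    idx = randperm(m*n, nNans);              % distinct linear indices
    [I, J] = ind2sub([m n], idx);
    rowCounts = accumarray(I(:), 1, [m 1]);  % NaNs per row, zeros included
    colCounts = accumarray(J(:), 1, [n 1]);  % NaNs per column, zeros included
    % Accept only if no row is overfull and the column counts
    % never decrease from left to right.
    done = all(rowCounts <= maxPerRow) && all(diff(colCounts) >= 0);
end
A(idx) = NaN;
```

As with the original, this is rejection sampling: the number of retries grows quickly as the matrix and nNans grow, so it is only practical for small cases.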

Isti
on 28 Apr 2012

Thanks for this answer. It works on my small dataset, but for my medium dataset (e.g. 1500 rows * 11 columns) with more NaNs to insert, it takes a very long time, and I eventually decided to cancel it :(

If I drop the 2nd constraint and only use the 1st one, is there any way to make it faster?

Thanks in advance.

Answer by Richard Brown
on 29 Apr 2012

Here's a much faster method that satisfies both of your constraints. It may be possible to vectorise the loop, but it is, in my opinion, not worth the effort.

First, generate the data

X = rand(1500, 11);

[m,n] = size(X);

nNans = 2000;

We figure out the row and column indices separately. Rows are easy: a single call to randperm does the trick

I = mod(randperm(2*m, nNans), m) + 1;
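To see why that line enforces the two-per-row limit, here is a small sketch with hypothetical sizes (m = 5, nNans = 7) rather than the 1500-row data. It assumes nNans is no larger than 2*m, otherwise randperm raises an error:

```matlab
m = 5;          % hypothetical small row count for illustration
nNans = 7;      % must not exceed 2*m

% randperm(2*m, nNans) returns nNans distinct integers from 1..2*m.
% Under mod(., m) + 1, exactly two of those integers map to each row,
% so no row index can appear more than twice.
I = mod(randperm(2*m, nNans), m) + 1;

rowCounts = accumarray(I(:), 1, [m 1]);   % how often each row was drawn
```

Every entry of rowCounts is 0, 1 or 2, whichever values randperm happens to return.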

Then figure out the column positions randomly, going row by row to avoid creating duplicate entries.

J = zeros(1, nNans);

for i = 1:m

idx = (I == i);

J(idx) = randperm(n, nnz(idx));

end

We now need to make sure the columns are ordered correctly. So we construct a logical matrix encoding the position of the NaN entries, and reorder the columns to satisfy your column constraint.

iNan = false(m, n);

iNan(sub2ind([m n], I, J)) = true;

[~, iSorted] = sort(hist(J, 1:n));

iNan = iNan(:, iSorted);

We now have a logical array with the right properties. Last step is to overwrite the entries of X

X(iNan) = nan;
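A quick way to check the result, sketched under the assumption that X contains no NaNs before the insertion (true for rand). The block repeats the steps above in compact form so that it runs on its own, then asserts both constraints:

```matlab
% Rebuild the mask with the steps above.
X = rand(1500, 11);
[m, n] = size(X);
nNans = 2000;

I = mod(randperm(2*m, nNans), m) + 1;
J = zeros(1, nNans);
for i = 1:m
    idx = (I == i);
    J(idx) = randperm(n, nnz(idx));
end
iNan = false(m, n);
iNan(sub2ind([m n], I, J)) = true;
[~, iSorted] = sort(hist(J, 1:n));
X(iNan(:, iSorted)) = nan;

% Check the result.
rowCounts = sum(isnan(X), 2);        % NaNs in each row
colCounts = sum(isnan(X), 1);        % NaNs in each column

assert(nnz(isnan(X)) == nNans);      % no duplicate positions
assert(all(rowCounts <= 2));         % constraint 1
assert(all(diff(colCounts) >= 0));   % constraint 2, non-strict
```

Note that the last check is non-strict: sorting the columns by count gives a non-decreasing order, so two neighbouring columns can end up with the same number of NaNs.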
