Extracting data from from all rows aside from those that contain NaN?

Question

Paul on 8 Oct 2013

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/89483-extracting-data-from-from-all-rows-aside-from-those-that-contain-nan

Commented: Cedric on 8 Oct 2013

How would I go about extracting data from one column based on the criteria of the other column?

For example, in column A, I have a few NaN values, after setting up my own criteria for this column. Column B has a its own values from randn values.

I now want to calculate the average of column B, but only rows that have a corresponding value in column A. Thus, if any value in column B corresponds to a NaN value in column A, I want to exclude that value from my average.

I'm tinkering around with the find function, but no luck yet.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Cedric on 8 Oct 2013

4
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/89483-extracting-data-from-from-all-rows-aside-from-those-that-contain-nan#answer_98933

Edited: Cedric on 8 Oct 2013

 select = ~isnan( A ) ;      % Vector of logicals, true at location corresponding
                             %   to non-NaN elements of A.
 values = B(select) ;        % Logical indexing of B => extract elements of B 
                             %   which correspond to non-NaN elements of A.

Then you apply whatever you want to values. Note that there are functions which can make this selection for you, e.g. NANMEAN from the Stat. toolbox.

If you need these values for a one shot operation (which means if you don't have to reuse them), you can skip intermediary variables/steps. For example, fr sum these elements with a one-liner, you can do

sum( B(~isnan(A)) )

6 Comments
Show 4 older commentsHide 4 older comments

Cedric on 8 Oct 2013

Edited: Cedric on 8 Oct 2013

Well, you are right that NANMEAN doesn't apply to your situation, my mistake. What you wrote is far from being useless; it's actually what we all do in the beginning and it's a good thing that you test these approaches as well. What you are missing is just a type of indexing called "logical indexing", which makes everything simpler and more efficient. Here is an example:

 >> A = [1, 2, NaN, 5, NaN, 4] ;
 >> isnan( A )
 ans =
     0     0     1     0     1     0
 >> class( ans )
 ans =
      logical

Here you can see that the output of ISNAN is a vector of logicals (class or type), whose elements true (1) or false (0) are the logical outcome of the test "is nan" applied to each element of A. The first thing that comes to people's mind when they get that is to find the position of NaNs in the vector, as follows:

 >> find( isnan(A) )
 ans =
     3     5

which indicates that elements 3 and 5 of A are NaNs. Now as we already evaluated isnan(A) (which was the array of logicals), you see that FIND is actually not "searching for NaN values", but simply returning a list of positions where its argument is not false/0.

Now you could use these positions for indexing, e.g.

 >> pos = find( isnan(A) )
 pos =
     3     5
 >> A(pos)                    % Just to be sure that we spot the NaNs.
 ans =
      NaN   NaN

Or use the same positions in another vector..

 >> B = [11, 12, 13, 14, 15, 16] ;
 >> B(pos)
 ans =
    13    15

But there is better/simpler to do than using positions. MATLAB can use directly a vector of logicals for indexing (and return elements where logical elements are true):

 >> lid = isnan( A )
 lid =
     0     0     1     0     1     0
 >> A(lid)
 ans =
   NaN   NaN
 >> B(lid)
 ans =
    13    15

which is the same as before, except that we spare the extra step of FIND-ing positions and to use them for indexing. Also, taking all other elements is simpler when you have a logical index than when you have positions, because you can take the negation of the vector of logicals and you don't have to compute all other positions.

 >> ~lid             % "not" applied to the logical elements of lid.
 ans =
     1     1     0     1     0     1
 >> B(~lid)
 ans =
    11    12    14    16

(actually, if you really wanted positions, you would use find(~isnan(A)), rather than trying to compute these positions)

So this is what I am doing in the second part of my answer, except that I don't create an intermediary variable lid but I use directly the output of ~isnan(A) to index B:

 values = B( ~isnan(A) ) ;
 theMean = mean( values ) ;

or, directly

theMean = mean( B(~isnan(A)) ) ;

Paul on 8 Oct 2013

Okay I'm just trying to figure out the final line provided, I just want to break it down piece by piece.

So isnan returns values in the matrices based in a true.false or 0.1 manner.

In the first line

lid = isnan( Z(:,1) ) ;

should it be ~isnan?

so in the first line, you're finding the variables in matrix Z, column 1, that are not equal to NaN(or the false value), and equaling this to lid

in the second line, values is your Z matrix in correspondence to the Lid row number, in the 2nd column

then since you have your appropriate rows in the 2nd column, you're just then finding the average

I think I got it, I'm just trying to figure out whats happening between the first and second line. I'm confused as to why after the first function, which recreates the array in true/false or 1/0s, how the second line then only finds the values of those that correspond to the true/1's rows of the column.

When you create variables with logical expressions (such as lid = isnan(Z(:,1)); and then when you nest that in another line of code (such as values = Z(lid,2)), that second line of code will only act on the desired logical expressions (in this case the 1's or where there are no NaN values)?

Thanks again for the response, this was tremendous help in learning this proces.

Cedric on 8 Oct 2013

Edited: Cedric on 8 Oct 2013

Most of what you say is correct, and you spotted a typo! :-) You are right, I should have written either

 lid     = isnan( Z(:,1) ) ;
 values  = Z(~lid,2) ;
 theMean = mean( values ) ;

or

 lid     = ~isnan( Z(:,1) ) ;
 values  = Z(lid,2) ;
 theMean = mean( values ) ;

As you explain, both are somehow the lengthy way to perform the same thing as the more compact

theMean = mean( Z(~isnan(Z(:,1)), 2) ) ;

MATLAB evaluates the most internal expression, and pipes the output in its container (don't know if this "pipe" analogy will help), and so on. When the large expression above is evaluated, MATLAB does the following roughly..

 Evaluate:         Z(:,1)                                     (innermost)
                      |
                      |  put the result in (or pipe)
                      v
 Evaluate      isnan(   )
                      |
                      |  put the result in (or pipe)
                      v
 Evaluate         ~(     )              
                      |
                      |  put the result in (or pipe)
                      v
 Evalaute        Z(      , 2)
                      |
                      |  put the result in (or pipe)
                      v
 Evaluate     mean(       )                                   (outermost)

Internally, MATLAB computes inner expression and creates intermediary arrays with results, that it passes to outer expressions. Now if you think that it makes the code clearer to make these intermediary steps by yourself and build your own intermediary arrays, you are free to do so. This is what I did when I built these examples with intermediary steps. The second reason for creating intermediary variables by yourself is when some intermediary variable could be reused elsewhere. Say, for example, that you have A with NaNs, and you want to address B, C, D, .., Z at locations where A is not NaN. You can do something like

 meanB = mean( B(~isnan(A)) ) ;
 meanC = mean( C(~isnan(A)) ) ;
 meanD = mean( D(~isnan(A)) ) ;
 ..

but it is inefficent because ~isnan(A) is recomputed each time. In such case, it is more efficient to compute it once only and reuse it:

 lid = ~isnan( A ) ;
 meanB = mean( B(lid) ) ;
 meanC = mean( C(lid) ) ;
 meanD = mean( D(lid) ) ;
 ..

Sign in to comment.

Extracting data from from all rows aside from those that contain NaN?

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments
Show 4 older commentsHide 4 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

Extracting data from from all rows aside from those that contain NaN?

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments Show 4 older commentsHide 4 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

6 Comments
Show 4 older commentsHide 4 older comments