Extracting data from all rows aside from those that contain NaN?

How would I go about extracting data from one column based on the criteria of the other column?
For example, in column A, I have a few NaN values after applying my own criteria to this column. Column B has its own values, generated with randn.
I now want to calculate the average of column B, but using only the rows that have a corresponding value in column A. Thus, if any value in column B corresponds to a NaN value in column A, I want to exclude that value from my average.
I'm tinkering around with the find function, but no luck yet.

Accepted Answer

Cedric on 8 Oct 2013
Edited: Cedric on 8 Oct 2013
select = ~isnan( A ) ; % Vector of logicals, true at location corresponding
% to non-NaN elements of A.
values = B(select) ; % Logical indexing of B => extract elements of B
% which correspond to non-NaN elements of A.
Then you apply whatever you want to values. Note that there are functions which can handle NaNs for you, e.g. NANMEAN from the Statistics Toolbox, which ignores the NaN entries of the array it is given.
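A minimal sketch of the NaN-ignoring route, using made-up values for A and B. It assumes a MATLAB release where mean accepts the 'omitnan' flag (used here instead of NANMEAN so that no toolbox is needed). Because such functions only skip NaNs in their own input, the NaN positions of A are first copied into a copy of B:

```matlab
% Hypothetical example data (not the asker's): A has a NaN in row 2.
A = [1; NaN; 3] ;
B = [10; 20; 30] ;

Bmasked = B ;                       % work on a copy, leave B untouched
Bmasked(isnan(A)) = NaN ;           % mark the rows to exclude
avgB = mean( Bmasked, 'omitnan' ) ; % averages 10 and 30 only
```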
If you need these values for a one-shot operation (that is, if you don't have to reuse them), you can skip the intermediary variables/steps. For example, to sum these elements with a one-liner, you can do
sum( B(~isnan(A)) )
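To make the selection concrete, here is a small sketch with made-up values for A and B (the numbers are assumptions for illustration only):

```matlab
% Hypothetical example data: A has NaN in rows 2 and 4.
A = [1; NaN; 3; NaN; 5] ;
B = [10; 20; 30; 40; 50] ;

select = ~isnan( A ) ;     % logical column: true in rows 1, 3 and 5
values = B(select) ;       % keeps rows 1, 3 and 5 of B: [10; 30; 50]
avgB   = mean( values ) ;  % average of the kept elements
```

Rows 2 and 4 of B never reach mean, because the logical index drops them before the function is called.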
  6 Comments
Paul on 8 Oct 2013
Okay I'm just trying to figure out the final line provided, I just want to break it down piece by piece.
So isnan returns values in a matrix in a true/false or 0/1 manner.
In the first line
lid = isnan( Z(:,1) ) ;
should it be ~isnan?
So in the first line, you're finding the elements in matrix Z, column 1, that are not equal to NaN (or the false value), and assigning this to lid.
In the second line, values is your Z matrix at the rows given by lid, in the 2nd column.
Then, since you have the appropriate rows of the 2nd column, you're just finding the average.
I think I got it, I'm just trying to figure out what's happening between the first and second line. I'm confused as to why, after the first function recreates the array as true/false or 1/0s, the second line then only finds the values that correspond to the true/1 rows of the column.
When you create a variable with a logical expression (such as lid = isnan(Z(:,1));) and then nest it in another line of code (such as values = Z(lid,2)), will that second line only act on the rows where the logical is true (in this case the 1's, or where there are no NaN values)?
Thanks again for the response, this was a tremendous help in learning this process.
Cedric on 8 Oct 2013
Edited: Cedric on 8 Oct 2013
Most of what you say is correct, and you spotted a typo! :-) You are right, I should have written either
lid = isnan( Z(:,1) ) ;
values = Z(~lid,2) ;
theMean = mean( values ) ;
or
lid = ~isnan( Z(:,1) ) ;
values = Z(lid,2) ;
theMean = mean( values ) ;
As you explain, both are longer ways of doing the same thing as the more compact
theMean = mean( Z(~isnan(Z(:,1)), 2) ) ;
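As a quick check that the step-by-step form and the one-liner agree, here is a sketch on a small made-up matrix Z (the values and the names m1 and m2 are assumptions for illustration):

```matlab
% Hypothetical 4x2 matrix: column 1 is the criterion, column 2 the data.
Z = [ 1   2 ;
      NaN 4 ;
      3   6 ;
      NaN 8 ] ;

lid = isnan( Z(:,1) ) ;               % true where column 1 is NaN
m1  = mean( Z(~lid,2) ) ;             % negate the mask when indexing
m2  = mean( Z(~isnan(Z(:,1)), 2) ) ;  % compact one-liner

assert( m1 == m2 ) ;                  % both average rows 1 and 3 of column 2
```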
MATLAB evaluates the innermost expression and passes ("pipes") its output to the enclosing expression, and so on outward (don't know if this "pipe" analogy will help). When the large expression above is evaluated, MATLAB roughly does the following:
Evaluate: Z(:,1) (innermost)
|
| put the result in (or pipe)
v
Evaluate isnan( )
|
| put the result in (or pipe)
v
Evaluate ~( )
|
| put the result in (or pipe)
v
Evaluate Z( , 2)
|
| put the result in (or pipe)
v
Evaluate mean( ) (outermost)
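The same chain can be written out with one named variable per step. This is only a sketch on made-up data; the names col1, nanMask and keep are hypothetical:

```matlab
Z = [ 1 2 ; NaN 4 ; 3 6 ] ;   % hypothetical data

col1    = Z(:,1) ;            % innermost: [1; NaN; 3]
nanMask = isnan( col1 ) ;     % logical [0; 1; 0]
keep    = ~nanMask ;          % logical [1; 0; 1]
values  = Z(keep,2) ;         % rows 1 and 3 of column 2: [2; 6]
theMean = mean( values ) ;    % outermost
```

The indexing step works because keep is of class logical: each true entry keeps the matching row and each false entry drops it. A numeric vector of 0s and 1s would not behave the same way, since 0 is not a valid subscript.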
Internally, MATLAB computes the inner expressions and creates intermediary arrays with the results, which it passes to the outer expressions. If you think the code is clearer when you perform these intermediary steps yourself and build your own intermediary arrays, you are free to do so. This is what I did when I built the examples with intermediary steps. The second reason for creating intermediary variables yourself is that some of them can be reused elsewhere. Say, for example, that you have A with NaNs, and you want to address B, C, D, .., Z at locations where A is not NaN. You can do something like
meanB = mean( B(~isnan(A)) ) ;
meanC = mean( C(~isnan(A)) ) ;
meanD = mean( D(~isnan(A)) ) ;
..
but it is inefficient because ~isnan(A) is recomputed each time. In such a case, it is more efficient to compute it only once and reuse it:
lid = ~isnan( A ) ;
meanB = mean( B(lid) ) ;
meanC = mean( C(lid) ) ;
meanD = mean( D(lid) ) ;
..
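If the vectors all have the same length, one possible variation is to put them side by side in a matrix and apply the mask to its rows once. This is a sketch with made-up data; M and colMeans are hypothetical names:

```matlab
% Hypothetical example data: A has a NaN in row 2.
A = [1; NaN; 3; 4] ;
B = [10; 20; 30; 50] ;
C = [ 1;  2;  3;  5] ;
D = [ 0;  9;  0;  3] ;

lid      = ~isnan( A ) ;            % computed once
M        = [B C D] ;                % one column per variable
colMeans = mean( M(lid,:), 1 ) ;    % [meanB meanC meanD], row 2 excluded
```

The explicit dimension argument 1 keeps the result a per-column mean even when only one row survives the mask.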
