Relatively complex matrix indexing question

3 views (last 30 days)
Hi,
I have a reasonably complex matrix indexing question that I'm struggling to solve. I have an nxn matrix of float values. The matrix is fairly large (n~30000). The matrix is upper (or lower) triangular with the other values being NaN. Each of the rows represented has a vector associated with it, so the matrix rows has a secondary matrix associated with it that is nxk, where each row has k characteristics. Row 1 has the same characteristics as column 1, row 2 has the same characteristics as column2, etc.
I want to plot a histogram of all of the elements in the nxn matrix whose rows/columns match certain characteristics.
To illustrate, let's say I had the following 4x4 value matrix
Values = [...
NaN 4.0 3.1 2.3
NaN NaN 1.2 5.2
NaN NaN NaN 7.1
NaN NaN NaN NaN]
Each row/column has three characteristics and this characteristics matrix has the values
Charac = [...
4 2 3
1 4 5
-3 5 1
10 6 2]
I only want to get histogram elements for rows/columns that have the second characteristic greater than 1 and the third characteristic less than 4. In this case only the first, third and fourth rows would meet the criterion and the values that I would take from the matrix Values are 3.1, 2.3 and 7.1.
Is there a compact and computationally efficient way to logically index this?
Thanks in advance!
  8 Comments
Sam
Sam on 10 Jun 2014
So the columns and rows are selected by the same criteria, so if the valid indices for the rows are 1,3 and 4, the only valid (row,column) combinations are (1,3), (1,4) and (3,4). In other words, column 2 and row 2 get selected out. I will look into using sparse matrices. My understanding was that since the memory had to store the value and the indices of each element that you only save memory if the matrix has much more than 50% zeros. I would have to have a uint16 values to store the indices of the matrix (so row and column would mean an extra 4 bytes), whereas each element is only 1 byte. So I think the fraction of zeros would have to be something greater than 80% to save any memory.

Sign in to comment.

Accepted Answer

Cedric
Cedric on 10 Jun 2014
Edited: Cedric on 10 Jun 2014
You are correct about sparse matrices, and I had completely misunderstood your setup! So the lower triangular part is NaN, but the upper part has a density of 1(?) If so, there is no point in using sparse matrices. Reading your last explanation, I think that what you are looking for is along the line of what follows.
EDIT - I provide a second approach (B), which could work if you had only non-NaN elements in a vector. It is just a quick test though and it would have to be tested/profiled.
values = [NaN 4.0 3.1 2.3; ...
NaN NaN 1.2 5.2; ...
NaN NaN NaN 7.1; ...
NaN NaN NaN NaN] ;
charac = [4 2 3; ...
1 4 5; ...
-3 5 1; ...
10 6 2] ;
%%%A. Values is square num. array with NaNs.
% - Find valid row IDs.
isValid = charac(:,2) > 1 & charac(:,3) < 4 ;
validId = find( isValid ) ;
% - Build all possible row/col pairs.
validRC = nchoosek( validId, 2 ) ;
isValid = validRC(:,2) > validRC(:,1) ;
validRC = validRC(isValid,:) ;
% - Extract relevant values.
validId = sub2ind( size(values), validRC(:,1), validRC(:,2)) ;
extract = values(validId) ;
%%%B. Values is lin num array with non-NaNs only.
% - Build what could be values if you were not storing NaNs. Makes little
% sense to build it this way (by creating the large, square array and
% reducing it), but maybe you can build this vector directly instead
% of the array.
valuesLin = values(~isnan(values)) ;
% - Build sub2linInd function.
cSum = [0, cumsum( 1 : (size(values,2)-1) )] ;
sub2linInd = @(row, col) row + cSum(col-1).' ;
% - Extract relevant values.
validId = sub2linInd( validRC(:,1), validRC(:,2) ) ;
extractLin = valuesLin(validId) ;
  7 Comments
Cedric
Cedric on 11 Jun 2014
Edited: Cedric on 11 Jun 2014
Yes, you are absolutely right, I thought that it was "with replacement" and it is not (sorry, I should have checked!).
How are you building the first large matrix (?), would it be difficult to build directly a vector of non-NaN values ( double or even uintx) and avoid building the large array?
If you must build an array, could you use single elements instead of double? The single type/class supports NaN.
If you need e.g. uint8 because you are short on RAM, do you really need the 256 possible values; couldn't you, as mentioned above, use e.g. 255 to code for NaN?
All solutions have advantages and bottlenecks: memory for the double/NaN approach, practicality and speed for the nchoosek/non-NaN approach (nchoosek won't work if you have too many rows matching conditions, even with a k as small as 2), etc. I guess that at this stage, we could proceed by elimination: how much RAM do you have available, and how many rows matching conditions could you have?
I'd say that your short term best bet (reasonable memory usage, precision, computation time, and development time) is likely to be Image Analyst's approach, with your modification, applied to a class single Values array.
Sam
Sam on 11 Jun 2014
I think I'll use the uint8 because of RAM (I might need to go to 30000x30000). I have 16 GB of RAM. With the check of making sure the columns are greater than the rows, I think I don't need the NaN on the lower triangular part (since I will ignore any values there anyway). That way I can also use all 0 to 255 values.
Thanks. I think you and Image Analyst have solved my question!

Sign in to comment.

More Answers (1)

Image Analyst
Image Analyst on 9 Jun 2014
Try this and see if it's what you want:
Values = [...
NaN 4.0 3.1 2.3
NaN NaN 1.2 5.2
NaN NaN NaN 7.1
NaN NaN NaN NaN]
Charac = [...
4 2 3
1 4 5
-3 5 1
10 6 2]
% Find rows where column 2 > 1 and column 3 < 4
rowsToExtract = Charac(:, 2) > 1 & Charac(:, 3) < 4
% Extract only those rows and no others.
extractedRows = Values(rowsToExtract,:)
% Get the values from that which are not NaN.
extractedValues = extractedRows(~isnan(extractedRows(:)))
  3 Comments
Image Analyst
Image Analyst on 10 Jun 2014
I don't see why my way would not work. Your suggestion will end up eliminating some columns also, and you didn't say anything about eliminating columns.
Sam
Sam on 10 Jun 2014
So the rows and columns are "equivalent" with the same characteristics. That is why in the example given, only 3.1, 2.3 and 7.1 would survive because the columns are subject to the same constraints as the rows.
So I think your solution works but I had just not explained the problem clearly enough. I will test tomorrow to make sure everything works.
Thanks!

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!