Matlab comparison of two large matricies
3 views (last 30 days)
Show older comments
I am trying to retrieve the index of exact matches (row-specific) between two large matricies. I have a n x 61 matrix A containing values from 0 to 9 and another n x 61 matrix B , whereas each row here contains values from 0 to 9 but mostly NaN (only 2 to 8 columns in each row of matrix B contain actual numbers). Matrix A can be expected to have between 1.5 million and 3 million rows, whereas matrix B has around 0.2 to 0.5 million rows. Here is an example of the setup:
% create matrix a with random data
dataSample = [0 9];
numRows = 1000000;
numCols = 61;
A = randi(dataSample,numRows,numCols);
% create matrix B with random data
numRows = 100000;
numCols = 61;
numColsUse = 2:8;
dataRange = 0:9;
B = NaN(numRows,numCols);
for i = 1:size(B,1)
% randomly selet number of columns to fill
numColsFill = datasample(numColsUse,1);
% randomly select column index from available columns
colIdx = datasample([1:numCols],numColsFill);
% randomly select values from 0 to 9
numFill = datasample([0:9],numColsFill);
% insert numbers at respective column in matrix B
B(i,colIdx) = numFill;
end
I want to compare every single row of matrix A with the entire matrix B and find exact matches, where the numbers of matrix B match the numbers of matrix A at their respective positions (columns) - hence the NaN in matrix B are to be ignored.
I can achieve the desired result using cellfun, where I slice matrix A in several subsets and then use a custom function to compare the rows of the subset with each row in matrix B, like so:
% put all rows of matrix B in single cell
cellB = {B};
% take subset of matrix A and convert to cell array
subA = A(1000:5000,:);
subA = num2cell(subA,2);
% prepare cellB to meet cellfun conditions
cellB = repmat(cellB, [size(subA,1) 1]);
% apply cellfun to retrieve index of each exact match
idxContainer = cellfun(@findMatch, cellB, subA, 'UniformOutput', false);
Function findMatch looks as follows:
function [ idx ] = LTableEval( cellB, subA )
idxCheckLT = lt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheckGT = gt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheck = idxCheckLT + idxCheckGT;
idxSum = sum(idxCheck,2);
idx = find(idxSum == 0);
end
This approach works, but it appears to be very inefficient, especially RAM-wise, as the cellfun requires all inputs to have the same size and hence a multiplication of the same data sets. Any ideas on how to tackle this problem in a more efficient way? Many thanks!
0 Comments
Accepted Answer
Guillaume
on 28 May 2017
This is how I'd do it:
matches = cell(size(B, 1), 1);
for Brow = 1:size(B, 1)
Bcols = find(~isnan(B(Brow, :)));
matches{Brow} = find(all(A(:, Bcols) == B(Brow, Bcols), 2)); %requires R2016b or later
end
It's certainly a lot more efficient than any of the solutions you already have.
Note: in R2015b or earlier replace the relevant line by:
matches{Brow} = find(all(bsxfun(@eq, A(:, Bcols), B(Brow, Bcols)), 2));
4 Comments
Guillaume
on 29 May 2017
Try this:
matches = cell(size(B, 1), 1);
for Brow = 1:size(B, 1)
Bcols = find(~isnan(B(Brow, :)));
matchedrows = find(all(A(:, Bcols) == B(Brow, Bcols), 2));
matches{Brow} = [matchedrows, repmat(Brow, size(matchedrows))];
end
matches = cell2mat(matches);
finalScores = accumarray(matches(:, 1), matches(:, 2), [size(A, 1), 1], @(Brows) mean(Bscores(Brows)), nan);
However, if most rows of A all match at least some rows of B it may be more efficient to have just one loop over the rows of A. (I regard accumarray as another loop)
More Answers (1)
Matthew Eicholtz
on 26 May 2017
A couple comments:
1. Did you mean to convert to the cell array in this manner?
subA = num2cell(subA);
If you want to look at each row as its own cell, I think you need:
subA = num2cell(subA,2);
2. I am not sure how much more efficient this solution will be, but you can replace
idxContainer = cellfun(@findMatch, cellB, subA, 'UniformOutput', false);
with
idxContainer = cellfun(@(x,y) find(all(isnan(x)|x==y,2)), cellB, subA, 'UniformOutput', false);
Let me know if this helps at all.
See Also
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!