Applying large logical masks to large arrays quickly, row wise
Show older comments
I am filtering many large arrays NxM (n>500000, m>10) by applying a NxM mask to generate a many coloumn vectors. It is only possible for the mask to be true in a single column per row so the resultant column vectors will be nx1.
Directly applying the mask to the array is not an option as MATLAB applies the mask in a columnwise fashion. This is undesireable as the rows in the data represent a time stamp.
Is there a quicker way to apply a mask in a row wise fasion or convert a mask to linear indicies.
Example:
%% SETUP
% Data to be filtered
data = ...
[ 17 24 1 8 15; ...
23 5 7 14 16; ...
4 6 13 20 22; ...
10 12 19 21 3; ...
11 18 25 2 9 ];
% Mask to be applied
mask = ...
[ 0 0 1 0 0 ; ...
0 0 1 0 0 ; ...
0 1 0 0 0 ; ...
0 1 0 0 0 ; ...
1 0 0 0 0 ...
];
mask = logical(mask);
%% DESIRED DATA
wanted = [ 1;7;6;12;11]
%% DIRECT APPLICATION OF MASK
tic
direct = data(mask); % [11;6;12;1;7]
toc
disp(direct)
%% CURRENT SLOW SOLOUTION
tic
[row,col] = find(mask); % This line is slow when called many times with large arrays
[row,sortIdx] = sort(row);
col = col(sortIdx);
idx = sub2ind(size(mask), row, col);
current = data(idx);
toc
disp(current)
4 Comments
Dyuman Joshi
on 15 Dec 2022
Edited: Dyuman Joshi
on 15 Dec 2022
You can flip the data and mask to get the values in the order you want -
% Data to be filtered
data = ...
[ 17 24 1 8 15; ...
23 5 7 14 16; ...
4 6 13 20 22; ...
10 12 19 21 3; ...
11 18 25 2 9 ];
% Mask to be applied
mask = ...
[ 0 0 1 0 0 ; ...
0 0 1 0 0 ; ...
0 1 0 0 0 ; ...
0 1 0 0 0 ; ...
1 0 0 0 0 ...
];
mask = logical(mask);
%% DESIRED DATA
%% DESIRED DATA
wanted = [ 1;7;6;12;11];
disp('current solution')
timeit(@()currentSol(data,mask))
isequal(wanted,currentSol(data,mask))
disp('using flip')
timeit(@()flipSol(data,mask))
isequal(wanted,flipSol(data,mask))
function current = currentSol(data,mask)
[row,col] = find(mask); % This line is slow when called many times with large arrays
[row,sortIdx] = sort(row);
col = col(sortIdx);
idx = sub2ind(size(mask), row, col);
current = data(idx);
end
function solf = flipSol(data,mask)
D=flip(data);
solf=flip(D(flip(mask)));
end
Flipping the data and mask horizontally does not solve the problem in general.
data = [ 17 24 1 8 15; ...
23 5 7 14 16; ...
4 6 13 20 22; ...
10 12 19 21 3; ...
11 18 25 2 9 ];
mask = ...
[ 0 0 0 0 1 ; ...
0 0 0 1 0 ; ...
1 0 0 0 0 ; ...
0 0 1 0 0 ; ...
0 1 0 0 0] == 1;
y = flipSol(data, mask).'
wanted = [15, 14, 4, 19, 18];
function solf = flipSol(data,mask)
D=flip(data);
solf=flip(D(flip(mask)));
end
Dyuman Joshi
on 16 Dec 2022
I see, it's top to down rather than right to left.
I have a query regarding your comment below, if you don't mind answering @Jan
"The creation of large temporary arrays is expensive."
How to know if something is memory or time (I don't if they are different or not) expensive?
Jan
on 17 Dec 2022
You can simply count, how much memory a code costs. Split the executing into atomic parts, e.g.:
Z = exp(sin(X + Y).^2); % where X and Y are matrices
% In steps:
A = X + Y;
B = sin(X);
C = B.^2;
Z = exp(C);
Three temporary arrays are required. How expensive this is, depends on the size of the arrays and the CPU. If the array matchs into the CPU caches, there is no latency for waiting for the slow RAM.
How expensive the calculations are, depends in the CPU also and even on the Matlab version. SIN, EXP and the elementwise power can be multi-threaded. Then Matlab distributes the processing to several cores. It depends on the implementation how many cores are used for a specific data size.
There is no general way to predict, how much time is needed for the processing and the memory access. You have to try it. Look at my answers for your question. I've tried 10 different algorithms, but posted the most promissing versions only.
When I write a function, I implement different methods and store them in a unit-test function. On one hand I use them to test the output. On the other hand I can check for new Matlab versions, if the formerly fastest implementation is still the best one.
Don't forget: The time for solving a problem is:
T_all = T_design + T_program + T_debug + T_documentation + T_run
Avoid a pre-mature optimization of the run time. Write stable code at first and then focus on on the bottleneck determined by the profiler.
Accepted Answer
More Answers (1)
A C-mex version:
// File: maskTCopy.c
// Compile: mex -O -R2018a maskTCopy.c
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *data, *out;
mxLogical *mask;
mwSize m, n, im, in;
data = mxGetDoubles(prhs[0]);
m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
mask = mxGetLogicals(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(m, 1, mxREAL);
out = mxGetDoubles(plhs[0]);
for (in = 0; in < n; in++) {
for (im = 0; im < m; im++) {
out[im] += *mask++ * *data++;
}
}
}
Under my Matlab R2018b this needs 0.39 sec instead of 1.83 sec of matchSol() from my other answer and 2.65 sec of the original version. [EDITED, new version of C code, 045 -> 0.39 sec]
Categories
Find more on Startup and Shutdown in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!