Asked by Gonzalo
on 31 May 2011

I need to find all elements that fall between 2 values (L,U) in a matrix (A) with ** 2.8 million** rows. I'm currently doing this:

[ind,~] = find(A(:,1) >= L & A(:,1) < U);

Variables U and L are updated 41 times inside a loop. This single line of code is slowing down the program by 20 minutes! Any ideas how can I speed this up?

Answer by Matt Fig
on 31 May 2011

Accepted answer

Do you need the actual indices or the values? If you only need the values, then it would probably be faster to do:

A(A(:,1) >= L & A(:,1) < U)

Matt Fig
on 31 May 2011

Then you may be stuck unless there are other efficiencies you can make. One alternative that I can think of to get the indices would be to use a dummy variable. I am not sure if this would be faster or not. Make IDX before hand, if you are looping....

IDX = uint32(1:size(A,1));

ind = IDX(A(:,1) >= L & A(:,1) < U);

Answer by James Tursa
on 1 Jun 2011

You can try a mex approach. The following file does the exact calculation shown above. If you need to modify it for different columns etc let me know. To mex it, simply put the file someplace on your path, make that directory your current directory, then type

mex findrange.c

The mex program does the calculations fast at the expense of memory. It always allocates enough to hold the indexes of the entire column and then just sets the return size to the amount that it found without reallocating & copying. But from the looks of things this may be a relatively minor temporary memory waste.

/* File: findrange.c */ /* IND = findrange(A,L,U) returns the same result as the following: */ /* IND = find(A(:,1)>=L & A(:,1)<U) */ /* Programmer: James Tursa */ #include "mex.h" void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { mwSize i, k, m; double L, U; double *pr, *ind;

if( nlhs > 1 ) { mexErrMsgTxt("Too many outputs."); } if( nrhs != 3 ) { mexErrMsgTxt("Need exactly 3 inputs."); } if( !mxIsDouble(prhs[0]) || mxIsSparse(prhs[0]) || mxGetNumberOfDimensions(prhs[0]) > 2 ) { mexErrMsgTxt("First argument must be full double 2D matrix."); } if( !mxIsNumeric(prhs[1]) || mxGetNumberOfElements(prhs[1]) != 1 ) { mexErrMsgTxt("2nd argument must be a scalar."); } if( !mxIsNumeric(prhs[2]) || mxGetNumberOfElements(prhs[2]) != 1 ) { mexErrMsgTxt("3rd argument must be a scalar."); } L = mxGetScalar(prhs[1]); U = mxGetScalar(prhs[2]); pr = mxGetPr(prhs[0]); m = mxGetM(prhs[0]); plhs[0] = mxCreateDoubleMatrix(m,1,mxREAL); ind = mxGetPr(plhs[0]); for( i=0; i<m; i++ ) { if( *pr >= L && *pr < U ) { *ind++ = i+1; } pr++; } mxSetM(plhs[0],ind-mxGetPr(plhs[0])); }

Jan Simon
on 1 Jun 2011

Instead of the DOUBLE output, an UINT32 vector would use the half memory:

uint32_T *ind, i, m;

plhs[0] = mxCreateNumericMatrix(m,1,mxUINT32_CLASS, mxREAL);

ind = (uint32_T *) mxGetData(plhs[0]);

With "for (i = 1; i <= m; i++), ... *ind++=i;" you can save some milliseconds.

James Tursa
on 2 Jun 2011

DOUBLE vs UINT32 memory savings:

True. Whether this is an actual savings overall depends on what is done with the result downstream (which OP doesn't show). If operations are done with it that turn it into a double then the memory savings will turn into a memory waste instead.

Changing the loop index to start from 1 instead of 0:

Yeah, I saw that right after I posted it (I habitually start my loops from 0 in C), but then I thought ... "Why don't I just let Jan find it ..."

Answer by Walter Roberson
on 31 May 2011

How "dense" are the values? It might be faster to use

[ind1, ~] = find(A(:,1)>=L); ind = ind1(A(ind1,:)<U);

Or reversing the order of the test, putting the test *less* likely to succeed first.

This could be cost-effective if relatively few matches are found, reducing the number of ind1 subscripts that need to be looked up in the second step.

Answer by Angus
on 22 Jun 2011

The fastest way I can think of is the following:

data = randn(3000000,3); inds = not(abs(sign(sign(L - data) + sign(U - data))));

this will give you a matrix of 1s and 0s indicating the indices where the values are between your two bounds (L, U); note not [L, U].

This should be lightening fast, for 3 columns and 3 million rows it computes in a fraction of a second.

Angus
on 22 Jun 2011

Opportunities for recent engineering grads.

## 2 Comments

## Angus (view profile)

Direct link to this comment:http://www.mathworks.com/matlabcentral/answers/8556#comment_21606

The fastest way I can think of is the following:

data = randn(3000000,3);

inds = not(abs(sign(sign(L - data) + sign(U - data))));

this will give you a matrix of 1s and 0s indicating the indices where the values are between your two bounds (L, U); note not [L, U].

This should be lightening fast, for 3 columns and 3 million rows it computes in a fraction of a second.

## chirag (view profile)

Direct link to this comment:http://www.mathworks.com/matlabcentral/answers/8556#comment_149592

just type this