Asked by yosey
on 25 May 2013

I have a matrix (S) and one row out of that matrix (s). I want to get the rownumber of (s) in (S).

(S) has unique rows. The row (s) is allways member of (S)-rows.

So far I calculate the row number with:

I = find(all(bsxfun(@eq,S,s),2));

I am looking for a faster method. I call this line alot, it costs me hours of calculation time.

Here's a test example:

% create example, N and n could change n = 4; % n < 10 N = 25; % range(N) ~ [20 100] S = nchoosek(1:N,n);

for i = 1:10 s = S(randi(size(S,1)),:);

tic I = find(all(bsxfun(@eq,S,s),2)); toc end

**[Edit2]**, poorly expressed "(S) has unique rows". It should express that ** I** is allways scalar.

**[Edit]**, dimensional details.

My example shows the general direction of the sizes.

[d1 d2] = size(S); % its [12650x4] in my example

First dimension will be much larger then the second one. (d1 >> d2)

Dimension two (d2=n) will be smaller then 10. Numel(S) will be limited by Memory.

*No products are associated with this question.*

Answer by Jan Simon
on 25 May 2013

Edited by Jan Simon
on 25 May 2013

Accepted answer

If the first dimension of S is large, the BSXFUN approach checks of a lot of numbers although they are rejected in the former columns already. A small Matlab function can avoid testing already rejected rows:

function Index = FindRow(S, s) nCol = size(S, 2); match = S(:, 1) == s(1); for col = 2:nCol match(match) = S(match, col) == s(col); end Index = find(match);

For Image Analyst's example under Matlab 2009a/64/Win7:

S = randi(3, 200, 3); s = [2 3 1];

tic, for k=1:1000; m = find(all(bsxfun(@eq,S,s),2)); end; toc tic; for k=1:1000; m = find(ismember(S, s, 'rows')); end; toc tic; for k=1:1000; m = FindRow(S, s); end; toc

% 0.032688 sec % 0.433527 sec % 0.021643 sec

But for `S = randi(30, 2000, 20), s=S(1000, :)`:

% 0.124566 sec % 3.175940 sec % 0.194101 sec

**[EDITED]** It is easier in C to stop the search after the first matching row is found. This saves 50% processing time in the average. In addition no temporary arrays are required:

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { double *S, *s; mwSize n1, n2, i1, i2, k;

S = mxGetPr(prhs[0]); n1 = mxGetM(prhs[0]); n2 = mxGetN(prhs[0]); s = mxGetPr(prhs[1]);

for (i1 = 0; i1 < n1; i1++) { k = i1; for (i2 = 0; i2 < n2; i2++) { if (S[k] != s[i2]) { break; } k += n1; }

if (i2 == n2) { // Matching row found: plhs[0] = mxCreateDoubleScalar((double) (i1 + 1)); return; } }

// No success: plhs[0] = mxCreateDoubleScalar(mxGetNaN()); }

For `S = randi(30, 2000, 20)` I get (compiled by MSVC2008):

BSXFUN: 0.123305 sec FindRow.m: 0.194914 sec ISMEMBER: 3.172425 sec FindRow.mex: 0.011098 sec (s is last row) 0.007880 sec (s is 1000th row) 0.004753 sec (s is first row)

The MEX has an average speedup of factor 15. For a [20000 x 200] matrix we get a factor of 260 already.

Jan Simon
on 27 Dec 2013

Dear Jacob L,

The request you've sent me by email would have been a perfect question for this forum. Therefore I answer here, how to modify this function to work for UINT8 arrays:

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) { uint8_T *M, *v; mwSize m, n, i1, i2, k;

if (nrhs != 2 || !mxIsUint8(prhs[0]) || !mxIsUint8(prhs[1])) { mexErrMsgIdAndTxt("MEX:FindRow:BadInput", "2 UINT8 arrays required as input."); }

M = (uint8_T *) mxGetData(prhs[0]); m = mxGetM(prhs[0]); n = mxGetN(prhs[0]); v = (uint8_T *) mxGetData(prhs[1]);

if (mxGetNumberOfElements(prhs[1]) != n) { mexErrMsgIdAndTxt("MEX:FindRow:BadSize", "Sizes of inputs do not match."); }

for (i1 = 0; i1 < m; i1++) { k = i1; for (i2 = 0; i2 < n; i2++) { if (M[k] != v[i2]) { break; } k += m; }

if (i2 == n) { // Matching row found: plhs[0] = mxCreateDoubleScalar((double) (i1 + 1)); return; } }

// No success: plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL); }

Mark
on 13 Mar 2015

Hi,

I used your your original mex file in matlab and tried to change it. I'm also searching for fast ways to find a row matrix s in a large matrix S. However the row matrix s appears a number of times in large matrix S.

if true #include "mex.h" void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[], mwSize, lastfound) { double *S, *s *lastfound; mwSize n1, n2, i1, i2, k; S = mxGetPr(prhs[0]); n1 = mxGetM(prhs[0]); // n2 = mxGetN(prhs[0]); n2 = 3; s = mxGetPr(prhs[1]); for (i1 = lastfound; i1 < n1; i1++) { k = i1; for (i2 = 0; i2 < n2; i2++) { if (S[k] != s[i2]) { break; } k += n1; } if (i2 == n2) { // Matching row found: plhs[0] = mxCreateDoubleScalar((double) (i1 + 1)); return; } } // No success: plhs[0] = mxCreateDoubleScalar(mxGetNaN()); } end

So i tried to use another input 'lastfound' which would enable the code to start at lastfound + 1 (the +1 is added when the function is called), so that the previous row s is found, and the code start searching the second row s. I do this until the mex file return NaN. It however doesn't work, the code doesn't start the at the lastfound + 1 step. I am a C noob, so i think it is a very small error, but i would immensely appreciate it if somebody could take a look at it.

Thanks in advance :)

Answer by Image Analyst
on 25 May 2013

Try this:

% Generate some sample data: m = randi(3, 200, 3) % Specify what we're looking for: lookingFor = [2 3 1]; % Let's say we're looking for this % Find out what rows that pattern occurs in: rowsItOccursIn = find(ismember(m, lookingFor, 'rows'))

Show 2 older comments

Image Analyst
on 25 May 2013

**actual** sizes of n, N, and S? For 4 and 25, it DEFINITELY should not take "hours of calculation time" like you said.

Answer by Teja Muppirala
on 26 May 2013

Not sure if this is faster, but might be worth a try.

d1 = 12650; d2 = 4; S = randn(d1,d2); s = S(randi(d1),:); c = 1; r = 1;

while c <= d2 if S(r,c) == s(c) c = c+1; else r = r+1; c = 1; end end isequal(S(r,:),s)

If the big S is the same every time, you could make a better algorithm by sorting it in advance.

yosey
on 26 May 2013

The while loop is pretty fast, but cant beat the mex file. Here are the times for 1000 loops.

Elapsed time is 0.143702 seconds. % find(all(bsxfun(@eq,S,s),2)) Elapsed time is 0.059386 seconds. % findrow_mex Elapsed time is 0.118347 seconds. % your while loop

Since I've got some trouble reading the C-code, isn't the mex-code the same as your while loop?

Opportunities for recent engineering grads.

## 1 Comment

## Jan Simon (view profile)

Direct link to this comment:http://www.mathworks.com/matlabcentral/answers/76972#comment_150979

What is the usual dimension of S? It matters if you are talking about 10x4, 10'000x20 or 20x10'000 matrices.