Search nx3 array row-wise for 2 equal elements

1 view (last 30 days)
I have a nx3 sized array. I need to find all rows that have 2 common elements. I can do this using logical operators and comparing each element in turn, but this seems very inefficient.
Is there a better solution?
For example
In the following array, I need to identify that rows 1 and 3 contain 2 matching elements (as do rows 2 and 4).
[5 6 2,
3 2 7,
4 6 5,
9 7 2,
5 3 8]
  1 Comment
Andreas Goser
Andreas Goser on 23 Feb 2011
Is have no full answer for that, but I wonder if UNIQUE could be used for that?

Sign in to comment.

Accepted Answer

David Young
David Young on 23 Feb 2011
My initial idea is below, but someone may be able to improve on it in efficiency or elegance.
There is an awkwardness in this solution: if row A has two elements in common with both rows B and C, then the results may contain both [A B C] and [B C]. There are various things you could do about this - but a more precise specification of the problem is needed in order to choose how to handle such cases. If the data only ever has pairs of matching rows, there's no problem.
% data
a = [5 6 2;
3 2 7;
4 6 5;
9 7 2;
5 3 8]
% code
ap = a.';
groups = cell(1, size(ap,2)-1); % pre-allocate cell array for results
ngroups = 0;
for k = 1:size(ap, 2)-1 % loop over rows of a (cols of ap)
same = bsxfun(@ismember, ap(:,k), ap(:,k+1:end)); % find common elements
matches = find(sum(same) >= 2); % find cols of ap with at least 2 such
if ~isempty(matches) % at least one col has 2 matches
ngroups = ngroups+1; % so add it to the result
groups{ngroups} = [k k+matches];
end
end
groups = groups(1:ngroups); % trim results to the part that was used

More Answers (4)

Jos (10584)
Jos (10584) on 23 Feb 2011
Here is another idea. First create an upper triangular 3D logical matrix (OLP) in which the elements OLP(k,j,i) (with j>k) represents the presence (true) or absence of the i-th element of the k-th row of the input matrix X in the j-th row of X. This matrix is upper triangular, since we are not (yet?) interested in the main diagonal (all true) or its inherent symmetry. Based on this logical matrix return the pairs for which the sum along the planes equals 2.
clear X Res
X = [5 6 2,
3 2 7,
4 6 5,
9 7 2,
5 3 8]
Ntoshare = 2 ;
nrows = size(X,1) ;
OLP = repmat(false(nrows), [1 1 size(X,2)]) ;
for k=1:nrows-1,
for j=k+2:nrows,
OLP(k,j,:) = ismember( X(k,:), X(j,:)) ; % ismembc?
% OLP(j,k,:) = OLP(k,j,:) ; % symmetry ?!
end
end
[Res(:,1) Res(:,2)] = find(sum(OLP,3)==Ntoshare) ;
disp(Res)
"Res" holds the pairs of rows that share, in this case, 2 elements

Matt Fig
Matt Fig on 23 Feb 2011
Another approach which may get you some speed if your real X array is larger than the one you show.
Xs = sort(X,2);
S = size(X,1);
L = S*(S+1)/2;
F = sparse(L,L);
cnt = 0;
Ntoshare = 2;
for ii = 1:S
for jj = ii+1:S
cnt = cnt + 1;
if sum(ismembc(Xs(ii,:),Xs(jj,:)))==Ntoshare
F(jj,ii) = 1;
end
end
end
[I,J] = find(F);
% Now look:
[J,I]
  1 Comment
Leonard
Leonard on 9 Apr 2015
I am using this idea here to find rows which have two common elements. I have to do this for 350 files each of which are 7000x3. On my mac air thats like 30 hrs at least. Apparently, i should code this in C or run on a cloud server like aws. Ec2 doesnt support 2012a student though. Any suggestions on how to complete this process?

Sign in to comment.


Bruno Luong
Bruno Luong on 23 Feb 2011
I could not remove the two for-loops
% Data
A = ceil(50*rand(1000,3));
% Engine
tic
[m n] = size(A);
groups = cell(1,m);
ng = 0;
for k=1:m-1
u = unique(A(k,:)); % representation of kth row
[in J] = ismember(A(k:end,:),u);
l = m-k+1;
r = repmat((1:l).', n, 1);
c = accumarray([r(in) J(in)],1,[l n]); % count
c = bsxfun(@min,c,c(1,:)); % clip
rows = sum(c,2)==2; % check if 2 elements are common
if any(rows)
ng = ng+1;
groups{ng} = (k-1) + [1; find(rows)];
end
end
% Remove the tail
groups(ng+1:end) = [];
toc
% Display
for k=1:length(groups)
fprintf('\n');
gk = groups{k};
for r = gk'
fprintf('row #%d %s\n', r, mat2str(A(r,:)));
end
end

Bruno Luong
Bruno Luong on 24 Feb 2011
Here is a version that select groups of rows when at least two elements are common.
% Data
A = ceil(100*rand(1000,3));
% Engine
[m n] = size(A);
groups = cell(1,m);
ng = 0;
for k=1:m-1
u = unique(A(k,:)); % representation of kth row
[in J] = ismember(A(k:end,:),u);
l = m-k+1;
r = repmat((1:l).', n, 1);
c = accumarray([r(in) J(in)],1,[l n]); % count
c = bsxfun(@min,c,c(1,:)); % clip
rows = sum(c,2)>=2; % check if at least 2 elements are common
rows(1) = false;
if any(rows)
ng = ng+1;
rows(1) = true;
groups{ng} = (k-1) + find(rows);
end
end
% Remove the tail
groups(ng+1:end) = [];
% Display
for k=1:length(groups)
fprintf('\n');
gk = groups{k};
for r = gk'
fprintf('row #%d %s\n', r, mat2str(A(r,:)));
end
end

Categories

Find more on Graphics Object Programming in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!