Finding partial overlaps between sets

Question

Thomas Arildsen on 10 Jan 2013

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/58421-finding-partial-overlaps-between-sets

Let us say I have some equal-sized sets of indices. I want to determine how much these sets overlap in the sense that I want to know how many of the sets contain a given index. I don't know what this is called, but I think of it as a kind of "soft intersection". I imagine doing it in one of the following ways, but I would like suggestions as to whether this is reasonable and possible better ways.

The indices in the sets can be in (1:N). I have K sets. I could create K length-N indicator arrays and add these K arrays together. The resulting length-N array would contain occurrence counts of the indices of their positions in the array.
Alternatively, I could create the union of the K sets. Then I could use find() to find occurrences of each of the members of the union set in a concatenation of the K index sets. The sizes of the results of find() would be occurrence counts of the indices in the sets.

1 Comment
Show -1 older commentsHide -1 older comments

Thomas Arildsen on 10 Jan 2013

I should add that the indices in each of the K sets are unique in that set, so the sets do not contain any duplicates.

Sign in to comment.

Sign in to answer this question.

Answer 1

Thomas Arildsen on 10 Jan 2013

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/58421-finding-partial-overlaps-between-sets#answer_70694

Open in MATLAB Online

OK, that didn't take very long to try... I implemented no. 1 as follows:

function [ idxUnion, idxCounts ] = overlap1( idxSets )
%OVERLAP1 quantifies overlap between sets.
%   The function counts in how many of the input sets each index occurs.
%   The sets are specified as columns of the input matrix.
K = size(idxSets,2);
N = max(idxSets(:));
indicator = zeros(K,N);
for k = 1:K
  indicator(k,idxSets(:,k)) = 1;
end
counts = sum(indicator,1);
idxUnion = find(counts ~= 0);
idxCounts = counts(idxUnion);
end

I implemented no. 2 as follows:

function [ idxUnion, idxCounts ] = overlap2( idxSets )
%OVERLAP2 quantifies overlap between sets.
%   The function counts in how many of the input sets each index occurs.
%   The sets are specified as columns of the input matrix.
K = size(idxSets,2);
N = max(idxSets(:));
idxUnion = unique(idxSets)';
idxCounts = zeros(size(idxUnion));
for ii = 1:length(idxCounts)
  idxCounts(ii) = length(find(idxUnion(ii) == idxSets));
end
end

I test them as follows:

function [ result ] = overlap_test()
%OVERLAY_TEST Tests the overlay1 and overlay2 functions for errors.
%   This function runs overlay1 and overlay2 with a random test input and
%   performs a  couple of sanity checks on the output to validate these
%   functions.
K = 100;
M = 100;
N = 1e4;
testSets = zeros(M,K);
for k = 1:K
  testSets(:,k) = randperm(N,M)';
end
% Test overlap1 union
tic
[idxUnion1,idxCnts1] = overlap1(testSets);
t1 = toc;
disp(sprintf('overlap1 took %f s\n',t1))
result(1,1) = isempty(setdiff(idxUnion1,unique(testSets)));
% Test overlap2 union
tic
[idxUnion2,idxCnts2] = overlap2(testSets);
t2 = toc;
disp(sprintf('overlap2 took %f s\n',t2))
result(2,1) = isempty(setdiff(idxUnion2,unique(testSets)));
% Test agreement on counts
try
  result(3,1) = all(idxCnts1 == idxCnts2);
catch anError
  warning(anError.identifier,anError.message);
  result(3,1) = 0;
end
end

No. 1 seems clearly faster.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Finding partial overlaps between sets

1 Comment
Show -1 older commentsHide -1 older comments

Accepted Answer

0 Comments
Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

Finding partial overlaps between sets

1 Comment Show -1 older commentsHide -1 older comments

Accepted Answer

0 Comments Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

1 Comment
Show -1 older commentsHide -1 older comments

0 Comments
Show -2 older commentsHide -2 older comments