Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
vectorize intersect with cells?

Subject: vectorize intersect with cells?

From: HenryW

Date: 9 Mar, 2009 01:04:42

Message: 1 of 13

I have a cell cliques=cell(n,1) where each cell
cliques{i} is a set of integers e.g. cliques{1}=[1 3 9 29]
I would like to vectorize this for loop ---- which takes very long:

for ic=1:n-1,
  for jc=ic+1:n,
    intcliques(ic,jc)=length(intersect(cliques{ic},cliques{jc}));
  end
end

any suggestions?

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 07:56:01

Message: 2 of 13

HenryW <hwolkowicz@uwaterloo.ca> wrote in message <18230588.1236560716722.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a cell cliques=cell(n,1) where each cell
> cliques{i} is a set of integers e.g. cliques{1}=[1 3 9 29]
> I would like to vectorize this for loop ---- which takes very long:
>
> for ic=1:n-1,
> for jc=ic+1:n,
> intcliques(ic,jc)=length(intersect(cliques{ic},cliques{jc}));
> end
> end
>
> any suggestions?

Could you tell us the typical values of n and length(cliques{i})?

Thanks,

Bruno

Subject: vectorize intersect with cells?

From: us

Date: 9 Mar, 2009 09:28:30

Message: 3 of 13

HenryW wrote:
> I would like to vectorize this for loop =A0---- which takes very long:
> any suggestions...

this loop cannot be vectorized (in the true sense of vectorization)...
it only can be optimized...

two of the possible solutions including simple profiling
- copy/paste

% the data
     clear c* nc* r*; % <- save old stuff!
     nt=3D10; % <- #runs/solution
     nc=3D100; % <- max #data/cell
     cc=3Dcell(nc,1);
for i=3D1:nc
     nx=3Dceil(10*rand(1,nc));
     ne=3Dceil(nc*rand);
     cc{i,1}=3Dnx(1:ne);
end

% the engine
     r1=3Dzeros(nc,nc);
     r2=3Dzeros(nc,nc);
% - sol #1 =3D OP
tic;
for i=3D1:nt
for ic=3D1:nc-1,
for jc=3Dic+1:nc,
     r1(ic,jc)=3Dlength(intersect(cc{ic},cc{jc}));
end
end
end
toc
% - sol #2
tic;
for i=3D1:nt
for ic=3D1:nc-1,
for jc=3Dic+1:nc
     r2(ic,jc)=3Dsum(ismembc(unique(cc{ic}),sort(cc{jc})));
end
end
end
toc
% - sol #3
tic;
for i=3D1:nt
     ix=3Dnchoosek(1:nc,2);
     r3=3Dcellfun(@(x,y) sum(ismembc(unique(x),sort(y))),cc(ix(:,1)),cc
(ix(:,2)));
     r3=3Daccumarray(ix,r3,[nc,nc]);
end
toc
     disp(isequal(r1,r2,r3));
%{
% wintel system
% ic2/2*2.6ghz/2gb/winxp.sp3/r2008b
Elapsed time is 5.368212 seconds. % <- sol #1 =3D OP
Elapsed time is 2.040264 seconds. % <- sol #2 =3D ~50%
Elapsed time is 2.102942 seconds. % <- sol #3 =3D ~50%
isequal =3D 1
%}

us

Subject: vectorize intersect with cells?

From: us

Date: 9 Mar, 2009 09:33:50

Message: 4 of 13

HenryW wrote:
> I would like to vectorize this for loop ---- which takes very long:
> any suggestions...

this loop cannot be vectorized (in the true sense of
vectorization)...
it only can be optimized...

two of the possible solutions including simple profiling
- copy/paste
- second post via google...

% the data
     clear c* nc* r*; % <- save old stuff!
     nt=10; % <- #runs/solution
     nc=100; % <- max #data/cell
     cc=cell(nc,1);
for i=1:nc
     nx=ceil(10*rand(1,nc));
     ne=ceil(nc*rand);
     cc{i,1}=nx(1:ne);
end

% the engine
     r1=zeros(nc,nc);
     r2=zeros(nc,nc);
% - sol #1 = OP
tic;
for i=1:nt
for ic=1:nc-1,
for jc=ic+1:nc,
     r1(ic,jc)=length(intersect(cc{ic},cc{jc}));
end
end
end
toc
% - sol #2
tic;
for i=1:nt
for ic=1:nc-1,
for jc=ic+1:nc
     r2(ic,jc)=sum(ismembc(unique(cc{ic}),sort(cc{jc})));
end
end
end
toc
% - sol #3
tic;
for i=1:nt
     ix=nchoosek(1:nc,2);
     r3=cellfun(@(x,y) sum(ismembc(unique(x),sort(y))),...
                cc(ix(:,1)),cc(ix(:,2)));
     r3=accumarray(ix,r3,[nc,nc]);
end
toc
     disp(isequal(r1,r2,r3));
%{
% wintel system
% ic2/2*2.6ghz/2gb/winxp.sp3/r2008b
Elapsed time is 5.368212 seconds. % <- sol #1 = OP
Elapsed time is 2.240264 seconds. % <- sol #2
Elapsed time is 2.302942 seconds. % <- sol #3
isequal = 1
%}

us

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 10:33:30

Message: 5 of 13

Here is one way, complicated but should be much faster at the run
time:

function testintersect

n=200; % number of sets
m=100; % element value are from 0 to m-1
meannumel=50; % average number of elements in lists
s = 20; % standard deviation of number of elements
cliques = gendata(n, m, meannumel, s);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%
% Engine 1 (Henry)
tic
intcliques1=Engine1(cliques);
toc % 2.374056 seconds.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%
% Engine 3 (Bruno)
tic
intcliques3=Engine3(cliques);
toc % 0.059586 seconds.

% Check
isequal(intcliques1,intcliques3)
subplot(1,2,1);
imagesc(intcliques1);
subplot(1,2,2);
imagesc(intcliques3);
end


% Generate data
function cliques = gendata(n, m, meannumel, s)
% Data

cliques=cell(1,n);
for k=1:n
    nelk = meannumel+round(s*randn);
    nelk = min(max(nelk,1),m);
    cliques{k} = floor(m*rand(1,nelk));
end
end

function intcliques=Engine1(cliques)
n=length(cliques);
intcliques=zeros(n,n);
for ic=1:n-1,
    for jc=ic+1:n,
        intcliques(ic,jc)=length(intersect(cliques{ic},cliques{jc}));
    end
end
end

function intcliques=Engine3(cliques)
n=length(cliques);
% Transform to long column array
c = cellfun(@(x) unique(x(:)), cliques, 'uni', false);
allc = cat(1,c{:});
% Take a unique
[u I J]=unique(allc);
% map c to index integer set: 1,2,...
cmap = mat2cell(J,cellfun(@length,c),1);
% Set id, same size as cmap
setid = arrayfun(@(n) n+zeros(size(cmap{n})), 1:length(cmap), ...
    'uni', false);
% Reshape as long vectors
cmap = cat(1,cmap{:});
setid = cat(1,setid{:});

% For each element, return the list of set that contain it
contained=accumarray(cmap,setid,[],@(id) {sort(id)});

% Build the list of all couples
couples = cellfun(@allcouples, contained, 'uni', false);
couples = cat(1,couples{:});
% Count
%intcliques=sparse(couples(:,1),couples(:,2),1,n,n); % < sparse
intcliques=accumarray(couples,1,[n,n]);

    function res=allcouples(k) % Return all couples of a list

        k=k(:);
        [I J]=ndgrid(1:numel(k),1:numel(k));
        [I J]=deal(I(I<J),J(I<J));
        res=[k(I(:)) k(J(:))];
    end

end

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 15:26:02

Message: 6 of 13

% More compact code:

function testintersect

n=200; % number of sets
m=100; % element value are from 0 to m-1
meannumel=50; % average number of elements in lists
s = 20; % standard deviation of number of elements, 0 to m-1
cliques = gendata(n, m, meannumel, s);

% Engine 1 (Henry)
tic
intcliques1=Engine1(cliques);
toc % 2.373781 seconds.

% Engine 3 (Bruno)
tic
intcliques3=Engine3(cliques);
toc % 0.048709 seconds.

% Check
isequal(intcliques1,intcliques3)
subplot(1,2,1);
imagesc(intcliques1);
subplot(1,2,2);
imagesc(intcliques3);

end % testintersect

% Generate data
function cliques = gendata(n, m, meannumel, s)
% Data

cliques=cell(1,n); % preallocate
for k=1:n
    % number of elements in set #k
    nelk = meannumel+round(s*randn);
    nelk = min(max(nelk,0),m); % clip it
    % random set of length nelk
    cliques{k} = floor(m*rand(1,nelk));
end
end % gendata

function intcliques=Engine1(cliques)
n=length(cliques);
intcliques=zeros(n,n);
for ic=1:n-1,
    for jc=ic+1:n,
        intcliques(ic,jc)=length(intersect(cliques{ic},cliques{jc}));
    end
end
end % Engine 1

function intcliques=Engine3(cliques)
% Number of sets
n=length(cliques);
% Transform to long column array
c = cellfun(@(x) unique(x(:)), cliques, 'uni', false);
% Set id, same size as c
setid = arrayfun(@(id) id+zeros(size(c{id})), 1:n, ...
                 'uni', false);
% Reshape as long vectors
setid = cat(1,setid{:});
% Take a unique of all elements
% map c to index integer set: 1,2,...
[dummy dummy cmap]=unique(cat(1,c{:}));
clear dummy

% For each element, return the list of set that contain it
contained=accumarray(cmap(:),setid,[],@(id) {id});
clear cmap setid

% Build the list of all couples
couples = cellfun(@allcouples, contained, 'uni', false);
couples = cat(1,couples{:}); % Put in a long (p x 2)-array

% Count
%intcliques=sparse(couples(:,1),couples(:,2),1,n,n); % < sparse
intcliques=accumarray(couples,1,[n n]); % <- full

    function res=allcouples(k) % Return all couples of a list
        
        [K1 K2]=ndgrid(k(:),k(:));
        filter=K1<K2;
        res=[K1(filter) K2(filter)];
    end % allcouples

end % Engine3

% Bruno

Subject: vectorize intersect with cells?

From: Roger Stafford

Date: 9 Mar, 2009 16:11:01

Message: 7 of 13

HenryW <hwolkowicz@uwaterloo.ca> wrote in message <18230588.1236560716722.JavaMail.jakarta@nitrogen.mathforum.org>...
> I have a cell cliques=cell(n,1) where each cell
> cliques{i} is a set of integers e.g. cliques{1}=[1 3 9 29]
> I would like to vectorize this for loop ---- which takes very long:
>
> for ic=1:n-1,
> for jc=ic+1:n,
> intcliques(ic,jc)=length(intersect(cliques{ic},cliques{jc}));
> end
> end
>
> any suggestions?

  The reason your nested for-loops take so long is that there are n*(n-1)/2 calls on the 'intersect' function and each time this function must do some sorting to find shared elements. That's a lot of sorting for large n. The following code does just one massive sort on the assumption that that will save unnecessary repetition in the sorting process.

  Call your cell array 'cliques' just 'c' here and assume that each element of 'c' is a column vector and that there are n such vectors in the cell array.

 s = ones(n+1,1);
 for ic = 1:n, s(ic+1) = s(ic) + length(c{ic}); end
 m = s(n+1)-1;
 x = zeros(m,1);
 for ic = 1:n, x(s(ic):s(ic+1)-1) = c{ic}; end
 [x,p] = sort(x); q = (1:m)'; q(p) = q;
 u = cumsum([1;diff(x)~=0]);
 t = cumsum(accumarray(s,1));
 A = sparse(u(q),t(1:m),1,u(m),n);
 A = full(A'*A);

Then A is your desired n x n matrix (which you called 'intcliques'.)

  You will note that the diagonal and the lower triangular part are also filled. I hope that is all right with you.

  I used the 'sparse' function on the assumption that at this point A would be very sparsely filled. If that is not so, you can replace it with an equivalent call to 'accumarray' and drop the 'full' in the next line.

  I have also assumed that there are no repeated elements in each individual column vector of 'c'.

Roger Stafford

Subject: vectorize intersect with cells?

From: HenryW

Date: 9 Mar, 2009 17:03:56

Message: 8 of 13

The values of n are typically from 2000 upwards; often 10,000. But, we are hoping to solve some very large problems and this is a preliminary preprocessing step.

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 17:50:02

Message: 9 of 13

Great algo from Roger, very quick using binary sparse matrix to store set-element relationship. Here is the algo, slightly modified to give the same result as the original one, and vectorized two for-oops:

function A = Engine4(c) % Roger
n=length(c);
c = cellfun(@(x) unique(x(:)), c, 'uni', false);
s = cumsum([1 cellfun(@length,c)]).';
x=cat(1,c{:});
m = s(n+1)-1;
[x,p] = sort(x);
q = (1:m)';
q(p) = q;
u = cumsum([1;diff(x)~=0]);
t = cumsum(accumarray(s,1));
A = sparse(u(q),t(1:m),1,u(m),n);
A = triu(full(A'*A),1);
end % Engine 4

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 20:09:01

Message: 10 of 13

% Somehow shorter

function A = Engine5(c) % Roger & Bruno

c = cellfun(@(x) unique(x(:)), c, 'uni', false);
l = cellfun(@length,c(:));
setid = cumsum(accumarray(cumsum([1; l]),1));
[dummy dummy cmap]=unique(cat(1,c{:}));
A = sparse(setid(1:end-1),cmap,1,length(c),length(dummy));
A = triu(full(A*A.'),1);

end % Engine 5

% Bruno

Subject: vectorize intersect with cells?

From: Roger Stafford

Date: 9 Mar, 2009 20:36:01

Message: 11 of 13

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <gp3ksa$aak$1@fred.mathworks.com>...
> Great algo from Roger, very quick using binary sparse matrix to store set-element relationship. Here is the algo, slightly modified to give the same result as the original one, and vectorized two for-oops:
>
> function A = Engine4(c) % Roger
> n=length(c);
> c = cellfun(@(x) unique(x(:)), c, 'uni', false);
> s = cumsum([1 cellfun(@length,c)]).';
> x=cat(1,c{:});
> m = s(n+1)-1;
> [x,p] = sort(x);
> q = (1:m)';
> q(p) = q;
> u = cumsum([1;diff(x)~=0]);
> t = cumsum(accumarray(s,1));
> A = sparse(u(q),t(1:m),1,u(m),n);
> A = triu(full(A'*A),1);
> end % Engine 4

  Thanks, Bruno, for fixing up that code for generating x and s. I am a "babe in the woods" when it comes to cell arrays. Also thanks for the correction with 'triu'.

  As it turns out, neither that inverse permutation 'q' nor the quantity 'm' were actually needed, so here is how your modification would look without them:

 n=length(c);
 c = cellfun(@(x) unique(x(:)), c, 'uni', false);
 s = cumsum([1 cellfun(@length,c)]).';
 x=cat(1,c{:});
 [x,p] = sort(x);
 u = cumsum([1;diff(x)~=0]);
 t = cumsum(accumarray(s,1));
 A = sparse(u,t(p),1,u(end),n);
 A = triu(full(A'*A),1);

  I forgot to mention that this code worked with my tests even with some of the cell vectors empty.

  It is possible the 'unique' operation in your second line might be avoided by suitably altering the subsequent algorithm so as to achieve the same effect without the extra expense of the sorting involved in 'unique'. Ideally, the one massive sort ought to do the whole job with the right procedure. Any duplicate copies within a cell vector will appear in a contiguous string within x and there ought to be a way of excluding them from the count if they are treated the right way. However, I haven't had time to look into that aspect of it.

Roger Stafford

Subject: vectorize intersect with cells?

From: HenryW

Date: 9 Mar, 2009 20:40:17

Message: 12 of 13

Rogers algorithm was really fast --- thanks!!
But I could not get Bruno's last modification to work.

Subject: vectorize intersect with cells?

From: Bruno Luong

Date: 9 Mar, 2009 21:18:01

Message: 13 of 13

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gp3ujh$9k3$1@fred.mathworks.com>...

>
> It is possible the 'unique' operation in your second line might be avoided by suitably altering the subsequent algorithm so as to achieve the same effect without the extra expense of the sorting involved in 'unique'. Ideally, the one massive sort ought to do the whole job with the right procedure. Any duplicate copies within a cell vector will appear in a contiguous string within x and there ought to be a way of excluding them from the count if they are treated the right way. However, I haven't had time to look into that aspect of it.
>

Roger,

Do you think something along this line? Now we have just a unique call of unique!

function A = Engine6(c) % Roger & Bruno
% c must be a list of *row* vectors
[dummy dummy cmap]=unique([c{:}]);
nele = cellfun(@length,c(:));
setid = cumsum(accumarray(cumsum([1; nele]),1));
A = sparse(setid(1:end-1),cmap(:),1,length(c),length(dummy));
A = spones(A);
A = triu(full(A*A.'),1);
end % Engine 6

It's run about 10% faster on my computer for n=1000, and average 50 elements for each set.

Bruno

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us