```Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: vectorize intersect with cells?
Date: Mon, 9 Mar 2009 20:36:01 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 37
Message-ID: <gp3ujh\$9k3\$1@fred.mathworks.com>
References: <18230588.1236560716722.JavaMail.jakarta@nitrogen.mathforum.org> <27184211.1236618266940.JavaMail.jakarta@nitrogen.mathforum.org> <gp3ksa\$aak\$1@fred.mathworks.com>
NNTP-Posting-Host: webapp-03-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1236630961 9859 172.30.248.38 (9 Mar 2009 20:36:01 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Mon, 9 Mar 2009 20:36:01 +0000 (UTC)
Xref: news.mathworks.com comp.soft-sys.matlab:523595

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <gp3ksa\$aak\$1@fred.mathworks.com>...
> Great algo from Roger, very quick using binary sparse matrix to store set-element relationship. Here is the algo, slightly modified to give the same result as the original one, and vectorized two for-oops:
>
> function A = Engine4(c) % Roger
> n=length(c);
> c = cellfun(@(x) unique(x(:)), c, 'uni', false);
> s = cumsum([1 cellfun(@length,c)]).';
> x=cat(1,c{:});
> m = s(n+1)-1;
> [x,p] = sort(x);
> q = (1:m)';
> q(p) = q;
> u = cumsum([1;diff(x)~=0]);
> t = cumsum(accumarray(s,1));
> A = sparse(u(q),t(1:m),1,u(m),n);
> A = triu(full(A'*A),1);
> end % Engine 4

Thanks, Bruno, for fixing up that code for generating x and s.  I am a "babe in the woods" when it comes to cell arrays.  Also thanks for the correction with 'triu'.

As it turns out, neither that inverse permutation 'q' nor the quantity 'm' were actually needed, so here is how your modification would look without them:

n=length(c);
c = cellfun(@(x) unique(x(:)), c, 'uni', false);
s = cumsum([1 cellfun(@length,c)]).';
x=cat(1,c{:});
[x,p] = sort(x);
u = cumsum([1;diff(x)~=0]);
t = cumsum(accumarray(s,1));
A = sparse(u,t(p),1,u(end),n);
A = triu(full(A'*A),1);

I forgot to mention that this code worked with my tests even with some of the cell vectors empty.

It is possible the 'unique' operation in your second line might be avoided by suitably altering the subsequent algorithm so as to achieve the same effect without the extra expense of the sorting involved in 'unique'.  Ideally, the one massive sort ought to do the whole job with the right procedure.  Any duplicate copies within a cell vector will appear in a contiguous string within x and there ought to be a way of excluding them from the count if they are treated the right way.  However, I haven't had time to look into that aspect of it.

Roger Stafford
```