Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: how to count the number of unique elements elegantly?
Date: Thu, 2 Jun 2011 16:15:21 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 25
Message-ID: <is8cup$26v$1@newscl01ah.mathworks.com>
References: <is8531$5br$1@newscl01ah.mathworks.com>
Reply-To: <HIDDEN>

"Xuefei Cao" <sophie.c1325@hotmail.com> wrote in message <is8531$5br$1@newscl01ah.mathworks.com>...
> Hi there,
> 
> I have a document whose word indices look like [1 2 3 6 8 9 510 236 2 5 9 1 8 58....]. Now I want to find the unique word indices in the document, and the count for each word index. I do it as follows:
> 
> % document=[1 2 3 6 8 9 510 236 2 5 9 1 8 58....]
> word.id=unique(document);
> for i=1:length(word.id)
>      word.cnt(i)=sum(document==word.id(i));
> end
> 
> I don't like the solution because it uses "for". Is there a way to make the calculation more elegant?
> 
> Thanks a lot!
- - - - - - - - -
  You might try this:

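 % assumes document is a row vector
 s = sort(document);                 % equal indices become adjacent
 p = find([true,diff(s)~=0,true]);   % start of each run, plus one past the end
 word.id = s(p(1:end-1));            % first element of each run
 word.cnt = diff(p);                 % run lengths = counts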
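
  To see what each step produces, here is a small trace on a made-up six-element vector.  The data is only for illustration, and the sketch assumes document is a non-empty row vector:

```matlab
document = [3 1 2 3 1 3];          % hypothetical sample data (row vector)
s = sort(document);                % s = [1 1 2 3 3 3]
p = find([true,diff(s)~=0,true]);  % p = [1 3 4 7]: run starts, plus one past the end
word.id = s(p(1:end-1));           % word.id  = [1 2 3]
word.cnt = diff(p);                % word.cnt = [2 1 3]
```

  The index 1 occurs twice, 2 once, and 3 three times, which matches word.cnt.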

  It isn't the for-loop per se that makes your counting scheme inefficient; it is the algorithm inside it.  For each of the N unique elements, the test "document==word.id(i)" has to scan all M document elements, which makes it an O(N*M) algorithm.  That can be expensive for large N and M.  The sort-based version above does its main work in one sort, roughly O(M*log(M)), followed by linear passes with diff and find.
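
  If you want to convince yourself that the two approaches agree, a rough check along the following lines may help.  The random test data and the names id1, cnt1, and so on are made up for this sketch, and the accumarray variant at the end is a different technique included only for comparison, not the method suggested above:

```matlab
document = randi(50, 1, 1000);     % hypothetical test data: 1000 indices in 1..50

% Loop version from the question: one full scan of document per unique id
id1 = unique(document);
cnt1 = zeros(size(id1));
for i = 1:length(id1)
    cnt1(i) = sum(document == id1(i));
end

% Sort-based version: one sort, then linear passes
s = sort(document);
p = find([true, diff(s)~=0, true]);
id2 = s(p(1:end-1));
cnt2 = diff(p);

% accumarray variant (a different technique, shown for comparison only)
[id3, ~, j] = unique(document);
cnt3 = accumarray(j(:), 1).';      % j maps each element to its position in id3

% ok is true when all three give the same ids and counts
ok = isequal(id1, id2, id3) && isequal(cnt1, cnt2, cnt3);
```

  Wrapping each section in tic/toc for larger N and M should show where the loop version starts to fall behind, though the exact timings will depend on your machine and MATLAB release.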

Roger Stafford