Thread Subject:
how to count the number of unique elements elegantly?

Subject: how to count the number of unique elements elegantly?

From: Xuefei Cao

Date: 2 Jun, 2011 14:01:05

Message: 1 of 6

Hi there,

I have a document stored as a vector of word indices, like [1 2 3 6 8 9 510 236 2 5 9 1 8 58....]. I want to find the unique word indices in the document and the count for each one. I do it as follows:

% document=[1 2 3 6 8 9 510 236 2 5 9 1 8 58....]
word.id=unique(document);
for i=1:length(word.id)
     word.cnt(i)=sum(document==word.id(i));
end

I don't like this solution because it uses a "for" loop. Is there a more elegant way to do the calculation?

Thanks a lot!

From: ImageAnalyst

Date: 2 Jun, 2011 15:05:00

Message: 2 of 6

Do you mean like this?

counts = hist(document, max(document))
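A hedged note on the one-liner above: with a scalar second argument, `hist` spreads that many equal-width bins between the smallest and largest value, so the counts line up with the word indices only when the indices are positive integers starting at 1, and indices that never occur show up as zeros. Below is a minimal sketch of an alternative using `accumarray`, assuming positive integer indices. The sample vector is illustrative only (the values quoted in the question, with the elided tail dropped).

```matlab
% Illustrative sample: values quoted in the question, elided tail dropped.
document = [1 2 3 6 8 9 510 236 2 5 9 1 8 58];

% counts(k) is how many times index k occurs, for k = 1..max(document).
% Assumes every entry of document is a positive integer.
counts = accumarray(document(:), 1);

% Keep only the indices that actually occur.
word.id  = find(counts).';       % unique indices, ascending, as a row
word.cnt = counts(word.id).';    % matching counts, as a row
```

For this sample, `word.id` holds the ten distinct indices and `word.cnt` their counts, with no zero entries for the indices that never appear.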

From: Roger Stafford

Date: 2 Jun, 2011 16:15:21

Message: 3 of 6

  You might try this:

 s = sort(document);
 p = find([true,diff(s)~=0,true]);
 word.id = s(p(1:end-1));
 word.cnt = diff(p);

  It isn't the for-loop per se that makes your counting scheme inefficient; the trouble is the algorithm inside it. For each of the N unique elements in the document, you have to scan all M document elements to evaluate the logical test "document==word.id(i)", which makes it an O(N*M) algorithm. That can be expensive for large N and M.
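To make the sort-and-diff idea concrete, here is a small sketch that runs it on an illustrative sample and checks the result against the loop from the question. The sample vector and the names `loop_id`, `loop_cnt`, and `same` are assumptions made for illustration; the approach as written expects `document` to be a row vector.

```matlab
% Illustrative sample: values quoted in the question, elided tail dropped.
document = [1 2 3 6 8 9 510 236 2 5 9 1 8 58];

% Sort so equal indices sit next to each other, then mark where runs begin.
s = sort(document);
p = find([true, diff(s) ~= 0, true]);   % run starts, plus one past the end
word.id  = s(p(1:end-1));               % first element of each run
word.cnt = diff(p);                     % length of each run

% Reference result from the loop in the question (O(N*M)).
loop_id  = unique(document);
loop_cnt = zeros(size(loop_id));
for i = 1:numel(loop_id)
    loop_cnt(i) = sum(document == loop_id(i));
end

% True when both approaches agree on this sample.
same = isequal(word.id, loop_id) && isequal(word.cnt, loop_cnt);
```

The sorted version does one sort, typically O(M log M), followed by a single linear pass, instead of one full scan per unique element.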

Roger Stafford

From: Xuefei Cao

Date: 2 Jun, 2011 19:08:20

Message: 4 of 6

Yes, it's cool~~

Then could I ask for a little more help with the opposite problem? Suppose we have word.id=[1 2 3 5] and word.cnt=[2 2 3 1], and want to recover the full-length document as document=[1 1 2 2 3 3 3 5]. The word order doesn't matter here.

My clumsy solution is:
document=[];
for i=1:length(word.id)
    document=horzcat(document,word.id(i)*ones(1,word.cnt(i)));
end

Is there a more elegant solution for this?






From: Roger Stafford

Date: 2 Jun, 2011 20:14:20

Message: 5 of 6

"Xuefei Cao" <sophie.c1325@hotmail.com> wrote in message <is8n34$4r8$1@newscl01ah.mathworks.com>...
> Then could I ask for a little further help on the opposite problem? That is, what if we have word.id=[1 2 3 5], and word.cnt=[2 2 3 1], and want to recover the full-length document as document=[1 1 2 2 3 3 3 5] ? The word order doesn't matter here.
- - - - - - - - - -
  Starting with just 'word.id' and 'word.cnt':

 q = cumsum([1,word.cnt]);
 p = zeros(1,q(end)-1);
 p(q(1:end-1)) = 1;
 d = word.id(cumsum(p));

where 'd' is the sorted version of the original 'document'. There is no way of recovering 'document' itself without more information.
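As a worked check of the steps above, the sketch below traces them on the example from the question. It assumes every entry of `word.cnt` is at least 1; with a zero count two run starts would coincide and the indexing would no longer line up.

```matlab
word.id  = [1 2 3 5];
word.cnt = [2 2 3 1];          % assumed: every count is >= 1

q = cumsum([1, word.cnt]);     % start position of each run: [1 3 5 8 9]
p = zeros(1, q(end) - 1);      % one slot per element of the output
p(q(1:end-1)) = 1;             % mark the first slot of each run
d = word.id(cumsum(p));        % cumsum(p) = [1 1 2 2 3 3 3 4]
```

Here `cumsum(p)` turns the run-start markers into the position within `word.id` that each output slot should copy, so `d` comes out as [1 1 2 2 3 3 3 5]. In MATLAB releases that provide `repelem` (R2015a and later, to the best of my knowledge), `repelem(word.id, word.cnt)` should give the same result.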

Roger Stafford

From: Xuefei Cao

Date: 9 Jun, 2011 01:34:04

Message: 6 of 6

Thanks, everyone. This helped a lot, and I appreciate it!
