Code covered by the BSD License  

2 Downloads (last 30 days), File Size: 1.34 KB, File ID: #26435

Number of classes

by Zacharias Voulgaris

 

20 Jan 2010

A practical tool for quickly finding the classes of a dataset from its label array.

File Information
Description

This program was developed to facilitate the analysis of datasets and the development of algorithms in the field of pattern classification. It is significantly faster than the relevant built-in MATLAB function (unique.m), and it also provides some basic details of the classes, such as the indices of their members and their sizes.

Usage:
[q, Q, C, N] = nc(O)
where O is the label array (vector)
      q is the number of classes (integer)
      Q is the class labels (vector)
      C is the class members' indices (cell array, one cell per class)
      N is the number of points in each class (integer vector)
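To make the four outputs concrete, here is a small worked example. The source of nc.m is not reproduced on this page, so the sketch below uses built-in functions as a stand-in; it is not the author's implementation, and it assumes the labels come back in sorted order, as unique returns them.

```matlab
% Stand-in for [q, Q, C, N] = nc(O), built from MATLAB built-ins only.
O = [3 1 3 2 1 3];                       % label vector with three classes

Q = unique(O);                           % class labels: [1 2 3]
q = numel(Q);                            % number of classes: 3
% One cell per class, holding the positions in O that carry that label.
C = arrayfun(@(v) find(O == v), Q, 'UniformOutput', false);
N = cellfun(@numel, C);                  % points per class: [2 1 3]
```

Here C{1} is [2 5], C{2} is 4 and C{3} is [1 3 6], so N is simply the length of each cell.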

Any feedback on this file, esp. regarding how it can be improved, would be greatly appreciated.

MATLAB release MATLAB 7.6 (R2008a)
Comments and Ratings (3)
21 Jan 2010 Rob Campbell

I will hold off a rating for now...
Nice idea but I have some comments:

- Code is not commented and is rather long for what it does. You could do 90% of this using a call to unique and call to hist (2 lines!). Using your output labels:
q=length(unique(A));
Q=unique(A);
Then:
N=hist(A,unique(A)); %the number in each class
The indices for each class come from the 3rd output of unique, which gives the class index of every element. You can either use that as is or turn it into a cell array as you have (C).

- Your code doesn't seem faster to me; it seems about 6 times slower (although my version does not build C, which may not be necessary). What am I missing?

>> r=round(randn(1,10e5)*10);
>> tic,[b,i,j]=unique(r);a=hist(r,b);toc
Elapsed time is 0.570190 seconds.
>> tic,nc(r);toc
Elapsed time is 3.543892 seconds.
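The comment above notes that the third output of unique can be turned into the cell array C. One possible way to do that without a loop is accumarray; this is a sketch of that idea, not code from either commenter, and the variable names b and j simply follow the timing snippet above. The `~` placeholder needs R2009b or later.

```matlab
% Build N and C from the third output of unique, using accumarray.
r = round(randn(1, 1e5) * 10);           % test labels, as in the comment above

[b, ~, j] = unique(r);                   % b: class labels, j: class index per element
j = j(:);                                % force a column; orientation varies by release

N = accumarray(j, 1).';                  % number of points in each class
idx = (1:numel(r)).';                    % position of every element in r
% Group the positions by class; sort because accumarray does not
% guarantee the order of values passed to the function handle.
C = accumarray(j, idx, [], @(x) {sort(x).'}).';
```

With this layout C{k} matches find(r == b(k)), and cellfun(@numel, C) reproduces N.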

21 Jan 2010 Zacharias Voulgaris

Thank you for your feedback.

It is true that there are different ways of going about this problem, one of which is using the unique function. However, the whole idea of developing the nc program was to avoid the "unique" function as it is slower. Note that the speed of a function is more accurately measured via the profiler program of Matlab, or through the cputime function.

21 Jan 2010 Rob Campbell

Respectfully, I must disagree. Firstly, the MathWorks help makes it clear that tic/toc is to be preferred over cputime. In addition, tic/toc gives the same answer as the "total time" reported by the profiler.
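For readers who want to compare the timing methods discussed in this thread, a minimal sketch follows. It times unique as a stand-in workload, since nc.m is not reproduced here. The timeit call is an addition that neither commenter mentions and was not available in the R2008a release listed above; to my knowledge it requires R2013b or later, so drop that line on older releases.

```matlab
% Three ways to time the same call on the same data.
r = round(randn(1, 1e6) * 10);           % test labels, same size as in the first comment
f = @() unique(r);                       % workload under test

tic;                                     % wall-clock time of a single run
f();
tWall = toc;

t0 = cputime;                            % CPU time of a single run
f();
tCpu = cputime - t0;

tRep = timeit(f);                        % repeated runs, typical time (assumes R2013b+)

fprintf('tic/toc: %.4f s, cputime: %.4f s, timeit: %.4f s\n', tWall, tCpu, tRep);
```

A single tic/toc or cputime reading includes any first-call overhead, so running each measurement a few times before comparing functions gives a fairer picture.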

I am not sure why you say unique is slower. I just re-wrote your function using it and get a 3x speed up over your code (8s compared to 25s for a vector of length 1e7). The code is much neater and easier to understand and provides the same output:

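function [q,Q,C,N] = nc2(O)
% NC2 Class summary of a label vector, using unique and hist.
Q=unique(O);            % class labels, sorted
q=length(Q);            % number of classes
N=hist(O,Q);            % points per class (hist reads a scalar Q as a bin count,
                        % so a single-class input needs separate handling)
C=cell(1,q);            % preallocate one cell per class
for k=1:q
    C{k}=find(O==Q(k)); % indices of the members of class k
end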

As I said earlier, please do correct me if I'm missing something silly. I'm not trying to make pedantic criticisms, just to help!

Tag Activity for this File
Tag              Applied By             Date/Time
unique values    Zacharias Voulgaris    21 Jan 2010 10:00:33
class labels     Zacharias Voulgaris    21 Jan 2010 10:00:33
class vector     Zacharias Voulgaris    21 Jan 2010 10:00:33
