Rating: 3.7 (32 ratings) | 352 downloads (last 30 days) | File Size: 2 MB | File ID: #7486

Clustering Toolbox

by Janos Abonyi

 

20 Apr 2005 (Updated 21 Apr 2005)

No BSD License  

The toolbox provides five categories of functions.

File Information
Description

This toolbox was developed as a standard, continuously extensible tool that any MATLAB user can adapt to their own purposes. Chapter 1 of the related downloadable documentation gives a theoretical introduction covering the theory of the algorithms, the definitions of the validity measures, and the visualization tools, all of which help in understanding the MATLAB files.

Chapter 2 describes the files and the individual algorithms, illustrated with simple examples. In Chapter 3 the whole toolbox is tested on real data sets by solving three clustering problems: comparison and selection of algorithms; estimating the optimal number of clusters; and examining multidimensional data sets.

About the Toolbox

The Fuzzy Clustering and Data Analysis Toolbox is a collection of MATLAB functions. The toolbox provides five categories of functions:

- Clustering algorithms. These functions group the given data set into clusters using different approaches: Kmeans and Kmedoid are hard partitioning methods; FCMclust, GKclust, and GGclust are fuzzy partitioning methods with different distance norms.

- Evaluation with cluster prototypes. Based on the clustering results for a data set, these functions can calculate memberships for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.

- Validation. The validity function provides cluster validity measures for each partition. It is useful when the number of clusters is unknown a priori. The optimal partition can be identified from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).

- Visualization. The visualization part of this toolbox provides the modified Sammon mapping of the data. This mapping method is a multidimensional scaling method described by Sammon.

- Examples. An example based on an industrial data set demonstrates the usefulness of the toolbox and its algorithms.
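
To make the validation category more concrete, here is a minimal sketch of the two simplest indexes in the list, the Partition Coefficient and the Classification Entropy, computed from a fuzzy membership matrix. It uses the standard textbook definitions; whether the toolbox's validity function uses exactly the same normalization is an assumption, and the variable names (U, PC, CE) and the toy values are hypothetical rather than taken from the toolbox.

```matlab
% Hypothetical N-by-c fuzzy membership matrix: each row is one data point,
% each column one cluster, and every row sums to 1.
U = [0.9 0.1;
     0.8 0.2;
     0.2 0.8;
     0.5 0.5];

N = size(U, 1);                      % number of data points

% Partition Coefficient: mean squared membership.
% It equals 1 for a crisp partition and 1/c for a completely fuzzy one.
PC = sum(U(:).^2) / N;

% Classification Entropy: mean entropy of the membership rows.
% It equals 0 for a crisp partition; zero memberships are skipped
% so that 0*log(0) does not produce NaN.
Upos = U(U > 0);
CE = -sum(Upos .* log(Upos)) / N;

fprintf('PC = %.4f, CE = %.4f\n', PC, CE);
```

Running such a computation for several candidate numbers of clusters and comparing the resulting values is the kind of comparison the validation description refers to.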

MATLAB release MATLAB 7.0.1 (R14SP1)
Zip File Content  
Other Files
Demos/clusteringexamples/motorcycle/FCMcall.m,
Demos/clusteringexamples/motorcycle/GGcall.m,
Demos/clusteringexamples/motorcycle/GKcall.m,
Demos/clusteringexamples/motorcycle/Kmeanscall.m,
Demos/clusteringexamples/motorcycle/Kmedoidcall.m,
Demos/clusteringexamples/motorcycle/MotorCycle.txt,
Demos/clusteringexamples/synthetic/FCMcall.m,
Demos/clusteringexamples/synthetic/GGcall.m,
Demos/clusteringexamples/synthetic/GKcall.m,
Demos/clusteringexamples/synthetic/Kmeanscall.m,
Demos/clusteringexamples/synthetic/Kmedoidcall.m,
Demos/clusteringexamples/synthetic/nDexample.m,
Demos/clustevalexample/data2.txt,
Demos/clustevalexample/evalexample.m,
Demos/comparing/FCMcall.m,
Demos/comparing/GGcall.m,
Demos/comparing/GKcall.m,
Demos/comparing/Kmeanscall.m,
Demos/comparing/Kmedoidcall.m,
Demos/comparing/modvalidity.m,
Demos/comparing/nDexample.m,
Demos/normexample/data3.txt,
Demos/normexample/normexample.m,
Demos/optnumber/modvalidity.m,
Demos/optnumber/MotorCycle.txt,
Demos/optnumber/optnumber.m,
Demos/PCAexample/nDexample.m,
Demos/PCAexample/PCAexample.m,
Demos/projection/IRIS.MAT,
Demos/projection/visual_call.m,
Demos/projection/WINEDAT.TXT,
Demos/projection/wisconsin.wk1,
FUZZCLUST/clust_denormalize.m,
FUZZCLUST/clust_normalize.m,
FUZZCLUST/clusteval.m,
FUZZCLUST/FCMclust.m,
FUZZCLUST/FuzSam.m,
FUZZCLUST/GGclust.m,
FUZZCLUST/GKclust.m,
FUZZCLUST/Kmeans.m,
FUZZCLUST/Kmedoid.m,
FUZZCLUST/nDexample.m,
FUZZCLUST/PCA.m,
FUZZCLUST/PROJEVAL.M,
FUZZCLUST/SAMMON.M,
FUZZCLUST/SAMSTR.M,
FUZZCLUST/validity.m,
FuzzyClusteringToolbox.pdf
Comments and Ratings (41)
19 May 2005 M C

Outstanding. Comes with a 77 page pdf file describing the toolkit.

16 Jun 2005 Pierluigi Cera

The MATLAB code is not commented.
Also, it's difficult to use only part of the toolbox. For instance, I need only the validation part, but I have to change the code to use it.
Moreover, there is no compatibility with the MATLAB clustering functions. Why is the kmeans code completely different from the MATLAB kmeans function?

16 Jun 2005 LRM 1138

I like the documentation of the theory, but many other things need work. Installation doesn't work as described; you have to add the FUZZCLUST folder to the MATLAB path. I agree with Pierluigi that the functions themselves are very difficult to understand due to lack of comments. The authors should follow the example of MATLAB's kmeans (in the Stats Toolbox) in terms of documentation, and maybe even input parameters. Using structures for everything is neat, but it's just not intuitive (at least for me).

27 Jun 2005 Rossaro Bruno

I am interested in a Matlab version of FORTRAN written TWINSPAN

06 Nov 2005 Stanislava Plichta

The code is horrible.

07 Nov 2005 Jason Choo

I'm still reviewing this toolbox but I want to express my appreciation of Balazs Feil.

From my understanding, he's one of the authors of the .pdf documentation.

He has gone out of his way to help me understand the toolbox. Thank you Balazs!

31 Jan 2006 yuan decheng  
24 Mar 2006 Rosina Kharal

I think the code is just fine. A very helpful toolbox- thank-you !

22 May 2006 Gilles Criton

the code is very good, so...
thank you very much

07 Jul 2006 Lee Ma

ok!

24 Jul 2006 BELGACEM Mouna

Your toolbox was useful for me, but the structures you use differ from the usual ones. I think it would be good if you provided a file that creates these structures.

28 Sep 2006 Manjunath Shantharamu

It helped me a lot, Thank you friend

08 Dec 2006 Alaa Elsayad  
02 Apr 2007 zim zim

This toolbox is wonderful work and it helped me greatly. It would be even better if the code had clearer comments, so that users need not turn to the PDF file.

19 Apr 2007 cao minghua

Thank you very much

20 Apr 2007 Keerthi Kamal Adusumilli

Very Handy

27 Apr 2007 Jiang Wei  
30 May 2007 abolfazl mahmoodnia

Good toolbox

04 Jun 2007 Hsieh kevin

The toolbox is very helpful for me in learning about clustering.

04 Jun 2007 Gokhan Bilgin

Shows art of clustering. Thanks

08 Jul 2007 Roohullah Amiri

The toolbox is good and helpful.

13 Jul 2007 Cenk Budayan

I can't run the toolbox; the help document doesn't explain how to use it. I always get error messages when I try the demos. The demos don't work. I spent a lot of time trying to understand the toolbox, and all I understood is that I wasted my time.

22 Jul 2007 hadaf shegaft

I need a fuzzy connectivity toolbox, but I can't find anything. Help me, please!

14 Oct 2007 ihsa ul haq

This toolbox works very well.
Thank you for uploading nice work.

08 Feb 2008 Andrés Vega  
25 Mar 2008 sun hui

I need study this program.

05 May 2008 Lina Mohn

The idea of the toolbox is great, but the implementation needs improvement! There are no comments in the code, so it is very difficult to understand what actually goes on. Also, there is no error handling; I got a lot of errors while using this toolbox, so I had to manually insert code to capture them.

20 May 2008 Roman Shapov

Have you tested your k-medoids implementation on data consisting of a small number of vectors? It sometimes fails due to an incorrect choice of random initial points.

13 Aug 2008 orhan alp çetin

Thank you for this useful toolbox.

26 Sep 2008 assoora E

Which MATLAB version can this be used with, please?

11 Nov 2008 Wadhah Almansoori

Please help,
How can I display which items/instances belong to which cluster? In other words, I want to view the contents of each cluster (after the clustering result) to make my further analysis.
Thanks,

06 Apr 2009 David

Excellent job. Thanks

24 Aug 2009 ali

How can I use only Kmedoid? Please, somebody help me.

20 Nov 2009 Mohammed El-Said

Great job;
It really helped me,
Thanks.

01 Feb 2010 Ketan

The Kmeans function will occasionally error due to initialization of cluster centers. It tries to access a point outside the data matrix.

05 Feb 2010 muk

nice job! thanks

05 Feb 2010 muk  
27 Feb 2010 ali

I want to use the Kmedoid.m file from this toolbox, but something happens that I don't want.

The Kmedoid algorithm works well for data sets where the number of rows is larger than the number of columns, e.g. iris (150 x 4),

but if the data set has more columns than rows, like bbc (2225 rows, 9635 columns), then

it gives an error like this:

Undefined function or variable "distout".

Error in ==> Kmedoid at 81
result.data.d = distout;

Error in ==> noinidir>kume_Callback at 575
       Kmedoid(data,param);

Error in ==> gui_mainfcn at 96
       feval(varargin{:});

Error in ==> noinidir at 42
   gui_mainfcn(gui_State, varargin{:});

Error in ==>
@(hObject,eventdata)noinidir('kume_Callback',hObject,eventdata,guidata(hObject))

??? Error while evaluating uicontrol Callback

I found out why it does this, but I can't solve it.

After the first initialization of v and v0,

the while condition prod(max(abs(v - v0))) evaluates to 0,

so the loop never runs and nothing is clustered.

Please help me;
it is very urgent.

Thanks for your attention.

sincerely

kazım yıldız

02 Mar 2010 ali

At the initialization stage, Kmedoid doesn't work for large data sets, for example the bbc data set, which has 5 classes.

It gives this error:
Undefined function or variable "distout".

Error in ==> Kmedoid at 83
result.data.d = distout;

because it initializes, but the while loop doesn't run.

Please help.

05 Mar 2010 Alex

Good start, more work needed to expand initialization options and add more cutting edge clustering methods.

06 Mar 2010 John D'Errico

Oh, this could be SOOOOOO much better. The PDF documentation is quite useful, but even that is lacking. One should not be forced to read through 77 pages of PDF just to use these tools!

This toolbox uses structures extensively. But, to be honest, it feels like they overused them. Like a child with a fun toy, the author went overboard here.

For example, the Kmeans function has two arguments. You pass in the data, and a starting matrix for the centers. So then why must you pass in the data as a structure data.X, and the centers as a structure param.c?

This forces the user to create structures with named fields. Get the field name wrong, and the code fails. How does it fail? Here are the very first lines of Kmeans.m.

======================================================
%checking the parameters given
%if exist('param.c') == 1, c = param.c;else error('Nincs megadva a c, es ez baj...');end;
%if exist('param.vis')~=1, param.vis=0;end;
======================================================

Ok, so there is error checking of a sort. It even returns an error. Sorry, but I'm not impressed with the readability of that error message.

Note that there is no actual help in ANY of the functions in this toolbox. Yes, you can learn a lot after you read all 77 pages of PDF. It might even be worth the wait.

The funny thing is, the authors did write SOME help in the code. Look at the beginning of sammon.m

======================================================
function result = Sammon(proj,data,result,param)
%function P = sammon(D, P, varargin)

%SAMMON Computes Sammon's mapping of a data set.

% Input and output arguments ([]'s are optional):
% D (matrix) size dlen x dim, data to be projected
% (struct) data or map struct
% P (scalar) output dimension
% (matrix) size dlen x odim, initial projection matrix

...
======================================================

The author apparently never learned to write it so that MATLAB could make use of it. The very first block of contiguous comments in a MATLAB function is the help block. This is what help returns when you use the help command on it. In the Sammon example above, the blank line after the first comment ends that block.
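
As an illustration of the help-block convention described here, the following is a minimal sketch of a function whose leading comments form a usable help text. The function name rowmeans and its contents are hypothetical and are not part of the toolbox.

```matlab
function v = rowmeans(X)
%ROWMEANS Mean of each row of a matrix (hypothetical example function).
%   V = ROWMEANS(X) returns a column vector whose i-th entry is the
%   mean of X(i,:).
%
%   These comment lines are contiguous and directly follow the function
%   line, so "help rowmeans" prints all of them.

% This comment comes after a blank line, so it is NOT part of the help text.
v = mean(X, 2);
end
```

Saved as rowmeans.m on the MATLAB path, typing help rowmeans should print the contiguous comment block only, not the comment after the blank line.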

How about the code itself? Sadly, the code is not really professionally written at all. I'll disregard the simplistic scheme used to optimize in the Kmeans algorithm I looked at. The code is a bit amateurish in other respects, perhaps something at a level a grad student would write who was just learning MATLAB and knew little about numerical methods. Look at this block of code as an example from Kmeans. Here, c is the number of clusters and N is the number of data points. First, the code loops to do an operation that could be written in a fully vectorized form, far more efficiently in this case.

======================================================
     %Calculating cluster centers
      for i = 1:c
         index=find(label == i);
         if ~isempty(index)
             v(i,:) = mean(X(index,:));
         else
             ind=round(rand*N-1);
             v(i,:)=X(ind,:);
         end
         f0(index,i)=1;
     end
======================================================
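
One possible vectorized form of the loop above is sketched below. It is only a sketch under stated assumptions: the toy data, the variable names F, counts and nonempty, and the use of bsxfun (not available in releases as old as R14SP1) are illustrative choices, not the toolbox's code. It builds the same N-by-c indicator matrix the loop stores in f0, gets the per-cluster sums with one matrix product, and reseeds empty clusters with a valid random row index.

```matlab
% Toy data (hypothetical): five 2-D points with hard labels in 1..c.
X     = [0 0; 0 2; 4 4; 6 4; 5 5];
label = [1; 1; 2; 2; 2];
c     = 3;                              % cluster 3 is deliberately empty
N     = size(X, 1);

% N-by-c indicator matrix: F(k,i) = 1 when point k belongs to cluster i.
F = full(sparse((1:N)', label, 1, N, c));

counts   = sum(F, 1)';                  % number of points per cluster
nonempty = counts > 0;

% Cluster sums via one matrix product, then divide by the counts.
v = zeros(c, size(X, 2));
v(nonempty, :) = bsxfun(@rdivide, F(:, nonempty)' * X, counts(nonempty));

% Empty clusters: reseed with a random data point, index always in 1..N.
nEmpty = sum(~nonempty);
v(~nonempty, :) = X(floor(rand(nEmpty, 1) * N) + 1, :);

disp(v);
```

A side benefit of the matrix product is that a cluster containing a single point is handled correctly, whereas mean applied to a single row averages across the columns of that row.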

As interesting, look at the line where ind is computed. See that ind can easily be 0 or -1. Then ind is used to index into an array!

======================================================
             ind=round(rand*N-1);
             v(i,:)=X(ind,:);
======================================================

The above problem will cause a crash at times. It would be almost as bad had the author written it like this:

======================================================
             ind=round(rand*N+1);
             v(i,:)=X(ind,:);
======================================================

See that I've added 1, not subtracted 1. This is probably what the author wanted to do here, as it then avoids the zero and negative indices. It makes more sense anyway, but not perfect sense. See that round(rand*N + 1) does not generate uniformly distributed integers, and it can still return N+1, one past the last row!

Try this in matlab:

hist(round(rand(10000,1)*10 + 1),100)

See that the first and last bins are under-represented. Instead, try this:

hist(floor(rand(10000,1)*10 + 1),100)

Now all the bins have equal frequencies. I found this bug in only a few seconds of reading through the code. I'll bet you any amount of money there are other bugs to be found.
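
The difference between the two expressions can also be checked numerically rather than by eye. The sketch below is an illustration with hypothetical variable names (K, n, countsRound, countsFloor); it counts how often each integer occurs and shows the range each expression can produce.

```matlab
K = 10;            % number of valid indices, playing the role of N above
n = 100000;        % number of random draws

r = round(rand(n, 1) * K + 1);   % can return 1..K+1; both end values get half a bin
f = floor(rand(n, 1) * K) + 1;   % returns 1..K, each value equally likely

% Occurrences of each integer value.
countsRound = accumarray(r, 1, [K + 1, 1])';
countsFloor = accumarray(f, 1, [K, 1])';

disp(countsRound);   % first and last entries are roughly half the others
disp(countsFloor);   % all entries are roughly n/K

% In MATLAB releases that provide randi, randi(K, n, 1) draws uniformly
% from 1..K directly.
```

Note that the round form can return K+1, so using it as a row index into a K-row matrix can still run past the end of the data, while the floor form always stays within 1..K.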

While I would love to say that this is a great toolbox, I can't. It could be pretty good, even great if it were more carefully written without bugs, if it had help, if it had errors written in English. The pdf file is great. As it is, the many problems reduce my assessment to 2 stars. If you are willing to repair the bugs, to read through the pdf file, you might even be able to give this a high rating. I would be happy to upgrade my rating if the many problems were repaired.

Tag Activity for this File
Tag Applied By Date/Time
statistics Janos Abonyi 22 Oct 2008 07:46:24
probability Janos Abonyi 22 Oct 2008 07:46:24
clustering Janos Abonyi 22 Oct 2008 07:46:24
em Janos Abonyi 22 Oct 2008 07:46:24
fuzzy clustering Janos Abonyi 22 Oct 2008 07:46:24
cluster validity Janos Abonyi 22 Oct 2008 07:46:24
gathgeva Janos Abonyi 22 Oct 2008 07:46:24
gustaf Janos Abonyi 22 Oct 2008 07:46:24
fuzzy clustering plata luna 11 Dec 2008 16:47:40
clustering Eamonn 04 Aug 2009 11:01:45
fuzzy clustering Eamonn 04 Aug 2009 11:01:47
clustering zhang xin 21 Sep 2009 04:49:57
cluster validity Net Engr 07 Oct 2009 05:04:17
clustering Net Engr 07 Oct 2009 05:04:29
cluster validity Cagri 22 Jan 2010 14:07:56
 
