3.7 | 39 ratings | 117 Downloads (last 30 days) | File Size: 2 MB | File ID: #7486 | Version: 1.0

Clustering Toolbox

Janos Abonyi

20 Apr 2005

The toolbox provides five categories of functions.

File Information
Description

This toolbox was developed as a continuously extensible, standard tool that any MATLAB user can adapt to their own purposes. Chapter 1 of the downloadable documentation gives a theoretical introduction covering the theory of the algorithms, the definitions of the validity measures, and the visualization tools, which helps in understanding the MATLAB files.

Chapter 2 describes the files and the individual algorithms, illustrated with simple examples. In Chapter 3 the whole toolbox is tested on real data sets in three clustering problems: comparison and selection of algorithms; estimating the optimal number of clusters; and examining multidimensional data sets.

The Fuzzy Clustering and Data Analysis Toolbox is a collection of MATLAB functions. The toolbox provides five categories of functions:

- Clustering algorithms. These functions group the given data set into clusters by different approaches: Kmeans and Kmedoid are hard partitioning methods; FCMclust, GKclust and GGclust are fuzzy partitioning methods with different distance norms.

- Evaluation with cluster prototypes. Based on the clustering results for a data set, this set of functions can calculate memberships for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.

- Validation. The validity function provides cluster validity measures for each partition. It is useful when the number of clusters is not known a priori. The optimal partition can be determined from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).

- Visualization. The visualization part of the toolbox provides a modified Sammon mapping of the data. This mapping is a multidimensional scaling method described by Sammon.

- Examples. An example based on an industrial data set demonstrates the usefulness of the toolbox and its algorithms.
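
As a rough illustration of the Validation item, the first two indexes in the list can be computed directly from a fuzzy partition matrix. The sketch below assumes the standard textbook definitions of the Partition Coefficient and the Classification Entropy; the toolbox documentation remains the authority on the exact form used by the validity function. The names U, PC and CE and the toy membership values are invented for this example.

```matlab
% Toy fuzzy partition matrix: N = 4 data points (rows), c = 2 clusters (columns).
% Each row sums to 1. The values are illustrative, not toolbox output.
U = [0.9 0.1;
     0.8 0.2;
     0.3 0.7;
     0.5 0.5];
N = size(U, 1);

% Partition Coefficient: mean of the squared memberships
% (1 for a crisp partition, 1/c for a completely fuzzy one).
PC = sum(U(:).^2) / N;

% Classification Entropy: 0 for a crisp partition, log(c) for a completely
% fuzzy one. eps guards against log(0) when a membership is exactly zero.
CE = -sum(U(:) .* log(U(:) + eps)) / N;
```

Evaluating such indexes for several candidate cluster counts and looking for their extrema is the selection procedure that the Validation item describes.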

MATLAB release MATLAB 7.0.1 (R14SP1)
Comments and Ratings (57)
30 Apr 2015 Ram

I have just installed the toolbox. However, I get an error when I try to run the motorcycle clustering example in the demo file. The error is:

Undefined function 'isnan' for input arguments of type 'struct'.

Error in internal.stats.removenan (line 54)
wasnan = wasnan | any(isnan(y),2);

Error in statremovenan (line 7)
internal.stats.removenan(varargin{:});

Error in kmeans (line 141)
[~,wasnan,X] = statremovenan(X);

Error in Kmeanscall (line 21)
result=kmeans(data,param);

Has anyone had the same problem? If so, how can I solve it?

Comment only
08 May 2014 Noushin Farnoud

Hi, thanks for the tool and the comprehensive help document.
I have a question about the "param.c" parameter. It seems that the number of clusters is a pre-defined input for all clustering methods. What if that information is not known? My understanding is that a method like Fuzzy Subtractive Clustering is one way to approach clustering in the absence of param.c. Do any of your clustering functions have an option for running without param.c?
Thank You,
NF

19 Feb 2014 Hussain Shah

Hi,
Hussain here. I installed the Fuzzy Clustering Toolbox, but it is not working well; it shows this error:
Error using FCMclust (line 8)
Not enough input arguments.
f0=param.c;
Kindly help me solve this problem.

email: hussainshah84@gmail.com

Comment only
06 Jun 2013 Ta Hai

I need the CO and COr index. Who can help me?

Comment only
31 May 2013 Ta Hai

18 Feb 2013 Mohamad

Has anyone tried to change the fuzziness of the membership function?

I tried, but it doesn't take effect in FCMclust (it always calculates with m=2)!

Comment only
20 Nov 2012 András Király

Hi,

Check page 32 of the documentation. You can try FCMcall functions in

Demos\clusteringexamples\synthetic\ and
Demos\clusteringexamples\motorcycle directories.

These calls should work...

There you can see that param should be a structure, not a simple number, as here:

%parameters
param.c=4;
param.m=2;
param.e=1e-6;
param.ro=ones(1,param.c);
param.val=1;
%normalization
data=clust_normalize(data,'range');
%clustering
result = FCMclust(data,param);

I hope it helps.

Comment only
13 Nov 2012 Arthur Calegario

Hello,
I installed this toolbox, but it doesn't accept the FCMclust command when called like this: X = FCMclust(data,3)
I don't understand why, because the kmeans method works perfectly.
Another question: in this command we can set three parameters for fuzzy partitioning: m (weight exponent), c (number of clusters) and e (maximum termination tolerance).
How can I set these parameters in the command?
Can anyone help me?
Thx!! :D

Comment only
13 Apr 2012 zhang zhang

02 Mar 2012 Eric Diaz

I don't think he is going to be updating this ever. I looked him up and he is now a professor / vice dean for education and accreditation in the Department of chemical and process engineering at the University of Pannonia in Hungary. He has no publications in the last 9 years.

11 Dec 2011 Shay

I have managed to fix a 'bug' in Kmedoid.m; this might help someone some day, so here goes:
In line 22 the original code is:
| while prod(max(abs(v - v0))),
This product quickly goes to 0 due to floating-point roundoff, especially for large normalized distance matrices.
My solution:
| while ~isinf(sum(log10(max(abs(v-v0))))),

Hope this helps someone.
Shay
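
The effect Shay describes is easy to reproduce outside the toolbox. In the sketch below, the made-up vector d stands in for max(abs(v - v0)). The all(d > 0) test is only one possible alternative, and an assumption rather than toolbox code: it asks the same question as the loop condition (is every entry nonzero?) without forming a product.

```matlab
% d stands in for max(abs(v - v0)): 200 columns, each changed by 0.01.
d = 1e-2 * ones(1, 200);

% The true product, 1e-400, is below the smallest representable double,
% so the computed product collapses to exactly 0.
p = prod(d);
stopsEarly = (p == 0);   % "while prod(...)" would exit although nothing converged

% Shay's test: the sum of logs stays finite unless some entry is exactly 0.
keepGoingLog = ~isinf(sum(log10(d)));

% Alternative with the same intent and no product or logarithm.
keepGoingAll = all(d > 0);
```

Both replacement conditions keep the loop running for this d, while the original product-based condition would stop it.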

12 Oct 2011 Gautam Thakur

I am running k-medoid. What do the x- and y-axes signify? They vary from 0 to 1.

10 Mar 2011 Siva kiran R R

Nice code and thanks for making all codes open source.

Comment only
01 Nov 2010 TwiTota

28 Oct 2010 TwiTota

How can I use this toolbox with images?
And how can I display the clustered image after every iteration? The only matrix returned is the N x c one, where c is the number of clusters. I'm just a beginner and I hoped this toolbox would help me find my way in the subject. Please, I need an answer; I see that the owner of this file doesn't answer the questions left here.

thanx

Comment only
12 May 2010 Jun wan

excellent!! thanks

Comment only
06 Mar 2010 John D'Errico

Oh, this could be SOOOOOO much better. The PDF documentation is quite useful, but even that is lacking. One should not be forced to read through 77 pages of PDF just to use these tools!

This toolbox uses structures extensively. But, to be honest, it feels like they overused them. Like a child with a fun toy, the author went overboard here.

For example, the Kmeans function has two arguments. You pass in the data, and a starting matrix for the centers. So then why must you pass in the data as a structure data.X, and the centers as a structure param.c?

This forces the user to create structures with named fields. Get the field name wrong, and the code fails. How does it fail? Here are the very first lines of Kmeans.m.

======================================================
%checking the parameters given
%if exist('param.c') == 1, c = param.c;else error('Nincs megadva a c, es ez baj...');end;
%if exist('param.vis')~=1, param.vis=0;end;
======================================================

Ok, so there is error checking of a sort. It even returns an error. Sorry, but I'm not impressed with the readability of that error message.

Note that there is no actual help in ANY of the functions in this toolbox. Yes, you can learn a lot after you read all 77 pages of PDF. It might even be worth the wait.

The funny thing is, the authors did write SOME help in the code. Look at the beginning of sammon.m

======================================================
function result = Sammon(proj,data,result,param)
%function P = sammon(D, P, varargin)

%SAMMON Computes Sammon's mapping of a data set.

% Input and output arguments ([]'s are optional):
% D (matrix) size dlen x dim, data to be projected
% (struct) data or map struct
% P (scalar) output dimension
% (matrix) size dlen x odim, initial projection matrix

...
======================================================

The author apparently never learned to write it so that matlab could make use of it. The very first block of contiguous comments in a matlab function is the help block. This is what help returns when you use the help command on it.

How about the code itself? Sadly, the code is not really professionally written at all. I'll disregard the simplistic scheme used to optimize in the Kmeans algorithm I looked at. The code is a bit amateurish in other respects, perhaps something at a level a grad student would write who was just learning MATLAB and knew little about numerical methods. Look at this block of code as an example from Kmeans. Here, c is the number of clusters and N is the number of data points. First, the code loops to do an operation that could be written in a fully vectorized form, far more efficiently in this case.

======================================================
%Calculating cluster centers
for i = 1:c
index=find(label == i);
if ~isempty(index)
v(i,:) = mean(X(index,:));
else
ind=round(rand*N-1);
v(i,:)=X(ind,:);
end
f0(index,i)=1;
end
======================================================

As interesting, look at the line where ind is computed. See that ind can easily be 0 or -1. Then ind is used to index into an array!

======================================================
ind=round(rand*N-1);
v(i,:)=X(ind,:);
======================================================

The above problem will cause a crash at times. It is almost as bad if the author had written it like this:

======================================================
ind=round(rand*N+1);
v(i,:)=X(ind,:);
======================================================

See that I've added 1, not subtracted 1. This is probably what the author wanted to do here, as it avoids the zero and negative indices. It makes more sense anyway, but not perfect sense. See that round(rand*N + 1) can still return N+1, which is out of range, and that it does not generate uniformly distributed integers!

Try this in matlab:

hist(round(rand(10000,1)*10 + 1),100)

See that the first and last bins are under-represented. Instead, try this:

hist(floor(rand(10000,1)*10 + 1),100)

Now all the bins have equal frequencies. I found this bug in only a few seconds of reading through the code. I'll bet you any amount of money there are other bugs to be found.
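
Putting the points above together, a minimal sketch of the center update with an in-range, uniform reseed might look like the following. It is not the toolbox's code: the toy X and label values are invented for the example, and only the names X, label, c, N, v and ind follow the quoted Kmeans block. floor(rand*N) + 1 is used so that the index always lies in 1..N.

```matlab
% Toy data: 5 points in 2-D, assigned to clusters 1 and 2.
X = [0 0; 0 1; 5 5; 5 6; 9 9];
label = [1; 1; 2; 2; 2];     % cluster 3 is deliberately left empty
c = 3;
N = size(X, 1);
v = zeros(c, size(X, 2));

for i = 1:c
    index = find(label == i);
    if ~isempty(index)
        % Average along dimension 1 explicitly, so a cluster holding a
        % single point is not averaged across its coordinates instead.
        v(i, :) = mean(X(index, :), 1);
    else
        % Empty cluster: reseed from a uniformly chosen data point.
        % rand lies strictly between 0 and 1, so ind is an integer in 1..N.
        ind = floor(rand * N) + 1;
        v(i, :) = X(ind, :);
    end
end
```

With this change the reseeded center is always one of the rows of X, so the out-of-range indexing described above cannot occur.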

While I would love to say that this is a great toolbox, I can't. It could be pretty good, even great if it were more carefully written without bugs, if it had help, if it had errors written in English. The pdf file is great. As it is, the many problems reduce my assessment to 2 stars. If you are willing to repair the bugs, to read through the pdf file, you might even be able to give this a high rating. I would be happy to upgrade my rating if the many problems were repaired.

05 Mar 2010 Alex

Good start, more work needed to expand initialization options and add more cutting edge clustering methods.

02 Mar 2010 ali

At the initialization stage Kmedoid doesn't work for large data sets, for example the bbc data set, which has 5 classes.

It gives this error:
Undefined function or variable "distout".

Error in ==> Kmedoid at 83
result.data.d = distout;

because it initializes, but the while loop doesn't run.

Please help.

Comment only
27 Feb 2010 ali

I want to use the Kmedoid.m file from here, but something happens that I don't want.

The Kmedoid algorithm works well with data sets where the number of rows is bigger than the number of columns, e.g. iris (150*4),

but if the data set has more columns than rows, like bbc (2225 rows, 9635 columns), then

it gives an error like this:

Undefined function or variable "distout".

Error in ==> Kmedoid at 81
result.data.d = distout;

Error in ==> noinidir>kume_Callback at 575
Kmedoid(data,param);

Error in ==> gui_mainfcn at 96
feval(varargin{:});

Error in ==> noinidir at 42
gui_mainfcn(gui_State, varargin{:});

Error in ==>
@(hObject,eventdata)noinidir('kume_Callback',hObject,eventdata,guidata(hObject))

??? Error while evaluating uicontrol Callback

I found out why it does this, but I can't solve it:

after the first initialization of v and v0,

the result of prod(max(abs(v - v0))) in the while condition is 0,

and it doesn't cluster.

Please help me;
it is very urgent.

thanks for attention

sincerely

kazım yıldız

Comment only
05 Feb 2010 muk

nice job! thanks

Comment only
01 Feb 2010 Ketan

The Kmeans function will occasionally error due to initialization of cluster centers. It tries to access a point outside the data matrix.

Comment only
20 Nov 2009 Mohammed El-Said

Great job;
It really helped me,
Thanks.

24 Aug 2009 ali

How can I use only Kmedoid? Please, somebody help me.

Comment only
06 Apr 2009 David

Excellent job. Thanks

11 Nov 2008 Wadhah Almansoori

How can I display which items/instances belong to which cluster? In other words, I want to view the contents of each cluster (after the clustering result) for my further analysis.
Thanks,

26 Sep 2008 assoora E

This could be used in which MatLab version, please?

Comment only
13 Aug 2008 orhan alp çetin

Thank you for this useful toolbox.

20 May 2008 Roman Shapov

Have you tested your k-medoids implementation on data consisting of a small number of vectors? It sometimes fails due to an incorrect choice of random initial points.

05 May 2008 Lina Mohn

The idea of the toolbox is great, but the implementation needs improvement! There are no comments in the code, so it is very difficult to understand what actually goes on. Also, there is no error handling. I got a lot of errors while using this toolbox, so I had to manually insert some code to capture them.

25 Mar 2008 sun hui

I need to study this program.

Comment only
08 Feb 2008 Andrés Vega
14 Oct 2007 ihsa ul haq

This toolbox works very well.

22 Jul 2007 hadaf shegaft

I need a fuzzy connectivity toolbox. I can't find anything. Help me, please!

Comment only
13 Jul 2007 Cenk Budayan

I can't run the toolbox; the help document doesn't explain how to use it. I always get error messages when I try the demos. The demos don't work. I spent a lot of time trying to understand the toolbox, and all I understood is that I wasted my time.

08 Jul 2007 Roohullah Amiri

The toolbox is good and helpful.

04 Jun 2007 Gokhan Bilgin

Shows art of clustering. Thanks

04 Jun 2007 Hsieh kevin

the toolbox is very helpful for me to do something about clustering knowledge

30 May 2007 abolfazl mahmoodnia

Good toolbox

27 Apr 2007 Jiang Wei
20 Apr 2007 Keerthi Kamal Adusumilli

Very Handy

19 Apr 2007 cao minghua

Thank you very much

02 Apr 2007 zim zim

This toolbox is wonderful work and it helped me greatly. It would be better if the code had clearer comments, so that users need not turn to the pdf file.

08 Dec 2006 Alaa Elsayad
28 Sep 2006 Manjunath Shantharamu

It helped me a lot, Thank you friend

24 Jul 2006 BELGACEM Mouna

Your toolbox was useful for me, but the structure that you use is different from the usual structures. I think it would be good to have a file that builds these structures.

07 Jul 2006 Lee Ma

ok!

22 May 2006 Gilles Criton

the code is very good, so...
thank you very much

24 Mar 2006 Rosina Kharal

I think the code is just fine. A very helpful toolbox- thank-you !

31 Jan 2006 yuan decheng
07 Nov 2005 Jason Choo

I'm still reviewing this toolbox but I want to express my appreciation of Balazs Feil.

From my understanding, he's one of the authors of the .pdf documentation.

He has gone out of his way to help me understand the toolbox. Thank you Balazs!

Comment only
06 Nov 2005 Stanislava Plichta

The code is horrible.

27 Jun 2005 Rossaro Bruno

I am interested in a Matlab version of FORTRAN written TWINSPAN

16 Jun 2005 LRM 1138

I like the documentation of the theory, but many other things need work. Installation doesn't work as described; you have to add the FUZZCLUST folder to the MATLAB path. I agree with Pierluigi that the functions themselves are very difficult to understand due to lack of comments. The authors should follow the example of MATLAB's kmeans (in the Stats Toolbox) in terms of documentation, and maybe even input parameters. Using structures for everything is neat, but it's just not intuitive (at least for me).

16 Jun 2005 Pierluigi Cera

The MATLAB code is not commented.
Also, it's difficult to use only some parts of the toolbox. For instance, I need only the validation part, but I have to change the code to use it.
Moreover, there is no compatibility with the MATLAB clustering functions. Why is the kmeans code completely different from the MATLAB kmeans function?

19 May 2005 M C

Outstanding. Comes with a 77 page pdf file describing the toolkit.