This toolbox was developed to provide a continuously extensible, standard tool that any MATLAB user can apply to their own clustering problems. Chapter 1 of the downloadable documentation gives a theoretical introduction covering the algorithms, the definitions of the validity measures, and the visualization tools, which helps in understanding the MATLAB files.
Chapter 2 describes the files and the individual algorithms and illustrates them with simple examples. In Chapter 3 the whole Toolbox is tested on real data sets in three clustering problems: comparison and selection of algorithms; estimating the optimal number of clusters; and examining multidimensional data sets.
About the Toolbox
The Fuzzy Clustering and Data Analysis Toolbox is a collection of MATLAB functions. The toolbox provides five categories of functions:
- Clustering algorithms. These functions group the given data set into clusters using different approaches: Kmeans and Kmedoid are hard partitioning methods; FCMclust, GKclust and GGclust are fuzzy partitioning methods with different distance norms.
- Evaluation with cluster prototypes. Based on the clustering results of a data set, these functions can calculate membership values for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.
- Validation. The validity function provides cluster validity measures for each partition. It is useful when the number of clusters is not known a priori. The optimal partition can be determined from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).
- Visualization. The visualization part of this toolbox provides the modified Sammon mapping of the data. This mapping method is a multidimensional scaling method described by Sammon.
- Examples. An example based on an industrial data set that demonstrates the usefulness of the toolbox and its algorithms.
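As an illustration of the validation indexes listed above, the two simplest ones can be computed directly from a fuzzy partition matrix. The sketch below assumes the standard definitions of the Partition Coefficient and the Classification Entropy and a hypothetical N-by-c membership matrix U; it is not the toolbox's own validity function.

```matlab
% Hypothetical fuzzy partition matrix: N = 4 points, c = 2 clusters.
% Each row holds one point's membership degrees and sums to 1.
U = [0.9 0.1;
     0.8 0.2;
     0.3 0.7;
     0.5 0.5];
N = size(U, 1);

% Partition Coefficient: sum of squared memberships divided by N.
% It equals 1 for a crisp partition and 1/c for a completely fuzzy one.
PC = sum(U(:).^2) / N;

% Classification Entropy: 0 for a crisp partition, log(c) for a fully fuzzy one.
% eps guards against log(0) when a membership is exactly zero.
CE = -sum(U(:) .* log(U(:) + eps)) / N;

fprintf('PC = %.4f, CE = %.4f\n', PC, CE);
```

Repeating this for several values of c and looking for the maximum of PC or the minimum of CE is one way to read the "extrema" mentioned in the Validation item.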
I have just installed the toolbox. However, I am getting an error when I try to run the motorcycle clustering example in the demo file. The error is:
Undefined function 'isnan' for input arguments of type 'struct'.
Error in internal.stats.removenan (line 54)
wasnan = wasnan | any(isnan(y),2);
Error in statremovenan (line 7)
Error in kmeans (line 141)
[~,wasnan,X] = statremovenan(X);
Error in Kmeanscall (line 21)
Did anyone have the same problem? If so, how can I solve it?
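One possible cause, offered as a guess rather than a confirmed diagnosis: the trace ends in the `kmeans` that uses `statremovenan`, which expects a numeric matrix, while this toolbox passes its data as a struct. Listing every `kmeans` visible on the path shows which one MATLAB resolves first.

```matlab
% List every function named kmeans that is visible on the MATLAB path.
% The first entry is the one MATLAB will actually call.
hits = which('kmeans', '-all');
if isempty(hits)
    disp('No kmeans found on the path.');
else
    disp(hits);
end
```

If the first entry is not the file the demo expects, changing the order of the folders on the path may be worth trying.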
Hi, Thanks for the tool and comprehensive help document.
I have a question about the "param.c" parameter. It seems that the number of clusters is a pre-defined input for all clustering methods. What if that information is not known? My understanding is that a method like Fuzzy Subtractive Clustering is one way to approach clustering in the absence of param.c. Do any of your clustering functions have an option for running without param.c?
Hussain here. I installed the Fuzzy Clustering Toolbox, but it is not working properly. It shows this error:
Error using FCMclust (line 8)
Not enough input arguments.
Kindly help me solve this problem.
I need the CO and COr indexes. Who can help me?
Has anyone tried to change the fuzziness of the membership function?
I tried, but it doesn't take effect in FCMclust (it always calculates with m=2).
Check page 32 of the documentation. You can try FCMcall functions in
These calls should work...
You can check there that param should be a structure, not a simple number, like here:
result = FCMclust(data,param);
I hope it helps.
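A minimal sketch of what such a call could look like. The fields data.X and param.c are mentioned elsewhere in this thread; the field names param.m and param.e are an assumption based on the m, c and e parameter names used here, and the data are made up.

```matlab
% Hypothetical data: 100 two-dimensional points.
data.X = rand(100, 2);

% Parameters go into a struct, not a bare number.
% The field names m and e are assumed from the m, c, e names used in this thread.
param.c = 3;      % number of clusters
param.m = 2;      % weighting (fuzziness) exponent
param.e = 1e-6;   % termination tolerance

% Only call the toolbox function if it is actually on the path.
if exist('FCMclust', 'file') == 2
    result = FCMclust(data, param);
else
    disp('FCMclust not found; add the toolbox folder to the MATLAB path first.');
end
```

A call such as FCMclust(data,3) passes a number where a struct is expected, which would explain why it is rejected.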
I installed this toolbox, but it doesn't accept the FCMclust command, like this: X = FCMclust(data,3)
I don't understand why, because the kmeans method works perfectly.
Another question: in this command we can set three parameters for fuzzy partitioning: m (weight exponent), c (number of clusters) and e (maximum termination tolerance).
How can I set these parameters in the command?
Can anyone help me?
I don't think he is going to be updating this ever. I looked him up and he is now a professor / vice dean for education and accreditation in the Department of chemical and process engineering at the University of Pannonia in Hungary. He has no publications in the last 9 years.
I have managed to fix a 'bug' in Kmedoid.m. This might help someone some day, so here goes:
In line 22 the original code is:
while prod(max(abs(v - v0))),
This product quickly goes to 0 due to floating-point underflow, especially for large normalized distance matrices. I replaced it with:
while ~isinf(sum(log10(max(abs(v-v0))))),
Hope this helps someone.
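The effect is easy to reproduce outside the toolbox. The sketch below uses made-up per-column changes in place of max(abs(v - v0)) and also shows a logarithm-free test that behaves the same way as the log-sum version.

```matlab
% 100 per-column changes that are tiny but not zero (hypothetical values).
delta = 1e-5 * ones(1, 100);

% The product underflows to exactly 0, so "while prod(...)" would stop
% even though every column is still changing.
p = prod(delta);

% The log-sum stays finite (about -500); it becomes -Inf only when
% some entry is exactly 0.
s = sum(log10(delta));

% An equivalent test without logarithms: keep iterating while every
% column still changes.
keepGoing = all(delta > 0);

fprintf('prod = %g, log-sum = %g, keepGoing = %d\n', p, s, keepGoing);
```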
I am running k-medoid. What do the x and y axes signify? They vary from 0 to 1.
Nice code and thanks for making all codes open source.
How can I use this toolbox with images?
And how can I display the clustered image after every iteration? The only matrix that is returned is the N-by-c one, where c is the number of clusters. I'm just a beginner and I hoped that this toolbox would help me find my way in this subject. Please, I need an answer. I see that the owner of this file doesn't answer the questions left here.
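One common way to cluster an image is to treat every pixel as a data point. The sketch below assumes a hypothetical RGB image and uses a made-up N-by-c partition matrix as a stand-in for a real clustering result, so it only shows the reshaping steps, not the toolbox call itself.

```matlab
% Hypothetical 4-by-5 RGB image with values in [0, 1].
img = rand(4, 5, 3);
[rows, cols, channels] = size(img);

% Each pixel becomes one row of the data matrix (N-by-3 with N = rows*cols).
data.X = reshape(img, rows * cols, channels);

% Stand-in for a clustering result: a hypothetical N-by-c partition matrix.
% Here c = 2 and the membership is derived from the red channel only.
U = [data.X(:, 1), 1 - data.X(:, 1)];

% Hard label per pixel = cluster with the largest membership.
[~, label] = max(U, [], 2);

% Reshape the labels back into image form.
labelImg = reshape(label, rows, cols);
```

The label image can then be shown with, for example, imagesc(labelImg).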
Oh, this could be SOOOOOO much better. The PDF documentation is quite useful, but even that is lacking. One should not be forced to read through 77 pages of PDF just to use these tools!
This toolbox uses structures extensively. But, to be honest, it feels like they overused them. Like a child with a fun toy, the author went overboard here.
For example, the Kmeans function has two arguments. You pass in the data, and a starting matrix for the centers. So then why must you pass in the data as a structure data.X, and the centers as a structure param.c?
This forces the user to create structures with named fields. Get the field name wrong, and the code fails. How does it fail? Here are the very first lines of Kmeans.m.
%checking the parameters given
%if exist('param.c') == 1, c = param.c;else error('Nincs megadva a c, es ez baj...');end;
%if exist('param.vis')~=1, param.vis=0;end;
Ok, so there is error checking of a sort. It even returns an error. Sorry, but I'm not impressed with the readability of that error message.
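For comparison, a sketch of how such a check could be written with isfield, which tests for a field of a struct (exist does not look inside structs). The error identifier and message text are illustrative, not taken from the toolbox.

```matlab
% Hypothetical parameter struct with only the cluster count set.
param = struct('c', 3);

% isfield tests for a struct field; exist('param.c') does not look inside structs.
if isfield(param, 'c')
    c = param.c;
else
    error('Kmeans:missingParam', 'param.c (number of clusters) must be specified.');
end

% Supply a default for an optional field.
if ~isfield(param, 'vis')
    param.vis = 0;
end
```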
Note that there is no actual help in ANY of the functions in this toolbox. Yes, you can learn a lot after you read all 77 pages of PDF. It might even be worth the wait.
The funny thing is, the authors did write SOME help in the code. Look at the beginning of sammon.m
function result = Sammon(proj,data,result,param)
%function P = sammon(D, P, varargin)
%SAMMON Computes Sammon's mapping of a data set.
% Input and output arguments ('s are optional):
% D (matrix) size dlen x dim, data to be projected
% (struct) data or map struct
% P (scalar) output dimension
% (matrix) size dlen x odim, initial projection matrix
The author apparently never learned to write it so that matlab could make use of it. The very first block of contiguous comments in a matlab function is the help block. This is what help returns when you use the help command on it.
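A small sketch of a help block in the form MATLAB picks up. The function name clusterCount is hypothetical and only serves to show the layout.

```matlab
function c = clusterCount(param)
%CLUSTERCOUNT Number of clusters stored in a parameter struct.
%   C = CLUSTERCOUNT(PARAM) returns the field PARAM.c.
%
%   This comment block directly follows the function line, so
%   "help clusterCount" prints it.
c = param.c;
end
```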
How about the code itself? Sadly, the code is not really professionally written at all. I'll disregard the simplistic scheme used to optimize in the Kmeans algorithm I looked at. The code is a bit amateurish in other respects, perhaps something at a level a grad student would write who was just learning MATLAB and knew little about numerical methods. Look at this block of code as an example from Kmeans. Here, c is the number of clusters and N is the number of data points. First, the code loops to do an operation that could be written in a fully vectorized form, far more efficiently in this case.
%Calculating cluster centers
for i = 1:c
index=find(label == i);
v(i,:) = mean(X(index,:));
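One possible vectorized form of the center update in the loop quoted above, as a sketch with made-up data. It uses an indicator matrix instead of find and mean; this is an illustration, not the toolbox's code.

```matlab
% Hypothetical data: N = 4 points, c = 2 clusters, one label per point.
X = [1 2; 3 4; 5 6; 7 8];
label = [1; 1; 2; 2];
c = 2;
N = size(X, 1);

% c-by-N indicator matrix: M(i, k) = 1 when point k belongs to cluster i.
M = sparse(label(:), (1:N)', 1, c, N);

% Per-cluster sums divided by per-cluster counts give all centers at once.
% An empty cluster yields a row of NaN (0/0) and needs separate handling.
counts = full(sum(M, 2));
v = bsxfun(@rdivide, full(M * X), counts);

disp(v);
```

A side note on the loop version: when index selects a single row, mean(X(index,:)) averages across the columns and returns a scalar; mean(X(index,:), 1) avoids that.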
As interesting, look at the line where ind is computed. See that ind can easily be 0 or -1. Then ind is used to index into an array!
The above problem will cause a crash at times. It is almost as bad if the author had written it like this:
See that I've added 1, not subtracted 1. This is probably what the author wanted to do here, as it will then not cause an error. It makes more sense anyway, but not perfect sense. See that round(rand*N + 1) does not generate uniformly distributed integers!
Try this in matlab:
hist(round(rand(10000,1)*10 + 1),100)
See that the first and last bins are under-represented. Instead, try this:
hist(floor(rand(10000,1)*10 + 1),100)
Now all the bins have equal frequencies. I found this bug in only a few seconds of reading through the code. I'll bet you any amount of money there are other bugs to be found.
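The range of each variant can be checked directly. One detail worth noting: round(rand*N + 1) can still evaluate to N + 1, so it can also index past the end of an N-row matrix. The sketch below uses made-up values for N and the number of draws and compares it with floor(rand*N) + 1 and randi(N).

```matlab
N = 10;          % number of data points (hypothetical)
n = 100000;      % number of random draws

% round(rand*N + 1) can return N + 1, which is outside 1..N.
r = round(rand(n, 1) * N + 1);

% floor(rand*N) + 1 and randi(N) both stay inside 1..N with equal probability.
f = floor(rand(n, 1) * N) + 1;
k = randi(N, n, 1);

fprintf('round: min %d, max %d\n', min(r), max(r));
fprintf('floor: min %d, max %d\n', min(f), max(f));
fprintf('randi: min %d, max %d\n', min(k), max(k));
```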
While I would love to say that this is a great toolbox, I can't. It could be pretty good, even great if it were more carefully written without bugs, if it had help, if it had errors written in English. The pdf file is great. As it is, the many problems reduce my assessment to 2 stars. If you are willing to repair the bugs, to read through the pdf file, you might even be able to give this a high rating. I would be happy to upgrade my rating if the many problems were repaired.
Good start, more work needed to expand initialization options and add more cutting edge clustering methods.
At the initialization stage Kmedoid doesn't work for large data sets, for example the bbc data set, which has 5 classes.
It gives this error:
Undefined function or variable "distout".
Error in ==> Kmedoid at 83
result.data.d = distout;
This is because it initializes, but the while loop doesn't run.
I want to use the Kmedoid.m file from here, but something happens that I don't want.
The Kmedoid algorithm works well on data sets where the number of rows is larger than the number of columns, e.g. iris (150x4),
but if the data set has more columns than rows, like bbc (2225 rows, 9635 columns), then
it gives an error like this:
Undefined function or variable "distout".
Error in ==> Kmedoid at 81
result.data.d = distout;
Error in ==> noinidir>kume_Callback at 575
Error in ==> gui_mainfcn at 96
Error in ==> noinidir at 42
Error in ==>
??? Error while evaluating uicontrol Callback
I found out why it does this, but I can't solve it:
after the first initialization of v and v0,
the result of while prod(max(abs(v - v0))) is 0,
so it doesn't cluster.
Please help me,
it is very urgent.
Thanks for your attention.
nice job! thanks
The Kmeans function will occasionally error due to initialization of cluster centers. It tries to access a point outside the data matrix.
It really helped me.
How can I use only Kmedoid? Please, somebody help me.
Excellent job. Thanks
How can I display which items/instances belong to which cluster? In other words, I want to view the contents of each cluster (after the clustering result) to make my further analysis.
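One way to list the members of each cluster, assuming the clustering result is available as an N-by-c partition matrix (the matrix U below is made up for illustration):

```matlab
% Hypothetical partition matrix: 5 points, 2 clusters.
U = [0.9 0.1; 0.2 0.8; 0.7 0.3; 0.4 0.6; 0.95 0.05];
c = size(U, 2);

% Assign each point to the cluster with the largest membership.
[~, label] = max(U, [], 2);

% Print the row indices of the points in each cluster.
for k = 1:c
    members = find(label == k);
    fprintf('Cluster %d: %s\n', k, mat2str(members'));
end
```

The row indices refer to the rows of the original data matrix, so X(members, :) returns the data points of cluster k.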
Which MATLAB version can this be used with, please?
Thank you for this useful toolbox.
Have you tested your k-medoids implementation on data consisting of a small number of vectors? It sometimes fails due to an incorrect choice of random initial points.
The idea of the toolbox is great, but the implementation needs improvement! There are no comments in the code, so it is very difficult to understand what actually goes on. There is also no error handling. I got a lot of errors while using this toolbox, so I had to manually insert some code to capture them.
I need to study this program.
This toolbox works very well.
Thank you for uploading nice work.
I need a fuzzy connectivity toolbox, but I can't find anything. Help me, please!
I can't run the toolbox, and the help document doesn't explain how to use it. I always get error messages when I try the demos. The demos don't work. I spent a lot of time trying to understand the toolbox, and all I understand is that I wasted my time.
The toolbox is good and helpful.
Shows art of clustering. Thanks
The toolbox is very helpful to me for learning about clustering.
Thank you very much
This toolbox is wonderful work and it helped me greatly. It would be even better if the code had clearer comments, so that users would not need to turn to the pdf file.
It helped me a lot, Thank you friend
Your toolbox was useful for me, but the structure that you use is different from the usual structures. I think it would be good to include a file that builds these structures.
the code is very good, so...
thank you very much
I think the code is just fine. A very helpful toolbox- thank-you !
I'm still reviewing this toolbox but I want to express my appreciation of Balazs Feil.
From my understanding, he's one of the authors of the .pdf documentation.
He has gone out of his way to help me understand the toolbox. Thank you Balazs!
The code is horrible.
I am interested in a Matlab version of FORTRAN written TWINSPAN
I like the documentation of the theory, but many other things need work. Installation doesn't work as described; you have to add the FUZZCLUST folder to the MATLAB path. I agree with Pierluigi that the functions themselves are very difficult to understand due to lack of comments. The authors should follow the example of MATLAB's kmeans (in the Stats Toolbox) in terms of documentation, and maybe even input parameters. Using structures for everything is neat, but it's just not intuitive (at least for me).
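Adding the folder to the path can be done from the command line. The sketch below assumes the toolbox was unpacked into a folder named FUZZCLUST under the current directory; the location is hypothetical and needs to be adjusted.

```matlab
% Hypothetical location of the unpacked toolbox; adjust to your own folder.
toolboxDir = fullfile(pwd, 'FUZZCLUST');

if exist(toolboxDir, 'dir') == 7
    addpath(genpath(toolboxDir));   % add the folder and its subfolders
    % savepath;                     % uncomment to keep the change across sessions
else
    fprintf('Folder not found: %s\n', toolboxDir);
end
```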
The MATLAB code is not commented.
Also, it's difficult to use only part of the toolbox. For instance, I need only the validation part, but I have to change the code to use it.
Moreover, it isn't compatible with the MATLAB clustering functions. Why is the kmeans code completely different from the MATLAB kmeans function?
Outstanding. Comes with a 77 page pdf file describing the toolkit.