
Rating: 4.5 (3 ratings) | 213 downloads (last 30 days) | File Size: 32.64 KB | File ID: #24616

kmeans clustering

by Michael Chen


01 Jul 2009 (Updated 01 Jul 2009)

Code covered by BSD License  

Fully vectorized kmeans algorithm. Fast yet simple (10 lines)


File Information
Description

This is a very fast implementation of the original kmeans clustering algorithm, without any acceleration techniques such as kd-tree indexing or the triangle inequality. (As far as I can tell, it is actually the fastest MATLAB implementation.)

The code is vectorized as much as possible, yet it is very compact (only 10 lines of code). It is 10 to 100 times faster than the kmeans function in MATLAB.
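For readers who want to see what the standard kmeans iteration does, here is a minimal plain-Python sketch of Lloyd's algorithm. It is not the author's litekmeans.m and is not vectorized (no NumPy), and the names `kmeans_sketch`, `compact` and `dot` are hypothetical. It assumes samples are given as rows (a list of n lists of length d), which is the transpose of the d x n layout litekmeans.m uses. It does show the identity a vectorized version can exploit: the nearest center minimizes ||x - m||^2, which is the same as maximizing m.x - ||m||^2/2, because ||x||^2 is shared by all centers. Like litekmeans, it drops a cluster that becomes empty instead of re-seeding it.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def compact(labels):
    """Renumber labels to 0..k'-1 (first-appearance order), dropping unused ids."""
    remap = {}
    return [remap.setdefault(c, len(remap)) for c in labels]


def kmeans_sketch(points, k, seed=0, max_iter=100, init_labels=None):
    """Standard (Lloyd) kmeans on a list of n samples, each a list of d floats.

    Returns n labels in 0..k'-1; k' can be smaller than k because a cluster
    that ends up with no samples is dropped rather than re-seeded.
    """
    rng = random.Random(seed)
    d = len(points[0])
    if init_labels is None:
        init_labels = [rng.randrange(k) for _ in points]  # random initial labels
    labels = compact(init_labels)
    for _ in range(max_iter):
        kk = max(labels) + 1  # number of non-empty clusters
        # Update step: each center is the mean of the samples assigned to it.
        sums = [[0.0] * d for _ in range(kk)]
        counts = [0] * kk
        for x, c in zip(points, labels):
            counts[c] += 1
            for i in range(d):
                sums[c][i] += x[i]
        centers = [[v / counts[c] for v in sums[c]] for c in range(kk)]
        # Assignment step: argmin_j ||x - m_j||^2 == argmax_j (m_j . x - ||m_j||^2 / 2),
        # since ||x||^2 is the same for every center and can be dropped.
        half_norms = [dot(m, m) / 2 for m in centers]
        new_labels = [
            max(range(kk), key=lambda j: dot(centers[j], x) - half_norms[j])
            for x in points
        ]
        if new_labels == labels:  # no sample changed cluster: converged
            break
        labels = compact(new_labels)  # empty clusters disappear here
    return labels


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(kmeans_sketch(pts, 2, init_labels=[0, 1, 0, 1, 0, 1]))  # [0, 0, 0, 1, 1, 1]
```

In a matrix language, the same rewriting lets the assignment step for all samples be computed with one matrix product instead of a per-sample loop, which is the kind of vectorization the description refers to.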

The package also includes a function, scatterd.m, for plotting the data with labels.

Sample code:
>> load data;scatterd(X,y)
>> f=litekmeans(X,3);scatterd(X,f)

MATLAB release: MATLAB 7.8 (R2009a)
Zip File Content: data.mat, license.txt, litekmeans.m, scatterd.m
Comments and Ratings (8)
06 Aug 2009 Sven

This gave a simple implementation to the problem I had.
A couple of notes:
- Reduction of code is good; reduction of comments is not. "litekmeans" has no header line or comments describing what this (very useful) function does, what input it takes, what it produces, etc. Details such as which dimension the input points should be arranged along are necessary to use the function properly, but are not explained in a header. Whether the function handles 1D, 2D, 3D, etc. data isn't written anywhere.
- Providing data as a data.mat file is useful for functions requiring custom-made data, but it tends to obscure the format of the data. It would be simpler and easier to understand if equally simple sample code that creates the data were provided.
- Providing brief sample code on the File Exchange is fine, but it should also be included in the file's header, so that people can find it when they need it rather than only when they first download it.
Otherwise, thanks! Like I said, it solved my immediate problem.

10 Aug 2009 Michael Chen

To Sven:
Sorry for the inconvenience. Here are the answers to your questions.
The function takes two parameters. The first is a d x n data matrix in which each column is a d-dimensional sample vector. The second is the number of clusters. The output is a 1 x n vector in which each element is the label of the corresponding input sample.
The function handles data of any dimension.
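Since the layout was the main point of confusion, here is a small plain-Python sketch that builds synthetic data in the d x n layout described above (one column per sample, one row per dimension) instead of loading data.mat. The helper names `make_blobs_dxn` and `columns` are hypothetical and not part of the package, and the Gaussian blobs are made-up example data.

```python
import random


def make_blobs_dxn(centers, n_per_center, spread, seed=0):
    """Build a d x n data matrix (list of d rows, one column per sample).

    centers: list of d-dimensional cluster centers. Each center contributes
    n_per_center samples drawn from an isotropic Gaussian with standard
    deviation `spread`.
    """
    rng = random.Random(seed)
    d = len(centers[0])
    X = [[] for _ in range(d)]  # row i holds coordinate i of every sample
    for c in centers:
        for _ in range(n_per_center):
            for i in range(d):
                X[i].append(rng.gauss(c[i], spread))
    return X


def columns(X):
    """Return the samples (columns) of a d x n matrix as n lists of length d."""
    return [list(col) for col in zip(*X)]


X = make_blobs_dxn([[0.0, 0.0], [5.0, 5.0]], n_per_center=3, spread=0.1)
print(len(X), len(X[0]))  # 2 6  (d rows, n columns)
```

With this convention, a 13x7000 matrix means 7000 samples of dimension 13, and the label vector returned for it has one entry per column.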

05 Oct 2009 Onur Kalabak

Thank you for sharing. I have two questions. Do I have to use both functions to cluster? I have a 13x7000 matrix that I want to cluster. Should I just pass the matrix to litekmeans.m? And how can I plot the result as shown in the picture?

Thanks

08 Oct 2009 Michael Chen

Yes, just call litekmeans.m to get the clustering results. There is no simple way to visualize data with more than 3 dimensions; scatterd.m can only handle 2D or 3D data.

13 Oct 2009 Fen Xie

Sorry, I compared the results of your program with those of MATLAB's built-in kmeans function, and the two results are not the same. What does this mean?

13 Oct 2009 Fen Xie

This method frequently produces empty clusters; be careful to handle these cases.

16 Oct 2009 Michael Chen

The results of the kmeans algorithm can differ with different initializations. Also, the kmeans function in MATLAB is not a standard kmeans algorithm: after the standard kmeans procedure converges, it tries to reach a lower energy by moving individual data points between clusters.
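To compare two clusterings of the same data, such as the outputs of litekmeans and of MATLAB's kmeans, one option is to evaluate the energy both try to minimize: the sum of squared distances from each sample to the mean of its cluster. A lower value is a better local solution of the kmeans objective, even when the two label vectors disagree. Below is a minimal plain-Python sketch; `kmeans_energy` is a hypothetical name, and samples are assumed to be rows rather than the d x n columns litekmeans.m expects.

```python
def kmeans_energy(points, labels):
    """Sum of squared Euclidean distances from each sample to its cluster mean.

    points: list of n samples (each a list of d floats); labels: n cluster ids.
    """
    d = len(points[0])
    sums, counts = {}, {}
    for x, c in zip(points, labels):
        s = sums.setdefault(c, [0.0] * d)
        counts[c] = counts.get(c, 0) + 1
        for i in range(d):
            s[i] += x[i]
    means = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return sum(
        sum((xi - mi) ** 2 for xi, mi in zip(x, means[c]))
        for x, c in zip(points, labels)
    )


pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(kmeans_energy(pts, [0, 0, 1, 1]))  # 4.0
print(kmeans_energy(pts, [0, 1, 0, 1]))  # 100.0
```

Both labelings above are valid 2-cluster partitions, but the first has far lower energy, which is the sense in which one result can be "better" than another.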
One purpose of litekmeans is to be simple (only 10 lines of code), so I did not add extra code to handle empty clusters. It simply discards a cluster if it becomes empty. You can modify the code yourself if you need extra functionality.
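If exactly k clusters are required, one common modification (not part of this package) is to re-seed each empty cluster with the sample that is currently farthest from its own cluster center; MATLAB's kmeans offers a similar behavior through the 'singleton' setting of its EmptyAction option. Here is a minimal plain-Python sketch of such a center-update step. The names `update_centers_reseed` and `sqdist` are hypothetical, samples are rows, and it assumes k <= n.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def update_centers_reseed(points, labels, k):
    """Compute k cluster means; re-seed empty clusters with far-away samples.

    points: list of n samples (lists of d floats); labels: ids in 0..k-1.
    Each empty cluster gets, as its new center, a distinct sample chosen in
    order of decreasing distance to the center of the cluster it belongs to.
    """
    d = len(points[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, labels):
        counts[c] += 1
        for i in range(d):
            sums[c][i] += x[i]
    centers = [[v / counts[c] for v in sums[c]] if counts[c] else None
               for c in range(k)]
    # Every sample's own cluster is non-empty, so its center is defined here.
    dist = [sqdist(x, centers[c]) for x, c in zip(points, labels)]
    farthest_first = iter(sorted(range(len(points)), key=lambda i: dist[i],
                                 reverse=True))
    for c in range(k):
        if centers[c] is None:  # empty cluster: re-seed instead of dropping it
            centers[c] = [float(v) for v in points[next(farthest_first)]]
    return centers


print(update_centers_reseed([[0, 0], [0, 1], [10, 10]], [0, 0, 0], 2)[1])  # [10.0, 10.0]
```

The next assignment step then moves the re-seeded sample (and any nearby samples) into the formerly empty cluster, so the number of clusters stays at k.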

Updates
01 Jul 2009

Updated the files and description.

Tags: kmeans, clustering, vector quantization (applied by Michael Chen, 01 Jul 2009 14:22:50)
