Comments and Ratings

   
Date File Comment by Comment Rating
05 Nov 2009 Sampling from a discrete distribution The function is to draw samples from an arbitrary discrete distribution. Author: Dahua Lin Chen, Michael

Your function seems to complicate the problem a little bit.
The following line is enough to do the job:
[~,x] = histc(rand(1,n),[0;cumsum(p(:))/sum(p)]);
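To see why this one-liner works: the cumulative sums of p, divided by sum(p), split the interval [0,1] into consecutive pieces whose lengths are the probabilities, and histc reports which piece each uniform random number falls into. Below is a small sketch with a made-up distribution (the values of p and n are assumptions chosen for illustration). Newer MATLAB releases mark histc as not recommended, so this is a sketch in the style of the comment rather than a recommendation.

```matlab
p = [0.2 0.5 0.3];                    % example probabilities (assumed values)
n = 100000;                           % number of samples to draw (assumed)
edges = [0, cumsum(p(:)') / sum(p)];  % interval boundaries built from the cumulative sums
[~, x] = histc(rand(1, n), edges);    % x(i) is the index of the interval containing the i-th draw
% Empirical frequency of each outcome; these should come out close to p.
freq = accumarray(x(:), 1, [numel(p) 1])' / n;
```

Because the weights are divided by sum(p), p does not have to be normalized beforehand.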

16 Oct 2009 kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) Author: Michael Chen Chen, Michael

The results of the kmeans algorithm can differ with different initializations. Actually, the kmeans function in MATLAB is not a standard kmeans algorithm. It tries to reach a lower energy by switching data points between clusters after the standard kmeans procedure has converged.
One purpose of litekmeans is to be simple (only 10 lines of code), so I did not add extra code to handle empty clusters. It just discards a cluster if the cluster becomes empty. You can modify the code yourself if you want extra functionality.
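Discarding an empty cluster can be as simple as renumbering the labels so that they are consecutive again. The following is only a sketch of that idea with a made-up label vector; it is not taken from the litekmeans source.

```matlab
label = [1 1 4 4 2 4];          % example labels (assumed): cluster 3 has become empty
[~, ~, label] = unique(label);  % map the labels that still occur onto 1, 2, 3, ...
label = label(:)';              % force a row vector regardless of how unique orients its output
k = max(label);                 % number of clusters that survived
```

Anyone who prefers to keep exactly k clusters could instead re-seed the empty cluster with a data point, at the cost of a few more lines.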

08 Oct 2009 kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) Author: Michael Chen Chen, Michael

Yes, just call litekmeans.m to get the clustering results. There is no simple way to visualize data with more than 3 dimensions. scatterd.m can only handle 2D or 3D data.

10 Aug 2009 kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) Author: Michael Chen Chen, Michael

To Sven:
Sorry for the inconvenience. Here's the answer to your questions.
The function takes two parameters. The first one is a d x n data matrix, in which each column is assumed to be a d-dimensional sample vector. The second parameter is the number of clusters. The output is a 1 x n vector, in which each element is the label of the corresponding input sample vector.
The function handles data of arbitrary dimensions.
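To make the interface concrete, here is a minimal k-means sketch that follows the same convention (d x n data in, 1 x n labels out). It is not the litekmeans source: the example data, the deterministic initialization, and the variable names are all assumptions made for illustration, and it does not handle empty clusters.

```matlab
X = [0 0.1 0.2 5 5.1 5.2; ...
     0 0.1 0   5 5   5.1];                  % d x n data, one sample per column (assumed example)
k = 2;                                      % number of clusters
n = size(X, 2);
center = X(:, round(linspace(1, n, k)));    % assumed initialization: evenly spaced samples
label = zeros(1, n);
for iter = 1:100
    % k x n matrix of squared distances between centers and samples
    D = bsxfun(@plus, sum(center.^2, 1)', sum(X.^2, 1)) - 2 * (center' * X);
    [~, newLabel] = min(D, [], 1);          % assign each sample to its nearest center
    if isequal(newLabel, label), break; end % stop when the assignment no longer changes
    label = newLabel;
    E = double(bsxfun(@eq, (1:k)', label)); % k x n indicator: E(j,i) = 1 if sample i is in cluster j
    center = bsxfun(@rdivide, X * E', sum(E, 2)');  % new centers are the cluster means
end
```

Nothing in the loop depends on d, which is why data of arbitrary dimension works with the same code.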

03 Jul 2009 Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. Author: Michael Chen Chen, Michael

One more word on input verification: you cannot check every aspect of the inputs. For example, checking whether the input matrix is positive definite in this code is just crazy; it would cost more time than the function itself. One must end up at some point between checking everything and checking nothing, which is a design decision the coder should make.
In such simple code, I don't want nasty guarding code, which would be even longer than the main functional code, distracting the reader's attention.
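For callers who do want such a check, one common way to test positive definiteness is to attempt a Cholesky factorization, which already hints at the cost: it is an O(d^3) operation. Below is a sketch of a caller-side check that would be run once, outside any inner loop. The matrix M is an assumed example, and none of this is part of the submitted function.

```matlab
M = [2 1; 1 2];              % example metric matrix (assumed)
isSym = isequal(M, M');      % a Mahalanobis metric matrix is expected to be symmetric
[~, flag] = chol(M);         % flag is 0 only when the Cholesky factorization succeeds
isPD = isSym && (flag == 0); % true here; the caller can raise an error if it is false
```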

03 Jul 2009 Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. Author: Michael Chen Chen, Michael

By the way, reading your review reminds me of some review comments on some of my papers. Some reviewers just like to focus on whether the format is right, whether the citations are right, even whether the spelling is right, but not on the idea of the paper itself. That is really a pity.

03 Jul 2009 Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. Author: Michael Chen Chen, Michael

If you have read the code of the C++ STL, you will find there are few if statements and almost no runtime checks on input. That is because such checks reduce efficiency.
It is more severe in MATLAB when you call a function many times, such as in kmeans (try adding some 'if' in the inner loop and you will see). There are different programming styles: 1) make a lot of redundant checks at the beginning of every function, which sacrifices efficiency for robustness; 2) just make sure it works right when the input is right. Ensuring input correctness is the caller's responsibility.
These different styles just trade off robustness against efficiency. You prefer the robust one, that's fine, just do it your way. I like my way better.
I'm trying to provide some functionality and ideas here. When I read others' code, I always feel that redundant input verification is misleading and makes it hard to see what the function is actually doing. If someone (like you) wants some safety, I'm sure it is an easy job for him to add the 'if's himself; the code is there after all. But I like my code to be clean.

02 Jul 2009 Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. Author: Michael Chen Chen, Michael

If you want the Euclidean distance itself, nobody prevents you from taking a simple sqrt on top of this function; it won't cost you a second. On the other hand, there are a lot of situations where the squared distance, not the distance, is required (or sufficient), such as KNN, kmeans, spherical Gaussian density, etc.
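The reason the squared distance is sufficient for something like KNN is that sqrt is monotonic, so it never changes which neighbor is nearest. A small sketch with assumed example points:

```matlab
q  = [0; 0];                           % query point (assumed example)
X  = [3 1 0 2; ...
      4 1 2 2];                        % four 2-D points, one per column (assumed example)
D2 = sum(bsxfun(@minus, X, q).^2, 1);  % squared distances from q to each column of X
[~, orderSq] = sort(D2);               % neighbor ranking by squared distance
[~, orderD]  = sort(sqrt(D2));         % neighbor ranking by true distance: the same ranking
```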

This is just code for academic purposes; if you find it helpful, just use it where it is suitable. I'm not making an industrial product, so give me a break.

02 Jul 2009 Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. Author: Michael Chen Chen, Michael

The speed gain comes not from skipping the sqrt but from having no for loops, which is the main purpose of this function: demonstrating how to vectorize code in such scenarios.
The suggestion about robustness is valuable, although I've never hit the extreme case in practice.
The code has been updated to include the centering step and to fix the outdated comments.
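For readers who have not seen the trick, the loop-free computation rests on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x'y. The sketch below is one possible reading of it, not the submitted code: the example data and variable names are assumptions, and the centering shown (subtracting a common mean, which leaves all pairwise differences unchanged but reduces cancellation error) is my interpretation of the centering step mentioned above. A double loop is included only as a reference for comparison.

```matlab
X = [0 1 2; ...
     0 1 0];                           % d x n1 points, one per column (assumed example)
Y = [1 3; ...
     1 4];                             % d x n2 points, one per column (assumed example)
m  = mean([X Y], 2);                   % common mean of all points
Xc = bsxfun(@minus, X, m);             % centering does not change pairwise differences
Yc = bsxfun(@minus, Y, m);
% n1 x n2 squared distances, with no explicit loop
D = bsxfun(@plus, sum(Xc.^2, 1)', sum(Yc.^2, 1)) - 2 * (Xc' * Yc);
D = max(D, 0);                         % clip tiny negative values caused by rounding

% Reference result from a plain double loop
Dloop = zeros(size(X, 2), size(Y, 2));
for i = 1:size(X, 2)
    for j = 1:size(Y, 2)
        Dloop(i, j) = sum((X(:, i) - Y(:, j)).^2);
    end
end
```

Taking sqrt(D) afterwards gives the ordinary Euclidean distances when they are needed.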

16 Mar 2009 Efficient K-Means Clustering using JIT A simple but fast tool for K-means clustering Author: Yi Cao Chen, Michael

 
