| Date | File | Comment by | Comment | Rating |
|---|---|---|---|---|
| 16 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Chen, Michael | The results of the kmeans algorithm can differ with different initializations. Actually, the kmeans function in MATLAB is not a standard kmeans algorithm: it tries to reach a lower energy by moving data points between clusters after the standard kmeans procedure has converged. | |
| 13 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Xie, Fen | This method constantly produces empty clusters; be careful when dealing with these exceptions. | |
| 13 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Xie, Fen | Sorry, I have compared the results of your program with those of MATLAB's built-in function, and the two results are not the same. What does that mean? | |
| 08 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Chen, Michael | Yes, just call litekmeans.m to get the clustering results. There is no simple way to visualize data with more than 3 dimensions; scatterd.m can only handle 2-D or 3-D data. | |
| 05 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Kalabak, Onur | Thank you for sharing. I have two questions. Do we have to use both of the functions to cluster? I have a 13x7000 matrix which I want to cluster. Should I simply pass the matrix to litekmeans.m? And how can I plot the result as displayed in the picture? Thanks | |
| 05 Oct 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Kalabak, Onur | | |
| 10 Aug 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Chen, Michael | To Sven: | |
| 06 Aug 2009 | kmeans clustering Fully vectorized kmeans algorithm. Fast yet simple (10 lines) | Sven | This gave a simple implementation for the problem I had. | |
| 03 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | Chen, Michael | One more word on input verification: you cannot check every aspect of the inputs. For example, checking whether the input matrix is positive definite in this code would be crazy, because the check would cost more time than the function itself. One must end up at some point between checking everything and checking nothing, which is a design decision the coder should make. | |
| 03 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | Chen, Michael | By the way, reading your review reminds me of some review comments on some of my papers. Some reviewers just like to focus on whether the format is right, whether the citations are right, even whether the spelling is right, but not on the idea of the paper itself. That is really a pity. | |
| 03 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | Chen, Michael | If you have read the code of the C++ STL, you will find that there are few if statements and almost no runtime checks on input. That is because such checks reduce efficiency. | |
| 03 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | Improving. It now runs as claimed for the 1-d case, so the bug is removed. I also like that this is called sqdistance. The problem with calling it distance is that it was not a distance. It is better to make clear that this returns the square of the Euclidean distance, so this is also good. There is now even an attempt at error checking, in that the author uses the assert function to flag problems. However, assert allows you to provide TWO arguments. See what happens when I call sqdistance with improperly sized arrays: `sqdistance(rand(3,4),rand(2))` All it tells me is "Assertion failed". For god's sake, what assertion? Use the second argument! Allow the code to exit gracefully and descriptively. Tell the user that the arguments are incompatible in size for this operation. To just tell them "Assertion failed" is silly. Why bother to do so? Obviously, from the author's last comment, this is just code for his own academic purposes. So apparently it need not be any good, or do what he claims it does. I'll argue that he is wrong. When you post something on a website like this, hundreds or even many thousands of MATLAB users may use your code, or try to do so. They may look at it, hoping to learn something from what you post. So I'm sorry, but it would be a disservice on my part to any MATLAB user or student for me NOT to tell them that I see a problem, what is wrong, and, if possible, how the problem should best have been resolved (in my opinion). And if this author has improved his code or coding style because of what I show to be wrong, then I've helped him too. | |
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | Chen, Michael | If you want the Euclidean distance itself, nobody prevents you from taking a simple sqrt on top of this function; it won't cost you a second. On the other hand, there are many situations in which the squared distance, not the distance, is required (or sufficient), such as KNN, kmeans, spherical Gaussian density, etc. This is just code for academic purposes; if you find it helpful, just use it where it is suitable. I'm not making an industry product, so give me a break. | |
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | Somewhat better now in ONE respect. However, just because you have never suffered from the extreme case I showed does not mean that you have never had the problem. You've just never noticed it, or even known to look. There is still a problem. Perhaps my long comments before were not extensive enough. This is NOT actually a valid distance metric. NOT. NOT. Read the definition of a distance metric. Here are a few sites: http://www.mathreference.com/top-ms,dm.html See that one requirement for a valid distance metric is the triangle inequality. The triangle inequality states that D(x,y) <= D(x,z) + D(y,z). Does distance satisfy that? TRY IT! `distance(1,5)`, `distance(1,3) + distance(5,3)`, `(distance(1,3) + distance(5,3)) > distance(1,5)` So this function fails to satisfy one of the basic requirements for a distance metric. This does NOT compute a distance. Just because you call the function distance does not make it a distance. The failure to compute the sqrt makes this not a valid distance. Next, this function fails to work properly in one dimension! `distance([1,2])` I would have expected to see the matrix [0 1;1 0] returned as the result. (Surely you will concede this fact.) Don't complain that it was my suggested change that caused it to fail, because the original code fails here too, and by a larger margin. `originaldistance([1,2])` Finally, the changes to the help made it more accurate, but it still fails to describe the behavior of this code. The help now states that when called with two arguments, it returns the square of the interpoint Euclidean distances between columns of the matrices. It does not state at all what happens when called with only one argument. Looking slightly deeper at the code revealed serious flaws that are still there. My rating is verging closer to one star at this point. | |
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | Chen, Michael | The speed gain comes not from skipping the sqrt but from having no for loops, which is the main purpose of this function: demonstrating how to vectorize code in such scenarios. | |
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | The problem with this code is that it is potentially inaccurate. It uses an identity that people are oh so impressed with, but squares the numbers unnecessarily. I'll talk about this problem eventually, but first, let me discuss another major flaw with this code. It does not actually compute the Euclidean distance. It computes the square of that distance. As such, it has the wrong units. Don't forget that if your data has units of miles, then all "distances" produced by this function will have units of miles squared. Note that this code claims to produce the same numbers as the pdist function does. But it does not. pdist produces TRUE distances, not the square of the distance. There is a difference. `A = randn(3,5);` `distance(A,B)` The true interpoint distances, as one will normally see from any correct tool available to compute distance, are closer to this: `sqrt(distance(A,B))` I imagine the author will claim that not computing the square root saves time. I suppose so, but to then claim that the result is a distance as most expect to see would be misleading. And surely to claim that it produces the same as other codes is very misleading. A nice property for a distance measure to have is linearity. We would like to see that distance(k*A,k*B) = k*distance(A,B). It is something that pdist will give you. But this is not a property one will gain from distance. On to the other problem, the one that I see as most damaging. Recall the results of the above experiment for distance(A,B). Now, try adding any fixed offset to both A and B. I'll add a large one to the matrices to show that complete trash is generated. `distance(A+1e8,B+1e8)` See that NO significant digits remain in the result. A higher quality tool (like pdist, for example) will survive even this extreme test quite nicely. As it turns out, this is not an extreme test for pdist. (Since pdist does only interpoint distances for a single matrix, these numbers will not be comparable. See only that adding 1e8 did not change any digits printed out.) `pdist(A'+1e8)` `pdist(A')` I will point out that all of the above-mentioned flaws could have been repaired with a few simple changes to the code. Had the code been written as follows, it would take only about 20% longer to execute in my quick test, but it would have worked properly and robustly. `if nargin == 1` Finally, the help for this code is itself wrong. Here is what it says when you do "help distance": Compute distances between all sample pairs. See that it tells you that X is a d by n matrix of data. Does it tell you that the function actually accepts TWO arguments? That if there are two arguments, then it computes distances between all pairs of columns of the two matrices? There are no error checks to verify that the data actually conforms for distance computation. I'll be generous and give this a 2 star rating. | |
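
Two of the numerical objections raised in the thread (loss of precision when both inputs carry a large common offset, and the squared distance failing the triangle inequality) can be checked with a minimal MATLAB sketch. The helper names `sqdist_expand` and `sqdist_direct` are hypothetical and written only for illustration; they are not the submission's code, and the expansion form is merely assumed to resemble what a loop-free implementation might use.

```matlab
% Hypothetical helpers for illustration only; NOT the File Exchange code.
% Columns of X and Y are points (X is d-by-n, Y is d-by-m).

% Squared distances via the expansion |x|^2 + |y|^2 - 2*x'*y.
sqdist_expand = @(X, Y) bsxfun(@plus, sum(X.^2, 1)', sum(Y.^2, 1)) - 2 * (X' * Y);

% Squared distances from explicit coordinate differences
% (builds an n-by-m-by-d intermediate array).
sqdist_direct = @(X, Y) sum(bsxfun(@minus, permute(X, [2 3 1]), permute(Y, [3 2 1])).^2, 3);

X = [0 3; 0 4];   % two 2-D points: (0,0) and (3,4)
Y = X;

% A descriptive message instead of a bare "Assertion failed".
assert(size(X, 1) == size(Y, 1), ...
    'X and Y must have the same number of rows (got %d and %d).', ...
    size(X, 1), size(Y, 1));

D_small = sqdist_expand(X, Y);    % without an offset, both forms give [0 25; 25 0]

shift = 1e8;                      % large common offset, as in the thread
D_expand = sqdist_expand(X + shift, Y + shift);  % terms near 1e16 are subtracted, so low digits are lost
D_direct = sqdist_direct(X + shift, Y + shift);  % the offset cancels before squaring

% Triangle inequality for the points 1, 3 and 5 on a line.
tri_sq   = sqdist_direct(1, 5) <= sqdist_direct(1, 3) + sqdist_direct(5, 3);                    % 16 <= 8
tri_root = sqrt(sqdist_direct(1, 5)) <= sqrt(sqdist_direct(1, 3)) + sqrt(sqdist_direct(5, 3));  % 4 <= 4
```

With this data the direct form still returns 25 for the off-diagonal entries after the shift, while the expansion form is limited by the spacing of doubles at the magnitude of the squared terms. The direct form pays for that robustness with the larger temporary array, so which one is preferable depends on the data and on how much memory is available.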