Thanks for this code, but for some datasets it's hypersensitive to rounding errors: occasionally the slightly nonzero entries along D's diagonal lead to results that are surprisingly far from correct (eg the chance that given medoids are chosen increases or decreases). I can email code to demonstrate if needed.
The problem is fixed by adding a for-loop after D is defined:
D(i,i) = 0;
To confirm that this yields 'correct' results, I perturbed one of the problematic datasets by a tiny amount and compared medoids before-and-after.
Even though the code is lightning fast, the solution is not the proper one, hence this code is useless. See http://i.imgur.com/VNY73l7.png for example output of k = 6 for a naive (and very slow) implementation of the algorithm, and this submission. Obviously the naive is correct, this submission is incorrect.
Nonetheless thanks for the effort! It would be great if you could produce correct code that is still as fast.
Very compact and efficient coding. Nice job!
However, there is an error when k=1. I suggest to add the third parameter (dimension) to the calls of function min:
(Line 8): [~, label] = min(D(randsample(n,k),:),,1);
(Line 11): [~, label] = [~, index] = min(D*sparse(1:n,label,1,n,k,n),,1);