Fixing the Silhouette Plot (for k-means)?

I'm working on k-means clustering in MATLAB. My file has three columns, and I have written the clustering code. I need a way to measure the clustering quality, so I picked the silhouette plot. I got the silhouette code from here (and I want my plot to look like the one shown there): http://stackoverflow.com/questions/6644445/equivalent-of-matlabs-cluster-quality-function
I adapted it to my variables. Here is the k-means clustering code:
load cobat.txt; % read the file
k=input('Enter a number: '); % choose the number of clusters
isRand=0; % 0 -> sequential initialization
          % 1 -> random initialization
[maxRow, maxCol]=size(cobat);
if maxRow<=k,
    y=[cobat, (1:maxRow)']; % too few points: each point is its own cluster
elseif k>7
    h=msgbox('k cannot be more than 7');
else
    % initial value of centroid
    if isRand,
        p = randperm(size(cobat,1)); % random initialization
        for i=1:k
            c(i,:)=cobat(p(i),:);
        end
    else
        for i=1:k
            c(i,:)=cobat(i,:); % sequential initialization
        end
    end
    temp=zeros(maxRow,1); % previous assignment, initialized as zero vector
    u=0;
    while 1,
        d=DistMatrix3(cobat,c); % distance from every point to every centroid
        [z,g]=min(d,[],2); % g(i) = index of the nearest centroid for point i
        if isequal(g,temp), % if the assignment doesn't change anymore
            break; % stop the iteration
        else
            temp=g; % keep the assignment for the next comparison
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f) % calculate the new centroid of a non-empty cluster
                c(i,:)=mean(cobat(f,:),1);
            end
        end
    end
    y=[cobat,g]
end
%plot silhouette
s = mySilhouette(cobat, g)
[~,ord] = sortrows([g s],[1 -2]);
indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
ytickLabels = num2str((1:K)','%d'); %#'
h = barh(1:N, s(ord),'hist');
set(h, 'EdgeColor','none', 'CData',IDX(ord))
set(gca, 'CLim',[1 K], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
%# compare against SILHOUETTE
figure, silhouette(cobat,g)
Here is the DistMatrix3 function (this is used to calculate the distance)
function d=DistMatrix3(A,B)
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
    d=sqrt(dot((A-B),(A-B)));
else
    C=[ones(1,hB);zeros(1,hB);zeros(1,hB)];
    D=[zeros(1,hB);ones(1,hB);zeros(1,hB)];
    E=flipud(C);
    F=[ones(1,hA);zeros(1,hA);zeros(1,hA)];
    G=[zeros(1,hA);ones(1,hA);zeros(1,hA)];
    H=flipud(F);
    I=A*C;
    J=A*D;
    K=A*E;
    L=B*F;
    M=B*G;
    N=B*H;
    d=sqrt((I-L').^2+(J-M').^2+(K-N').^2);
end
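DistMatrix3 returns the Euclidean distance between every row of A and every row of B, but only for data with exactly three columns. As a rough sketch, the same distances can be computed for any number of columns; the name pairwiseDist is hypothetical and not part of the original code.

```matlab
% Hypothetical helper (not in the original post): Euclidean distance between
% each row of A (hA-by-p) and each row of B (hB-by-p), returned as hA-by-hB.
% Uses ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a*b'; max(...,0) guards against
% tiny negative values caused by rounding.
pairwiseDist = @(A,B) sqrt(max(bsxfun(@plus, sum(A.^2,2), sum(B.^2,2)') - 2*(A*B'), 0));

% First two rows of the cobat data against a single centroid.
A = [65 80 55; 45 75 78];
c = [65 80 55];
d = pairwiseDist(A, c);   % 2-by-1 result: [0; sqrt(954)]
```

If the Statistics Toolbox is available, pdist2(A, c) should give the same result.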
And here is the mySilhouette function code:
function s = mySilhouette(cobat, g)
%# cobat: matrix of size N-by-p, data where rows are instances
%# g    : vector of size N, cluster index of each instance (starting from 1)
%# s    : vector of size N, silhouette score value of each instance
N = size(cobat,1); %# number of instances
K = numel(unique(g)); %# number of clusters
%# compute pairwise (squared Euclidean) distance matrix
D = squareform( pdist(cobat,'euclidean').^2 );
%# indices belonging to each cluster
kIndices = accumarray(g, 1:N, [K 1], @(x){sort(x)});
%# compute a,b,s for each instance
%# a(i): average distance from i to all other data within the same cluster.
%# b(i): lowest average dist from i to the data of another single cluster
a = zeros(N,1);
b = zeros(N,1);
for i=1:N
    ind = kIndices{g(i)}; ind = ind(ind~=i);
    a(i) = mean( D(i,ind) );
    b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=g(i))) );
end
s = (b-a) ./ max(a,b);
end
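For reference, the last line of mySilhouette computes the following for each point i, with a(i) and b(i) as described in the comments. Note that the distances here are squared Euclidean, because the function squares the output of pdist.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Values near 1 suggest the point sits well inside its cluster; values near -1 suggest it is closer, on average, to another cluster.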
Here is the cobat file:
65 80 55
45 75 78
36 67 66
65 78 88
79 80 72
77 85 65
76 77 79
65 67 88
85 76 88
56 76 65
I ran the code, but I get this error: "??? Undefined function or variable 'K'. Error in ==> clustere at 54 indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});"
I know the error comes from the variable K, but I don't know what K is supposed to be, and I can't figure it out. Can anyone help me fix the error and get the plot working? Your help will be much appreciated.
Thank you.
  2 Comments
José-Luis on 6 May 2013
Edited: José-Luis on 6 May 2013
Have you tried using the debugger?
doc dbstop
What's the value of K when the code fails?
Alvi Syahrin on 7 May 2013
I don't understand why I need doc dbstop. See my answer below: I have renamed the variables to match my code, but I still get an error. Your help would be appreciated. Thank you, José.


Accepted Answer

Alvi Syahrin on 8 May 2013
This problem is solved. If you have a similar problem, look at this link: http://stackoverflow.com/questions/16399645/fix-silhouette-plot-for-k-means

More Answers (1)

Alvi Syahrin on 7 May 2013
I have now renamed the variables to match my code: K becomes k, N becomes maxRow, and IDX becomes g. But now I get another error:
"??? Error using ==> accumarray Second input VAL must be a vector with one element for each row in SUBS, or a scalar.
Error in ==> clustere at 56 indices = accumarray(g(ord), 1:k, [k 1], @(x){sort(x)});"
Does anyone have an idea?
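The accumarray message says the second input needs one element per row of the first. g(ord) has one row per data point, so a likely cause is that 1:k has only k elements; mySilhouette uses 1:N (one value per point) in the same position. The sketch below shows that pattern on made-up labels and silhouette values, not the cobat data, and it assumes the labels run from 1 to K with no empty cluster.

```matlab
% Hypothetical toy inputs: 5 points in 2 clusters.
g = [2; 1; 2; 1; 1];              % cluster label of each point
s = [0.5; 0.9; 0.7; 0.2; 0.6];    % made-up silhouette values
N = numel(g);                     % number of points (maxRow in the script)
K = max(g);                       % number of clusters (k in the script)

% Sort by cluster, then by descending silhouette value within each cluster.
[~, ord] = sortrows([g s], [1 -2]);        % ord = [2; 5; 4; 3; 1]

% One value per row of g(ord): the bar positions 1..N, grouped by cluster.
indices = accumarray(g(ord), (1:N)', [K 1], @(x){sort(x)});

% Put each cluster's tick label at the middle of its block of bars.
ytick = cellfun(@(ind) (min(ind) + max(ind))/2, indices);   % [2; 4.5]
```

In the script, this would correspond to passing 1:maxRow instead of 1:k as the second argument, while keeping [k 1] as the output size.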
