How to apply Silhouette Score for optimal K in MATLAB

Hello, I hope you are doing well. I have the following dataset and I want to compute the Silhouette Score from scratch.
I have a Python script for that. Can anybody help me implement it in MATLAB?
The following code is in Python:
costs = []
for p in range(10):
    kmeans = K_Means(k=p,data = data[0],centeriod_init='random')
    centroids, cluster_assignments, iters, orig_centroids = kmeans.fit(data[0])
    X = data[0]
    dist_ji = 0
    a = 0
    s=0
    for i in range(len(data[0])):
        for j in range(p):
            dist_ji += euclidean_dist(centroids[j,:],X[i,:])
            #print(dist_ji)
    dist_ji -= sum(cluster_assignments[:,1])/len(data[0])
    a = sum(cluster_assignments[:,1])/(len(data[0])-1)
    s = (dist_ji - a)/max(dist_ji,a)
    s = np.array(s)
    s = s.item()
    costs.append(s)
x = np.arange(10)
plt.plot(x,costs)
plt.title("Silhoutte Score")
plt.xlabel("K -->")
plt.ylabel("Dispersion")

Answers (1)

I do not know enough Python to convert this code directly.
I suspect: call kmeans() with p as the number of centroids, getting back indices and centroid locations. Then take
nearest_center = CentroidLocations(CentroidIdx,:);
a = mean((data - nearest_center).^2,2);
or something like that.

4 Comments

I have written the following code in MATLAB, but I am unable to understand how some of the Python commands translate to MATLAB, such as:
dist_ji += euclidean_dist(centroids[j,:],X[i,:])
#print(dist_ji)
dist_ji -= sum(cluster_assignments[:,1])/len(data[0])
a = sum(cluster_assignments[:,1])/(len(data[0])-1)
s = (dist_ji - a)/max(dist_ji,a)
costs=[]
for p=1:10
    [idx,C,sumd]=kmeans(dataset1,p);
    X=dataset1;
    dist_ji=0;
    a=0
    s=0
    for i=1:length(dataset1)
        for j=1:10
        end
    end
end
euclidean_dist(centroids[j,:],X[i,:])
in MATLAB could be written as
pdist2(centroids(j,:), X(i,:))
but that requires the statistics toolbox, and is more overhead than writing
sqrt(sum((centroids(j,:) - X(i,:)).^2))
If you make a small modification of the code I posted earlier
nearest_center = CentroidLocations(CentroidIdx,:);
distance = sum((data - nearest_center).^2,2);
then distance would be a column vector of the squared Euclidean distance between each point and its associated centroid (wrap the sum in sqrt() if you want the true Euclidean distance). The Python code then goes on to calculate the sum of those and divide the total by the number of entries, which in MATLAB could be written as mean(distance). What I posted earlier is probably slightly wrong for the situation; I probably should have posted
a = mean(sum((data - nearest_center).^2,2));
and that line together with the assignment to nearest_center would replace the block of python code
X = data[0]
dist_ji = 0
a = 0
s=0
for i in range(len(data[0])):
    for j in range(p):
        dist_ji += euclidean_dist(centroids[j,:],X[i,:])
        #print(dist_ji)
dist_ji -= sum(cluster_assignments[:,1])/len(data[0])
a = sum(cluster_assignments[:,1])/(len(data[0])-1)
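For anyone checking a port against the Python side, here is a minimal NumPy sketch of the same indexing idea: pick each point's assigned centroid by label, then take the row-wise distance. The function name `assigned_centroid_distances` and the toy data are made up for illustration and are not part of the original script. Note that NumPy labels are 0-based, while the indices returned by MATLAB's kmeans are 1-based.

```python
import numpy as np

def assigned_centroid_distances(X, centroids, labels):
    """Euclidean distance from each row of X to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Row i of nearest_center is the centroid assigned to point i,
    # the NumPy analogue of CentroidLocations(CentroidIdx,:) in MATLAB.
    nearest_center = centroids[np.asarray(labels)]
    return np.sqrt(((X - nearest_center) ** 2).sum(axis=1))

# Hypothetical toy data: three 2-D points, two centroids.
X = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 3.0]])
labels = np.array([0, 0, 1])

dist = assigned_centroid_distances(X, centroids, labels)
mean_dist = dist.mean()  # counterpart of mean(distance) in MATLAB
```

Dropping the `np.sqrt` gives the squared distances instead, which is what `sum((data - nearest_center).^2,2)` computes.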
@Walter Roberson CentroidLocations is not a command in MATLAB.
N = 10;
s = zeros(size(data,1), N);
for p = 1 : N
    [CentroidIdx, CentroidLocations] = kmeans(data, p); % random initialization is default
    nearest_center = CentroidLocations(CentroidIdx,:);
    dist_ji = sum((data - nearest_center).^2,2);
    a = mean(dist_ji);
    s(:,p) = (dist_ji - a)./max(dist_ji,a);
end
plot(s)
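One caveat worth adding: the quantities computed in this thread are distances to centroids, which is not quite the usual silhouette definition. In the standard form, a(i) is the mean distance from point i to the other members of its own cluster, b(i) is the smallest mean distance from point i to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The score is undefined for a single cluster, so K should start at 2. Below is a minimal from-scratch sketch in NumPy that a MATLAB port could be checked against. The function names are my own, the cluster labels are assumed to come from whatever k-means routine you already use, and plain (not squared) Euclidean distance is assumed.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Per-point silhouette values, standard definition, Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least 2 clusters")
    # Full pairwise distance matrix, shape (n, n); fine for small datasets.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a(i): mean distance to the other points of the same cluster
        # (D[i, i] is 0, so dividing by n_own - 1 excludes the point itself).
        a = D[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance to the points of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

def silhouette_score(X, labels):
    """Mean silhouette value over all points."""
    return float(silhouette_samples(X, labels).mean())

# Hypothetical toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_score(X, np.array([0, 0, 1, 1]))  # sensible clustering
bad = silhouette_score(X, np.array([0, 1, 0, 1]))   # deliberately poor clustering
```

To choose K, compute the score for each K from 2 upward and take the K with the highest mean value. If the Statistics and Machine Learning Toolbox is available, MATLAB also ships `silhouette` and `evalclusters`, which may be simpler than porting the loop by hand.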

Release

R2022a

Asked:

on 10 Jan 2023

Commented:

on 12 Jan 2023