To give a simple example:
I have 4 data points p1, p2, p3, p4 (blue dots). I ran k-means twice with k = 2 and plotted the resulting centroids of the two clusters, C1 and C2 (green dots).
The two runs of k-means are shown below (left and right). Notice that in the second run (right), C2 and p2 are in the same location.
To compare the performance of k-means in these two runs, that is, to find out which of the two is the better clustering, do I just look at 'sumd', which is, for each cluster, the sum of the distances from each point in that cluster to its centroid?
In this case, sumd of left is [0.5000, 0.5000] while sumd of right is [1.3333, 0].
To compare the two cases, do I just sum the 'sumd' of left to get '1', sum the 'sumd' of right to get '1.3333', take the smaller number, '1', and claim that left is the better clustering?
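The comparison I have in mind can be sketched in Python as follows. This is only an illustration under assumptions: the coordinates are hypothetical (the corners of a unit square, picked because they reproduce the 'sumd' values quoted above), the distance is assumed to be squared Euclidean, and the helper name `within_cluster_sums` is made up for this sketch.

```python
import numpy as np

def within_cluster_sums(points, labels, k):
    """Per-cluster sum of squared Euclidean distances to the cluster mean.

    Hypothetical helper for illustration; it assumes 'sumd' is computed
    with squared Euclidean distance.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    sums = np.zeros(k)
    for c in range(k):
        members = points[labels == c]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        centroid = members.mean(axis=0)
        sums[c] = ((members - centroid) ** 2).sum()
    return sums

# Hypothetical coordinates for p1..p4 (assumption, not the real data).
points = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Left: two clusters of two points each.
left = within_cluster_sums(points, [0, 0, 1, 1], k=2)
# Right: p2 alone in its cluster, so its centroid coincides with p2.
right = within_cluster_sums(points, [0, 1, 0, 0], k=2)

# Total within-cluster sum for each run; the smaller total is the tighter fit.
left_total, right_total = left.sum(), right.sum()
better = "left" if left_total < right_total else "right"
print(left, left_total)
print(right, right_total)
print("smaller total:", better)
```

With these assumed points, the per-cluster sums agree with the values above, and comparing the two totals is the comparison I am asking about.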
Am I doing it correctly?