# K-means clustering - results and plotting a continuous curve

Rayne on 22 Sep 2015
Commented: Rayne on 25 Sep 2015
I am very new to Matlab, and I'm trying to classify some data using K-means. This is what I have:
numClusters = 4;
idx_1 = kmeans([X_1 smoothY_1],numClusters,'Replicates', 5);
[numDataPoints,numDimensions] = size(smoothY_1);
Colors = hsv(numClusters);
for i = 1 : numDataPoints
plot(X_1(i),smoothY_1(i),'.','Color',Colors(idx_1(i),:))
hold on
end
Looking at the output, it seems as if all the K-means clustering did was divide the graph into numClusters segments. I've tried different values of numClusters, and each gave me equally divided segments. Surely this can't be right?
Another question I have is about plotting the results. Both X_1 and smoothY_1 are "1825x1 double" arrays. I'm trying to plot a continuous curve, but I only have output if I use '.' in the LineSpec. Using '-' will not give me any output. How do I plot a continuous curve?
Thank you.
ETA: I have plotted the graph in line mode thanks to @Hamoon.
There are actually 3 data sets that I'm trying to cluster using K-means. They were all generated from the same system and consist of 4 distinct operational states. It doesn't seem right to me that the 4 states are all equally divided segments. I think it is more likely that the long segment after the biggest spike belongs to 1 cluster, rather than 3 different clusters.
Is there another clustering algorithm I should use? Or do I need to do some pre-processing before I use K-means, such as running K-means on the differences between adjacent points rather than on the X,Y points themselves?
Thank you.

Hamoon on 22 Sep 2015
1. K-means is a clustering method, NOT a classification algorithm, although you can then use its output to associate new points with the clusters. What kind of output do you expect? If you are not happy with this output, you probably don't want a clustering method.
2. You are plotting the points one by one, so '-' has nothing to connect and doesn't give you what you want. You can use this instead:
for i = 1 : numClusters
idxThis = idx_1==i;
plot(X_1(idxThis),smoothY_1(idxThis),'-','Color',Colors(i,:)) % It also works without '-'
hold on
end
axis([0 1800 0 15])
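If the clusters are not contiguous in X (for example, after clustering on Y alone), plotting only the selected points with '-' joins points that are far apart in time. A minimal sketch of one workaround follows, relying on the fact that plot breaks a line wherever the data is NaN. The tiny X, Y and idx vectors are made-up illustration data, not the asker's.

```matlab
% Made-up illustration data (assumption): two clusters that alternate in time.
X   = (1:10)';
Y   = [1 1 5 5 1 1 5 5 1 1]';
idx = [1 1 2 2 1 1 2 2 1 1]';
numClusters = 2;
Colors = hsv(numClusters);

figure('Visible','off');          % off-screen here; drop the option to see it
hold on
h = gobjects(numClusters,1);      % line handles, one per cluster
for k = 1:numClusters
    Yk = Y;
    Yk(idx ~= k) = NaN;           % mask other clusters; plot breaks lines at NaN
    h(k) = plot(X, Yk, '-', 'Color', Colors(k,:));
end
hold off
```

Each cluster is drawn as a set of separate line pieces, so nothing is connected across a gap. The short segment between two neighbouring points that belong to different clusters is not drawn at all.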
Rayne on 25 Sep 2015
Thank you very much for your help. I eventually gave SVM a try and it seems to give me results that I'm rather satisfied with.

Kirby Fears on 22 Sep 2015
Edited: Kirby Fears on 22 Sep 2015
kmeans is working exactly as expected for the input you're providing. The best 4 centroids are along your line. Perhaps you can review the wiki page to see why.
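One way to see this: kmeans uses squared Euclidean distance by default, and X runs from 1 to 1825 while the Y axis in this thread only spans about 0 to 15, so X contributes almost all of the distance. The sketch below uses synthetic data as a stand-in for the real signal (the levels, segment lengths and noise are assumptions chosen for illustration) and compares clustering on [X Y] with clustering on Y alone.

```matlab
% Synthetic stand-in for the real data (assumption): a time index and a
% piecewise-constant signal with a little noise.
rng(0);                                   % reproducible random numbers
X = (1:1825)';
Y = [2*ones(300,1); 12*ones(100,1); 5*ones(1425,1)] + 0.2*randn(1825,1);
numClusters = 3;

% Share of the total variance carried by each column: X dominates.
v = var([X Y]);
varShare = v ./ sum(v);

% Raw [X Y]: the partition is driven almost entirely by X, so the
% clusters come out as contiguous segments along the time axis.
idxRaw = kmeans([X Y], numClusters, 'Replicates', 5);
numSwitchesRaw = sum(diff(idxRaw) ~= 0);  % label changes along X

% Y only: clusters follow the signal level, regardless of time.
idxY = kmeans(Y, numClusters, 'Replicates', 5);
```

Whether clustering on Y alone (or on rescaled columns) is appropriate depends on what the clusters are supposed to represent; this only illustrates why the raw [X Y] input splits the curve into segments along X.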
Your code calls the plot() function for each point separately. I made a few changes so you can call the plot function only once per cluster, and it plots in line mode as requested:
X1=(1:1825)';
Y1=randn(1825,1);
numClusters=4;
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters
plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
if j==1
hold on;
end
end
hold off;
Rayne on 23 Sep 2015
Yes, X_1 is a time variable, and I'm trying to cluster the (X,Y) points. In fact, I had tried just using K-means on the Y points, and saw the clustering like you said, which isn't what I wanted.
I have replied to @Hamoon on what I think the clustering results should be.