Interpreting the results of hierarchical clustering

sc on 10 Jun 2015
I tried to perform vector quantization (VQ) of my data set using hierarchical cluster analysis (HCA), following the tutorial at http://www.mathworks.com/help/stats/hierarchical-clustering.html
Here is my code segment:
% X is the data matrix of M entries of N dimensions, where [M,N] = size(X)
CLUSTER_DESIRED = 128;
Y = pdist(X); % Y is the pairwise distance matrix expressed as a vector
Z = linkage(Y); % Z encodes the hierarchical cluster tree (single linkage by default)
c = cophenet(Z,Y); % c is the cophenetic correlation coefficient
T = cluster(Z,'maxclust',CLUSTER_DESIRED); % assign a cluster to each data point
h = hist(T,CLUSTER_DESIRED); % count the number of points in each cluster
M is about 32,000 and N is about 40 for my data. It turns out that most of the points (more than 95% of all data) are assigned to a single one of the 128 clusters. Before doing HCA, I tried to build the codebook using k-means, and the resulting codebook did not work well for my subsequent experiment. I guess the HCA result tells me that my data are spread out fairly uniformly in the vector space, without well-separated clusters. Am I interpreting this correctly? Is VQ not going to work for such a data set?
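
One rough way to check whether the single giant cluster is an artifact of the default 'single' linkage (which is prone to chaining) rather than a property of the data is to compare a few linkage methods side by side. The sketch below is only illustrative; the method list and the 128-cluster cut are placeholders, not recommendations.

methods = {'single','complete','average','ward'};
Y = pdist(X);                        % X is the M-by-N data matrix
for i = 1:numel(methods)
    Z = linkage(Y, methods{i});
    c = cophenet(Z, Y);              % cophenetic correlation coefficient
    T = cluster(Z, 'maxclust', 128); % force 128 clusters, as in the question
    counts = accumarray(T, 1);       % number of points per cluster
    fprintf('%-8s  cophenet = %.3f  largest cluster = %.1f%% of points\n', ...
        methods{i}, c, 100*max(counts)/numel(T));
end

If only 'single' linkage produces the imbalance, the chaining effect is the likely cause; if every method does, the data themselves probably lack well-separated clusters.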
  1 Comment
sc on 10 Jun 2015
I realized that the linkage function uses 'single' linkage by default, so I tried some other linkage methods. It turns out that everything but 'ward' generates very dense clusters; 'ward' linkage, on the other hand, generates more spread-out clusters. However, its cophenetic correlation coefficient is low (0.5629) compared to over 0.8 for 'single' linkage. Now I am having trouble determining the number of clusters.
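
A minimal sketch for picking the number of clusters under 'ward' linkage, assuming only standard Statistics Toolbox functions; the cutoff value and the KList range below are placeholders that would need tuning to the data.

Z = linkage(Y, 'ward');              % Y = pdist(X), Euclidean distances

% Option 1: cut the tree by inconsistency instead of forcing a fixed count
T1 = cluster(Z, 'cutoff', 1.2);      % threshold is data dependent
fprintf('inconsistency cutoff 1.2 gives %d clusters\n', max(T1));

% Option 2: scan candidate cluster counts with the silhouette criterion
% (evalclusters with 'linkage' performs Ward-linkage agglomerative clustering)
eva = evalclusters(X, 'linkage', 'silhouette', 'KList', 32:32:256);
fprintf('silhouette-optimal number of clusters: %d\n', eva.OptimalK);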


Answers (0)
