I realized that the linkage function uses 'single' linkage by default, so I tried the other linkage methods. It turns out that every method except 'ward' generates very dense clusters. 'ward' linkage, on the other hand, generates more spread-out clusters. However, its cophenetic correlation coefficient is low (0.5629), compared to over 0.8 for 'single' linkage. Now I'm having trouble determining the number of clusters.
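One way to make this comparison systematic is to loop over the linkage methods and record, for each one, the cophenetic correlation coefficient and the share of points that end up in the largest cluster. The sketch below is only illustrative: it uses synthetic data as a stand-in for X, a hypothetical cluster count K = 16, and Euclidean distances (which 'ward' assumes). None of these values come from the original post.

```matlab
rng(0);                                  % reproducible synthetic data (stand-in for X)
X = randn(500, 40);                      % 500 points, 40 dimensions (assumed sizes)
K = 16;                                  % hypothetical number of clusters
methods = {'single', 'complete', 'average', 'ward'};
Y = pdist(X);                            % Euclidean distances, stored as a vector
coph = zeros(1, numel(methods));
largestShare = zeros(1, numel(methods));
for i = 1:numel(methods)
    Z = linkage(Y, methods{i});          % build the tree with this linkage method
    coph(i) = cophenet(Z, Y);            % cophenetic correlation coefficient
    T = cluster(Z, 'maxclust', K);       % cut the tree into at most K clusters
    counts = accumarray(T, 1);           % number of points in each cluster
    largestShare(i) = max(counts) / numel(T);
    fprintf('%-8s  c = %.4f  largest cluster = %.1f%%\n', ...
            methods{i}, coph(i), 100 * largestShare(i));
end
```

Note that a high cophenetic coefficient only indicates that the tree preserves the pairwise distances well. It does not say that cutting the tree gives a balanced partition, so the two columns are worth reading side by side.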
Interpreting the results of hierarchical clustering
I tried to perform vector quantization (VQ) of my data set using hierarchical cluster analysis (HCA), following the tutorial at http://www.mathworks.com/help/stats/hierarchical-clustering.html
Here is my code segment:
%X is the data matrix of M entries of N dimensions, where [M,N] = size(X);
CLUSTER_DESIRED = 128;
Y = pdist(X); %Y is the distance matrix expressed as a vector
Z = linkage(Y); %Z contains the clustering binary tree
c = cophenet(Z,Y); % c measures the clustering correlation coefficient
T = cluster(Z,'maxclust',CLUSTER_DESIRED ); %assign a cluster to each data point
h = hist(T,CLUSTER_DESIRED ); %count the number of points in each cluster
M is about 32,000 and N is about 40 for my data. It turns out that more than 95% of the points are assigned to a single one of the 128 clusters. Prior to doing HCA, I tried to develop the codebook using k-means, but the resulting codebook was not good for my subsequent experiment. I guess the HCA result tells me that my data are definitely evenly spread out in the vector space. Am I interpreting this correctly? Is VQ not going to work for such a data set?
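To get a feel for what a single dominant cluster does and does not imply, it may help to run the same pipeline on synthetic data whose structure is known. The sketch below is an assumption-laden illustration, not an analysis of the poster's data: the sizes n, d, and K and both data sets are hypothetical. It applies 'single' linkage to (A) points drawn uniformly from a hypercube and (B) well-separated Gaussian blobs, and reports the fraction of points in the largest cluster.

```matlab
rng(1);                                   % reproducible synthetic data
n = 800; d = 40; K = 8;                   % hypothetical sizes, not the poster's
% Case A: no cluster structure, points uniform in the unit hypercube
XA = rand(n, d);
% Case B: K well-separated Gaussian blobs of equal size
centers = 20 * randn(K, d);               % blob centers, far apart relative to unit noise
labels = repmat((1:K)', n / K, 1);        % true blob index of each point
XB = centers(labels, :) + randn(n, d);
data = {XA, XB};
names = {'uniform', 'blobs'};
share = zeros(1, 2);
for i = 1:2
    Z = linkage(pdist(data{i}), 'single'); % same default linkage as in the post
    T = cluster(Z, 'maxclust', K);         % cut the tree into at most K clusters
    share(i) = max(accumarray(T, 1)) / n;  % fraction of points in the largest cluster
    fprintf('%-8s largest cluster holds %.1f%% of the points\n', ...
            names{i}, 100 * share(i));
end
```

If the real data behave like case A under 'single' linkage, that is in line with the data lacking well-separated groups, but on its own it would not show that VQ cannot work: 'single' linkage merges by nearest-neighbor gaps, whereas k-means and 'ward' partition by within-cluster variance.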
Answers (0)