idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)
idx = cluster(obj,X) partitions the data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data, into k clusters. The clusters correspond to the k components of the Gaussian mixture distribution defined by obj, an object created by gmdistribution or gmdistribution.fit. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I: the component with the largest posterior probability for that observation. The posterior probability already weights each component's density by its component probability.
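Written out, with p_j denoting the component probability (mixing proportion) of component j and f_j its Gaussian density with mean \mu_j and covariance \Sigma_j, the assignment rule amounts to the following. This is a restatement of the description above, not a transcription of the toolbox source:

$$
P(I,J) = \frac{p_J \, f_J(x_I)}{\sum_{j=1}^{k} p_j \, f_j(x_I)}, \qquad
\mathrm{idx}(I) = \arg\max_{J} \, P(I,J)
$$

Because the denominator does not depend on J, the same index also maximizes p_J f_J(x_I).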
Note: The data in X is typically the same as the data used to create the Gaussian mixture distribution defined by obj. Clustering with cluster is treated as a separate step, apart from density estimation. For cluster to provide meaningful clustering with new data, X should come from the same population as the data used to create obj.
cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.
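A minimal base-MATLAB sketch of which rows take part in the partition, assuming a row is excluded when any of its entries is NaN. The data and the variable names keep and Xused are hypothetical:

```matlab
% Hypothetical data with missing values in rows 2 and 4
X = [1 2; NaN 0; -3 -5; 0 NaN];

% A row is used only if none of its entries is NaN
keep  = ~any(isnan(X), 2);
Xused = X(keep, :);
```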
[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data.
[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.
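The sketch below recomputes P and idx by hand in base MATLAB (no toolbox calls) for a hypothetical two-component bivariate mixture. The parameters mu, Sigma, and w and the observations in X are made up for illustration, and the computation follows the posterior definition rather than the toolbox's internal code. Normalizing in log space avoids underflow when an observation lies far from every component.

```matlab
% Hypothetical mixture parameters (illustration only, not a fitted model)
mu    = [1 2; -3 -5];                   % k-by-d component means
Sigma = cat(3, [2 0; 0 .5], eye(2));    % d-by-d-by-k covariances
w     = [0.5 0.5];                      % component probabilities
X     = [1 2; -3 -5; -1 -1.5];          % observations, one per row

[n, d] = size(X);
k = size(mu, 1);
logJoint = zeros(n, k);                 % log(w_j * f_j(x_i))
for j = 1:k
    R = chol(Sigma(:,:,j));             % Sigma = R'*R
    Z = bsxfun(@minus, X, mu(j,:)) / R; % whitened residuals
    logJoint(:, j) = log(w(j)) - 0.5*(d*log(2*pi) ...
        + 2*sum(log(diag(R))) + sum(Z.^2, 2));
end

% Normalize each row in log space to get posterior probabilities (n-by-k)
maxLog = max(logJoint, [], 2);
P = exp(bsxfun(@minus, logJoint, maxLog));
P = bsxfun(@rdivide, P, sum(P, 2));

% Cluster index: component with the largest posterior probability
[~, idx] = max(P, [], 2);
```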
[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is the sum over all components of the component density evaluated at observation I times the component probability.
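A similar hand computation of logpdf, again with hypothetical parameters and base MATLAB only. It sums the weighted component densities with the log-sum-exp trick, and it assumes that nlogl equals the negative sum of logpdf over the rows used, which is consistent with describing nlogl as the negative log-likelihood of the data:

```matlab
% Hypothetical mixture parameters (illustration only, not a fitted model)
mu    = [1 2; -3 -5];                   % k-by-d component means
Sigma = cat(3, [2 0; 0 .5], eye(2));    % d-by-d-by-k covariances
w     = [0.3 0.7];                      % component probabilities
X     = [1 2; -3 -5; -1 -1.5];          % observations, one per row

[n, d] = size(X);
k = size(mu, 1);
logJoint = zeros(n, k);                 % log(w_j * f_j(x_i))
for j = 1:k
    R = chol(Sigma(:,:,j));             % Sigma = R'*R
    Z = bsxfun(@minus, X, mu(j,:)) / R; % whitened residuals
    logJoint(:, j) = log(w(j)) - 0.5*(d*log(2*pi) ...
        + 2*sum(log(diag(R))) + sum(Z.^2, 2));
end

% Log of the mixture density via log-sum-exp over components
maxLog = max(logJoint, [], 2);
logpdf = maxLog + log(sum(exp(bsxfun(@minus, logJoint, maxLog)), 2));

% Negative log-likelihood of all observations (assumed relation)
nlogl = -sum(logpdf);
```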
[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M containing Mahalanobis distances in squared units. M(I,J) is the squared Mahalanobis distance of observation I from the mean of component J, measured with the covariance of component J.
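A base-MATLAB sketch of M for hypothetical component means and covariances, assuming each entry uses the covariance of its own component, so that M(I,J) = (x_I - mu_J) * inv(Sigma_J) * (x_I - mu_J)':

```matlab
% Hypothetical component parameters (illustration only)
mu    = [1 2; -3 -5];                   % k-by-d component means
Sigma = cat(3, [2 0; 0 .5], eye(2));    % d-by-d-by-k covariances
X     = [1 2; -3 -5; -1 -1.5];          % observations, one per row

n = size(X, 1);
k = size(mu, 1);
M = zeros(n, k);
for j = 1:k
    Xc = bsxfun(@minus, X, mu(j,:));    % residuals from component mean
    % Row-wise quadratic form Xc*inv(Sigma_j)*Xc' without forming inv()
    M(:, j) = sum((Xc / Sigma(:,:,j)) .* Xc, 2);
end
```

Solving with the slash operator rather than calling inv is the usual MATLAB idiom for numerical stability.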
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
% Two bivariate Gaussian components, 1000 draws from each
MU1 = [1 2]; SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5]; SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on
Fit a two-component Gaussian mixture model:
obj = gmdistribution.fit(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
Use the fit to cluster the data:
idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);
delete(h)   % remove the density contours
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')