Documentation Center

  • Trial Software
  • Product Updates

cluster

Class: gmdistribution

Construct clusters from Gaussian mixture distribution

Syntax

idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)

Description

idx = cluster(obj,X) partitions data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data, into k clusters determined by the k components of the Gaussian mixture distribution defined by obj. obj is an object created by gmdistribution or fitgmdist. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I. The cluster index gives the component with the largest posterior probability for the observation, weighted by the component probability.

    Note:   The data in X is typically the same as the data used to create the Gaussian mixture distribution defined by obj. Clustering with cluster is treated as a separate step, apart from density estimation. For cluster to provide meaningful clustering with new data, X should come from the same population as the data used to create obj.

cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.

[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data.

[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.

[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is a sum over all components of the component density at I times the component probability.

[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M containing Mahalanobis distances in squared units. M(I,J) is the Mahalanobis distance of observation I from the mean of component J.

Examples

expand all

Cluster Data from a Gaussian Mixture Distribution

Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function

MU1 = [2 2];
SIGMA1 = [2 0; 0 1];
MU2 = [-2 -1];
SIGMA2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];

scatter(X(:,1),X(:,2),10,'.')
hold on

Fit a two-component Gaussian mixture model.

obj = fitgmdist(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);

Use the fit to cluster the data.

idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);

delete(h)
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')

See Also

| | |

Was this topic helpful?