Products & Services Solutions Academia Support User Community Company

Learn more about Statistics Toolbox   

cluster - Class: gmdistribution

Construct clusters from Gaussian mixture distribution

Syntax

idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)

Description

idx = cluster(obj,X) partitions data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data, into k clusters determined by the k components of the Gaussian mixture distribution defined by obj. obj is an object created by gmdistribution or fit. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I. The cluster index gives the component with the largest posterior probability for the observation, weighted by the component probability.

cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.

[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data.

[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.

[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is a sum over all components of the component density at I times the component probability.

[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M containing Mahalanobis distances in squared units. M(I,J) is the Mahalanobis distance of observation I from the mean of component J.

Examples

Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:

MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];

scatter(X(:,1),X(:,2),10,'.')
hold on

Fit a two-component Gaussian mixture model:

obj = gmdistribution.fit(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);

Use the fit to cluster the data:

idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);

delete(h)
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')

See Also

gmdistribution, fit, posterior, mahal

  


Recommended Products

Includes the most popular MATLAB recorded presentations with Q&A sessions led by MATLAB experts.

 © 1984-2009- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS