Construct clusters from Gaussian mixture distribution
idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)
idx = cluster(obj,X) partitions the data in the n-by-d matrix X into k clusters, where n is the number of observations, d is the dimension of the data, and k is the number of components of the Gaussian mixture distribution defined by obj. obj is an object created by gmdistribution or fit. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I. The cluster index is the component with the largest posterior probability for the observation, where each component density is weighted by its component probability.
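The relationship between idx and the posterior probabilities can be checked directly. The following is a minimal sketch, not part of the original reference text. It assumes a mixture built with the gmdistribution constructor from illustrative parameters (equal mixing proportions and four hand-picked observations are assumptions) and uses the posterior method listed among the related functions.

```matlab
% Sketch: build a mixture from illustrative (assumed) parameters, then
% compare the output of cluster with the largest posterior probability.
mu    = [1 2; -3 -5];                      % k-by-d component means
sigma = cat(3, [2 0; 0 .5], [1 0; 0 1]);   % d-by-d-by-k covariances
p     = [0.5 0.5];                         % mixing proportions (assumed equal)
obj   = gmdistribution(mu, sigma, p);

X = [1 2; 0.5 1.5; -3 -5; -2.5 -4];        % four illustrative observations

idx = cluster(obj, X);                     % hard cluster assignment
P   = posterior(obj, X);                   % n-by-k posterior probabilities
[~, idxMax] = max(P, [], 2);               % component with the largest posterior

sameAssignment = isequal(idx, idxMax);     % true when the two agree
```

Working from posterior rather than cluster is useful when the assignment should depend on how confident the model is, for example to flag observations whose largest posterior is close to 0.5.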
Note: The data in X is typically the same as the data used to create the Gaussian mixture distribution defined by obj. Clustering with cluster is treated as a separate step, apart from density estimation. For cluster to provide meaningful clustering with new data, X should come from the same population as the data used to create obj.
cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.
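If it matters which rows were left out, the missing rows can also be identified explicitly before calling cluster. This is a sketch under assumed parameters. Using NaN as the placeholder index for incomplete rows is a choice made here for illustration, not a statement about what cluster itself returns for those rows.

```matlab
% Sketch: cluster only the complete rows and record which rows were skipped.
mu    = [1 2; -3 -5];                      % illustrative component means
sigma = cat(3, [2 0; 0 .5], [1 0; 0 1]);   % illustrative covariances
obj   = gmdistribution(mu, sigma, [0.5 0.5]);

X = [1 2; NaN 1.5; -3 -5; -2.5 NaN];       % rows 2 and 4 have missing values

complete = ~any(isnan(X), 2);              % true for rows without any NaN
idx = nan(size(X, 1), 1);                  % placeholder for excluded rows
idx(complete) = cluster(obj, X(complete, :));
```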
[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data.
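As a sanity check, nlogl can be compared with the negative sum of the log densities returned by the pdf method. The sketch below uses assumed mixture parameters and observations. It only illustrates the relationship and is not taken from the original page.

```matlab
% Sketch: the negative log-likelihood equals minus the sum of log densities.
mu    = [1 2; -3 -5];                      % illustrative component means
sigma = cat(3, [2 0; 0 .5], [1 0; 0 1]);   % illustrative covariances
obj   = gmdistribution(mu, sigma, [0.5 0.5]);

X = [1 2; 0.5 1.5; -3 -5; -2.5 -4];        % illustrative observations

[idx, nlogl] = cluster(obj, X);
nloglCheck = -sum(log(pdf(obj, X)));       % recomputed from the mixture density

difference = abs(nlogl - nloglCheck);      % expected to be at round-off level
```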
[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.
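Written out, the posterior probability follows from Bayes' rule. The symbols below are notation introduced here, not in the original text: $x_i$ is observation I, $p_j$ is the probability of component J, and $\phi(\cdot;\mu_j,\Sigma_j)$ is the Gaussian density of component J with mean $\mu_j$ and covariance $\Sigma_j$.

```latex
P(i,j) \;=\; \frac{p_j \,\phi(x_i;\mu_j,\Sigma_j)}
                  {\sum_{l=1}^{k} p_l \,\phi(x_i;\mu_l,\Sigma_l)},
\qquad \sum_{j=1}^{k} P(i,j) = 1 .
```

Each row of P therefore sums to 1, and idx(I) is the column in which row I of P is largest.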
[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is a sum over all components of the component density at I times the component probability.
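The weighted sum described above can be reproduced by hand with mvnpdf. This is a sketch under assumed parameters. The variable names dens and maxError are illustrative.

```matlab
% Sketch: rebuild the mixture density as a weighted sum of component
% densities and compare its logarithm with logpdf.
mu    = [1 2; -3 -5];                      % illustrative component means
sigma = cat(3, [2 0; 0 .5], [1 0; 0 1]);   % illustrative covariances
p     = [0.5 0.5];                         % component probabilities (assumed)
obj   = gmdistribution(mu, sigma, p);

X = [1 2; 0.5 1.5; -3 -5; -2.5 -4];        % illustrative observations

[idx, nlogl, P, logpdf] = cluster(obj, X);

dens = zeros(size(X, 1), 1);
for j = 1:numel(p)
    % component density at each observation times the component probability
    dens = dens + p(j) * mvnpdf(X, mu(j, :), sigma(:, :, j));
end

maxError = max(abs(log(dens) - logpdf));   % expected to be at round-off level
```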
[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M containing squared Mahalanobis distances. M(I,J) is the squared Mahalanobis distance of observation I from the mean of component J, computed with the covariance of component J.
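To make the "squared units" concrete, the distances can be recomputed from the component means and covariances. The sketch below assumes illustrative parameters. Mcheck and maxError are names introduced here.

```matlab
% Sketch: recompute the squared Mahalanobis distance of every observation
% from every component mean, (x - mu_j) * inv(Sigma_j) * (x - mu_j)'.
mu    = [1 2; -3 -5];                      % illustrative component means
sigma = cat(3, [2 0; 0 .5], [1 0; 0 1]);   % illustrative covariances
obj   = gmdistribution(mu, sigma, [0.5 0.5]);

X = [1 2; 0.5 1.5; -3 -5; -2.5 -4];        % illustrative observations

[idx, nlogl, P, logpdf, M] = cluster(obj, X);

Mcheck = zeros(size(X, 1), size(mu, 1));
for j = 1:size(mu, 1)
    D = bsxfun(@minus, X, mu(j, :));                 % deviations from mean j
    Mcheck(:, j) = sum((D / sigma(:, :, j)) .* D, 2); % row-wise quadratic form
end

maxError = max(abs(M(:) - Mcheck(:)));     % expected to be at round-off level
```

An observation that sits exactly on a component mean has distance 0 to that component, which is the case for the first and third rows of X here.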
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on

Fit a two-component Gaussian mixture model:
obj = gmdistribution.fit(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);

Use the fit to cluster the data:
idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);
delete(h)
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')

See also: gmdistribution, fit, posterior, mahal
© 1984-2009 The MathWorks, Inc.