cluster

Class: gmdistribution

Construct clusters from Gaussian mixture distribution

Syntax

idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)

Description

idx = cluster(obj,X) partitions the data in the n-by-d matrix X into k clusters, where n is the number of observations and d is the dimension of the data. The clusters correspond to the k components of the Gaussian mixture distribution defined by obj, which is an object created by gmdistribution or fitgmdist. idx is an n-by-1 vector, where idx(I) is the cluster index of observation I. The cluster index is the component J with the largest posterior probability for that observation. Because the posterior is proportional to the component probability times the component density at the observation, the component probabilities are already taken into account.
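The following sketch illustrates the rule above: the index returned in idx should match the column of the posterior matrix P (the third output of cluster) that holds the largest value in each row. The mixture parameters (mu, sigma, prop) are hypothetical values chosen only for illustration, and the mixture is built directly with gmdistribution so that they are known exactly.

```matlab
% Hypothetical two-component mixture in 2-D (illustrative parameters only)
mu = [2 2; -2 -1];                   % k-by-d matrix of component means
sigma = cat(3,[2 0; 0 1],eye(2));    % d-by-d-by-k covariance matrices
prop = [0.4 0.6];                    % 1-by-k component probabilities
gm = gmdistribution(mu,sigma,prop);

rng(1);                              % for reproducibility
X = [mvnrnd(mu(1,:),sigma(:,:,1),50); mvnrnd(mu(2,:),sigma(:,:,2),50)];

% Hard assignments plus the n-by-k matrix of posterior probabilities
[idx,~,P] = cluster(gm,X);

% Column index of the largest posterior probability in each row
[~,idxFromP] = max(P,[],2);
% idx and idxFromP are expected to agree row by row
```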

    Note:   The data in X is typically the same as the data used to create the Gaussian mixture distribution defined by obj. Clustering with cluster is treated as a separate step, apart from density estimation. For cluster to provide meaningful clustering with new data, X should come from the same population as the data used to create obj.

cluster treats NaN values as missing data. Rows of X with NaN values are excluded from the partition.

[idx,nlogl] = cluster(obj,X) also returns nlogl, the negative log-likelihood of the data.

[idx,nlogl,P] = cluster(obj,X) also returns the posterior probabilities of each component for each observation in the n-by-k matrix P. P(I,J) is the probability of component J given observation I.
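As a sketch of what P contains, the posterior can be recomputed with Bayes' rule: weight each component density by its component probability, then divide each row by its sum. The parameters and the test points below are hypothetical, and mvnpdf is used only to evaluate the component densities for comparison.

```matlab
% Hypothetical mixture parameters (illustrative only)
mu = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],eye(2));
prop = [0.4 0.6];
gm = gmdistribution(mu,sigma,prop);

rng(2);
X = mvnrnd([0 0],4*eye(2),20);       % 20 arbitrary test observations

[~,~,P] = cluster(gm,X);

% Weighted component densities: W(I,J) = prop(J) * N(X(I,:); mu(J,:), sigma(:,:,J))
k = size(mu,1);
W = zeros(size(X,1),k);
for j = 1:k
    W(:,j) = prop(j)*mvnpdf(X,mu(j,:),sigma(:,:,j));
end

% Bayes' rule: normalize each row so the posteriors for an observation sum to 1
Pmanual = bsxfun(@rdivide,W,sum(W,2));
```

The difference between P and Pmanual should be at round-off level, and every row of P should sum to 1.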

[idx,nlogl,P,logpdf] = cluster(obj,X) also returns the n-by-1 vector logpdf containing the logarithm of the estimated probability density function for each observation. The density estimate for observation I is the sum, over all components, of each component's density at observation I multiplied by that component's probability.
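The sketch below recomputes logpdf from that definition, using hypothetical mixture parameters. It also relates logpdf to nlogl: if the observations are treated as independent, the log-likelihood is the sum of the per-observation log densities, so nlogl is expected to equal -sum(logpdf).

```matlab
% Hypothetical mixture parameters (illustrative only)
mu = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],eye(2));
prop = [0.4 0.6];
gm = gmdistribution(mu,sigma,prop);

rng(3);
X = mvnrnd([0 0],4*eye(2),20);       % 20 arbitrary test observations

[~,nlogl,~,logpdf] = cluster(gm,X);

% Mixture density: sum over components of prop(J) times the density of component J
dens = zeros(size(X,1),1);
for j = 1:size(mu,1)
    dens = dens + prop(j)*mvnpdf(X,mu(j,:),sigma(:,:,j));
end
logpdfManual = log(dens);

% Negative log-likelihood of independent observations
nloglManual = -sum(logpdfManual);
```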

[idx,nlogl,P,logpdf,M] = cluster(obj,X) also returns an n-by-k matrix M of squared Mahalanobis distances. M(I,J) is the squared Mahalanobis distance of observation I from the mean of component J.
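For component J with mean mu_J and covariance Sigma_J, the squared Mahalanobis distance of a row vector x is (x - mu_J) * inv(Sigma_J) * (x - mu_J)'. The sketch below computes this directly for hypothetical mixture parameters and compares it with M and with the mahal method of gmdistribution, which returns the same squared distances.

```matlab
% Hypothetical mixture parameters (illustrative only)
mu = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],eye(2));
prop = [0.4 0.6];
gm = gmdistribution(mu,sigma,prop);

rng(4);
X = mvnrnd([0 0],4*eye(2),20);       % 20 arbitrary test observations

[~,~,~,~,M] = cluster(gm,X);

% Squared Mahalanobis distance of each row of X from each component mean
k = size(mu,1);
Mmanual = zeros(size(X,1),k);
for j = 1:k
    D = bsxfun(@minus,X,mu(j,:));               % deviations from mean J
    Mmanual(:,j) = sum((D/sigma(:,:,j)).*D,2);  % D/S computes D*inv(S) without forming inv(S)
end

% The mahal method of gmdistribution returns the same n-by-k matrix
Mmahal = mahal(gm,X);
```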

Examples


Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function.

MU1 = [2 2];
SIGMA1 = [2 0; 0 1];
MU2 = [-2 -1];
SIGMA2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];

scatter(X(:,1),X(:,2),10,'.')
hold on

Fit a two-component Gaussian mixture model.

obj = fitgmdist(X,2);                               % fit a two-component GMM
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);  % overlay contours of the fitted density

Use the fit to cluster the data.

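idx = cluster(obj,X);        % assign each observation to a component
cluster1 = X(idx == 1,:);    % observations assigned to component 1
cluster2 = X(idx == 2,:);    % observations assigned to component 2

delete(h)                    % remove the density contours
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')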
