Construct clusters from Gaussian mixture distribution
idx = cluster(obj,X)
[idx,nlogl] = cluster(obj,X)
[idx,nlogl,P] = cluster(obj,X)
[idx,nlogl,P,logpdf] = cluster(obj,X)
[idx,nlogl,P,logpdf,M] = cluster(obj,X)
idx = cluster(obj,X) partitions data in
the n-by-d matrix X,
where n is the number of observations and d is
the dimension of the data, into k clusters determined
by the k components of the Gaussian mixture distribution
defined by obj. obj is a gmdistribution object, for example one returned by fitgmdist.
idx is an n-by-1 vector, where
idx(I) is the cluster index of observation I.
The cluster index gives the component with the largest posterior probability
for the observation, weighted by the component probability.
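The assignment rule above can be sketched by comparing idx with the component that maximizes the posterior probability. This is a minimal illustration, not the internal implementation; it assumes the posterior method of gmdistribution and uses made-up sample data.

```matlab
% Illustrative data: two bivariate Gaussian clouds (values are arbitrary).
rng(1); % For reproducibility
X = [mvnrnd([2 2],[2 0; 0 1],500); mvnrnd([-2 -1],eye(2),500)];
obj = fitgmdist(X,2);

% Hard cluster assignment.
idx = cluster(obj,X);

% Posterior probability of each component for each observation,
% followed by the index of the largest entry in each row.
post = posterior(obj,X);
[~,idxFromPost] = max(post,[],2);

% Expected to be true if idx is the row-wise argmax of the posterior.
agree = isequal(idx,idxFromPost);
```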
The data in
X is typically the same as the
data used to create the Gaussian mixture distribution defined by
obj. Clustering with cluster is treated as a separate
step, apart from density estimation. For cluster to
provide meaningful clustering with new data, X should
come from the same population as the data used to create obj.
cluster treats NaN values as missing data. Rows of X with NaN values
are excluded from the partition.
[idx,nlogl] = cluster(obj,X) also returns
nlogl, the negative log-likelihood of the data.
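As a rough check of what nlogl measures, the sketch below recomputes the negative log-likelihood from the fitted density using the pdf method of gmdistribution. The sample data and the variable names nloglFromPdf and relErr are illustrative assumptions.

```matlab
% Illustrative data and fit (values are arbitrary).
rng(1); % For reproducibility
X = [mvnrnd([2 2],[2 0; 0 1],500); mvnrnd([-2 -1],eye(2),500)];
obj = fitgmdist(X,2);

[idx,nlogl] = cluster(obj,X);

% Negative sum of the log density evaluated at every observation.
nloglFromPdf = -sum(log(pdf(obj,X)));

% Relative difference; expected to be near zero up to rounding.
relErr = abs(nlogl - nloglFromPdf) / abs(nlogl);
```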
[idx,nlogl,P] = cluster(obj,X) also returns
the posterior probabilities of each component for each observation
in the n-by-k matrix P. P(I,J) is
the probability of component
J given observation I.
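Because each row of P holds posterior probabilities over the k components, the rows should sum to 1, and P can be used for soft clustering. The following sketch flags observations whose largest posterior falls below a threshold; the 0.9 cutoff and the name ambiguous are hypothetical choices for illustration only.

```matlab
% Illustrative data and fit (values are arbitrary).
rng(1); % For reproducibility
X = [mvnrnd([2 2],[2 0; 0 1],500); mvnrnd([-2 -1],eye(2),500)];
obj = fitgmdist(X,2);

[idx,nlogl,P] = cluster(obj,X);

% Each row of P is a probability distribution over the components.
rowSums = sum(P,2);

% Observations that are not assigned to any component with high
% confidence (0.9 is an arbitrary, hypothetical threshold).
ambiguous = max(P,[],2) < 0.9;
numAmbiguous = nnz(ambiguous);
```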
[idx,nlogl,P,logpdf] = cluster(obj,X) also
returns the n-by-1 vector logpdf containing
the logarithm of the estimated probability density function for each
observation. The density estimate for observation I is
a sum over all components of the component density at I times
the component probability.
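The weighted sum described above can be written out explicitly. This sketch assumes the gmdistribution properties mu, Sigma, and ComponentProportion (older releases may expose the mixing proportions under a different property name) and a fit with full, unshared covariance matrices, so that Sigma is d-by-d-by-k.

```matlab
% Illustrative data and fit (values are arbitrary).
rng(1); % For reproducibility
X = [mvnrnd([2 2],[2 0; 0 1],500); mvnrnd([-2 -1],eye(2),500)];
obj = fitgmdist(X,2);

[~,~,~,logpdf] = cluster(obj,X);

% Mixture density: sum over components of
% (component probability) * (component Gaussian density).
k = size(obj.mu,1);
dens = zeros(size(X,1),1);
for j = 1:k
    dens = dens + obj.ComponentProportion(j) * ...
        mvnpdf(X,obj.mu(j,:),obj.Sigma(:,:,j));
end

% Largest absolute difference; expected to be near zero up to rounding.
maxDiff = max(abs(logpdf - log(dens)));
```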
[idx,nlogl,P,logpdf,M] = cluster(obj,X) also
returns an n-by-k matrix M containing
Mahalanobis distances in squared units. M(I,J) is
the Mahalanobis distance of observation I from
the mean of component J.
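For reference, the squared Mahalanobis distance of observation x from component J is (x - mu_J) * inv(Sigma_J) * (x - mu_J)'. The sketch below recomputes it by hand and compares the result with M. It assumes the mu and Sigma properties of gmdistribution and full, unshared covariance matrices; Mmanual is an illustrative name.

```matlab
% Illustrative data and fit (values are arbitrary).
rng(1); % For reproducibility
X = [mvnrnd([2 2],[2 0; 0 1],500); mvnrnd([-2 -1],eye(2),500)];
obj = fitgmdist(X,2);

[~,~,~,~,M] = cluster(obj,X);

k = size(obj.mu,1);
Mmanual = zeros(size(X,1),k);
for j = 1:k
    % Center the observations on the mean of component j.
    d = bsxfun(@minus,X,obj.mu(j,:));
    % Row-wise quadratic form d * inv(Sigma_j) * d', without forming inv.
    Mmanual(:,j) = sum((d / obj.Sigma(:,:,j)) .* d, 2);
end

% Largest absolute difference; expected to be near zero up to rounding.
maxDiff = max(abs(M(:) - Mmanual(:)));
```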
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function.
MU1 = [2 2];
SIGMA1 = [2 0; 0 1];
MU2 = [-2 -1];
SIGMA2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on
Fit a two-component Gaussian mixture model.
obj = fitgmdist(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
Use the fit to cluster the data.
idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);
delete(h)
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')