# cluster

Class: gmdistribution

Construct clusters from Gaussian mixture distribution

## Syntax

`idx = cluster(obj,X)`

`[idx,nlogl] = cluster(obj,X)`

`[idx,nlogl,P] = cluster(obj,X)`

`[idx,nlogl,P,logpdf] = cluster(obj,X)`

`[idx,nlogl,P,logpdf,M] = cluster(obj,X)`

## Description

`idx = cluster(obj,X)` partitions the data in the n-by-d matrix `X`, where n is the number of observations and d is the dimension of the data, into k clusters determined by the k components of the Gaussian mixture distribution defined by `obj`. `obj` is an object created by `gmdistribution` or `fitgmdist`. `idx` is an n-by-1 vector, where `idx(I)` is the cluster index of observation `I`: the component with the largest posterior probability for that observation, where each component's density is weighted by its component probability.
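As a minimal sketch of this assignment rule, the following compares `idx` with the row-wise maximum of the posterior probabilities returned by the `posterior` method. The mixture parameters and the observations are hypothetical values chosen only for illustration.

```matlab
% Hypothetical two-component mixture in 2-D (illustrative values only)
mu    = [2 2; -2 -1];                  % k-by-d component means
sigma = cat(3,[2 0; 0 1],[1 0; 0 1]);  % d-by-d-by-k covariance matrices
p     = [0.5 0.5];                     % component (mixing) probabilities
obj   = gmdistribution(mu,sigma,p);

X = [2.5 1.5; -2 -1; 0 0.5; -3 0];     % a few observations, one per row

idx = cluster(obj,X);

% Hard assignment by hand: column index of the largest posterior in each row
[~,idxFromPosterior] = max(posterior(obj,X),[],2);

% Expected to be true as long as no observation has tied posteriors
agree = isequal(idx,idxFromPosterior)
```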

Note: The data in `X` is typically the same as the data used to create the Gaussian mixture distribution defined by `obj`. Clustering with `cluster` is treated as a separate step, apart from density estimation. For `cluster` to provide meaningful clustering with new data, `X` should come from the same population as the data used to create `obj`.

`cluster` treats `NaN` values as missing data. Rows of `X` with `NaN` values are excluded from the partition.

`[idx,nlogl] = cluster(obj,X)` also returns `nlogl`, the negative log-likelihood of the data.

`[idx,nlogl,P] = cluster(obj,X)` also returns the posterior probabilities of each component for each observation in the n-by-k matrix `P`. `P(I,J)` is the probability of component `J` given observation `I`.
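The posterior probabilities follow from Bayes' rule: each component density is weighted by its component probability, and the weighted densities are normalized across components. The sketch below reproduces `P` by hand under that assumption, using hypothetical mixture parameters and `mvnpdf` for the component densities.

```matlab
% Hypothetical two-component mixture with unequal weights (illustrative values)
mu    = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],[1 0; 0 1]);
p     = [0.3 0.7];
obj   = gmdistribution(mu,sigma,p);

X = [2.5 1.5; -2 -1; 0 0.5; -3 0];

[~,~,P] = cluster(obj,X);

% Column J holds p(J) times the density of component J at each observation
W = [p(1)*mvnpdf(X,mu(1,:),sigma(:,:,1)), ...
     p(2)*mvnpdf(X,mu(2,:),sigma(:,:,2))];

% Normalize each row so the posteriors over components sum to 1
Pmanual = bsxfun(@rdivide,W,sum(W,2));

maxDiff = max(abs(P(:) - Pmanual(:)))   % should differ only by rounding error
rowSums = sum(P,2)                      % each entry should be 1 up to rounding
```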

`[idx,nlogl,P,logpdf] = cluster(obj,X)` also returns the n-by-1 vector `logpdf` containing the logarithm of the estimated probability density function for each observation. The density estimate for observation `I` is a sum over all components of the component density at `I` times the component probability.
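To illustrate how `logpdf` and `nlogl` relate, the sketch below evaluates the mixture density as the weighted sum described above and compares it with the outputs of `cluster`. It assumes that `nlogl` is the negated sum of the per-observation log densities; the mixture parameters are hypothetical.

```matlab
% Hypothetical two-component mixture (illustrative values only)
mu    = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],[1 0; 0 1]);
p     = [0.3 0.7];
obj   = gmdistribution(mu,sigma,p);

X = [2.5 1.5; -2 -1; 0 0.5; -3 0];

[~,nlogl,~,logpdf] = cluster(obj,X);

% Mixture density: sum over components of (component probability * density)
dens = p(1)*mvnpdf(X,mu(1,:),sigma(:,:,1)) + ...
       p(2)*mvnpdf(X,mu(2,:),sigma(:,:,2));

logpdfDiff = max(abs(logpdf - log(dens)))   % should be near zero
nloglDiff  = abs(nlogl + sum(logpdf))       % near zero if nlogl = -sum(logpdf)
```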

`[idx,nlogl,P,logpdf,M] = cluster(obj,X)` also returns an n-by-k matrix `M` containing Mahalanobis distances in squared units. `M(I,J)` is the squared Mahalanobis distance of observation `I` from the mean of component `J`.
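As a rough check of what "squared units" means here, the sketch below computes the quadratic form (x - mu_J) * inv(Sigma_J) * (x - mu_J)' for every observation and component and compares it with `M`. The mixture parameters are hypothetical, and the local variable names (`D`, `Mmanual`) are introduced only for this illustration.

```matlab
% Hypothetical two-component mixture (illustrative values only)
mu    = [2 2; -2 -1];
sigma = cat(3,[2 0; 0 1],[1 0; 0 1]);
p     = [0.5 0.5];
obj   = gmdistribution(mu,sigma,p);

X = [2.5 1.5; -2 -1; 0 0.5; -3 0];

[~,~,~,~,M] = cluster(obj,X);

k = size(mu,1);
Mmanual = zeros(size(X,1),k);
for j = 1:k
    D = bsxfun(@minus,X,mu(j,:));                 % deviations from mean j
    Mmanual(:,j) = sum((D/sigma(:,:,j)).*D,2);    % row-wise quadratic form
end

maxDiff = max(abs(M(:) - Mmanual(:)))   % should be near zero
```

The `mahal` method of `gmdistribution` returns the same kind of squared distances, so `mahal(obj,X)` is another way to cross-check `M`.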

## Examples

### Cluster Data from a Gaussian Mixture Distribution

Generate data from a mixture of two bivariate Gaussian distributions using the `mvnrnd` function.

```
MU1 = [2 2];
SIGMA1 = [2 0; 0 1];
MU2 = [-2 -1];
SIGMA2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on
```

Fit a two-component Gaussian mixture model.

```
obj = fitgmdist(X,2);
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
```

Use the fit to cluster the data.

```
idx = cluster(obj,X);
cluster1 = X(idx == 1,:);
cluster2 = X(idx == 2,:);
delete(h)
h1 = scatter(cluster1(:,1),cluster1(:,2),10,'r.');
h2 = scatter(cluster2(:,1),cluster2(:,2),10,'g.');
legend([h1 h2],'Cluster 1','Cluster 2','Location','NW')
```