Gaussian mixture models are often used for data clustering.
Clusters are assigned by selecting the component that maximizes the
posterior probability. Like *k*-means clustering,
Gaussian mixture modeling uses an iterative algorithm that converges
to a local optimum. Gaussian mixture modeling may be more appropriate
than *k*-means clustering when clusters have different
sizes and correlation within them. Clustering using Gaussian mixture
models is sometimes considered a soft clustering method. The posterior
probabilities for each point indicate that each data point has some
probability of belonging to each cluster.
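
In symbols, the posterior probability follows from Bayes' rule. The notation here is introduced only for illustration and does not appear in the original example: a mixture has $K$ components with mixing proportions $\pi_k$, means $\mu_k$, and covariance matrices $\Sigma_k$, and $\mathcal{N}$ denotes the multivariate normal density.

$$
P(k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
$$

Hard clustering assigns $x$ to the component $k$ with the largest $P(k \mid x)$, while soft clustering keeps the whole vector of probabilities, which sums to 1 over the components.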

Gaussian mixture distributions can be used to cluster data because each multivariate normal component of the fitted model can represent a cluster.

To demonstrate the process, first generate some simulated data from a mixture of two bivariate Gaussian distributions using the `mvnrnd` function.

    rng default; % For reproducibility
    mu1 = [1 2];
    sigma1 = [3 .2; .2 2];
    mu2 = [-1 -2];
    sigma2 = [2 0; 0 1];
    X = [mvnrnd(mu1,sigma1,200); mvnrnd(mu2,sigma2,100)];
    scatter(X(:,1),X(:,2),10,'ko')

Fit a two-component Gaussian mixture distribution. Here, you know the correct number of components to use. In practice, with real data, this decision would require comparing models with different numbers of components.
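
One common way to make that comparison is to fit models over a range of component counts and compare an information criterion, where lower values are preferred. The following is a minimal sketch, not part of the original example. It assumes the `AIC` and `BIC` properties of the fitted `gmdistribution` object and the `'Replicates'` option of `fitgmdist`. The candidate range and the variable names (`maxK`, `aic`, `bic`, `kAIC`, `kBIC`) are illustrative choices.

```matlab
% Regenerate the simulated data so the sketch runs on its own.
rng default; % For reproducibility
mu1 = [1 2];   sigma1 = [3 .2; .2 2];
mu2 = [-1 -2]; sigma2 = [2 0; 0 1];
X = [mvnrnd(mu1,sigma1,200); mvnrnd(mu2,sigma2,100)];

maxK = 4;                 % illustrative upper bound on the number of components
aic = zeros(1,maxK);
bic = zeros(1,maxK);
for k = 1:maxK
    % Several random restarts reduce the risk of keeping a poor local optimum.
    gmk = fitgmdist(X,k,'Replicates',5);
    aic(k) = gmk.AIC;
    bic(k) = gmk.BIC;
end
[~,kAIC] = min(aic);      % component count preferred by AIC
[~,kBIC] = min(bic);      % component count preferred by BIC
```

The two criteria can disagree, because BIC penalizes additional parameters more heavily than AIC at this sample size. Treat the result as one input to the decision rather than a definitive answer.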

    options = statset('Display','final');
    gm = fitgmdist(X,2,'Options',options);

33 iterations, log-likelihood = -1210.59

Plot the estimated probability density contours for the two-component mixture distribution. The two bivariate normal components overlap, but their peaks are distinct. This suggests that the data could reasonably be divided into two clusters.

    hold on
    ezcontour(@(x,y)pdf(gm,[x y]),[-8 6],[-8 6]);
    hold off

Partition the data into clusters using the `cluster` method for the fitted mixture distribution. The `cluster` method assigns each point to one of the two components in the mixture distribution.

    idx = cluster(gm,X);
    cluster1 = (idx == 1);
    cluster2 = (idx == 2);
    scatter(X(cluster1,1),X(cluster1,2),10,'r+');
    hold on
    scatter(X(cluster2,1),X(cluster2,2),10,'bo');
    hold off
    legend('Cluster 1','Cluster 2','Location','NW')

Each cluster corresponds to one of the bivariate normal components in the mixture distribution.

`cluster` assigns points to clusters based on the estimated posterior probability that a point came from a component; each point is assigned to the cluster corresponding to the highest posterior probability. The `posterior` method returns those probabilities. For example, plot the posterior probability of the first component for each point.

    P = posterior(gm,X);
    scatter(X(cluster1,1),X(cluster1,2),10,P(cluster1,1),'+')
    hold on
    scatter(X(cluster2,1),X(cluster2,2),10,P(cluster2,1),'o')
    hold off
    legend('Cluster 1','Cluster 2','Location','NW')
    clrmap = jet(80);
    colormap(clrmap(9:72,:))
    ylabel(colorbar,'Component 1 Posterior Probability')
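
The relationship between `cluster` and `posterior` can be checked directly. The following is a minimal sketch, not part of the original example, that regenerates the data so it runs on its own. The variable names `idxFromP`, `agree`, and `rowSums` are illustrative. Barring exact ties in the posterior probabilities, the two assignments are expected to agree.

```matlab
% Regenerate the simulated data and refit so the sketch runs on its own.
rng default; % For reproducibility
mu1 = [1 2];   sigma1 = [3 .2; .2 2];
mu2 = [-1 -2]; sigma2 = [2 0; 0 1];
X = [mvnrnd(mu1,sigma1,200); mvnrnd(mu2,sigma2,100)];
gm = fitgmdist(X,2);

idx = cluster(gm,X);          % hard assignments
P = posterior(gm,X);          % n-by-2 matrix of posterior probabilities

% Assign each point to the component with the largest posterior probability.
[~,idxFromP] = max(P,[],2);
agree = isequal(idx,idxFromP);

% Each row of P holds probabilities over the components, so it sums to 1.
rowSums = sum(P,2);
```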

An alternative to the previous example is to use the posterior probabilities for "soft clustering". Each point is assigned a membership score for each cluster. Membership scores are simply the posterior probabilities, and describe how similar each point is to each cluster's archetype, that is, the mean of the corresponding component. The points can be ranked by their membership score in a given cluster.

    [~,order] = sort(P(:,1));
    plot(1:size(X,1),P(order,1),'r-',1:size(X,1),P(order,2),'b-');
    legend({'Cluster 1 Score' 'Cluster 2 Score'},'location','NW');
    ylabel('Cluster Membership Score');
    xlabel('Point Ranking');

Although a clear separation of the data is hard to see in a scatter plot of the data, plotting the membership scores indicates that the fitted distribution does a good job of separating the data into groups. Very few points have scores close to 0.5.

Soft clustering using a Gaussian mixture distribution is similar
to fuzzy *k*-means clustering, which also assigns
each point to each cluster with a membership score. The fuzzy *k*-means
algorithm assumes that clusters are roughly spherical in shape, and
all of roughly equal size. This is comparable to a Gaussian mixture
distribution with a single covariance matrix that is shared across
all components, and is a multiple of the identity matrix. In contrast, `gmdistribution` allows
you to specify different covariance options. The default is to estimate
a separate, unconstrained covariance matrix for each component. A
more restricted option, closer to *k*-means, would
be to estimate a shared, diagonal covariance matrix.

    gm2 = fitgmdist(X,2,'CovType','Diagonal',...
        'SharedCov',true);

This covariance option is similar to fuzzy *k*-means
clustering, but provides more flexibility by allowing unequal variances
for different variables.
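
The covariance structures compared above can be written compactly. The notation is assumed here for illustration: $d$ is the number of variables, $\Sigma_k$ is the covariance matrix of component $k$, and $I$ is the $d \times d$ identity matrix.

$$
\begin{aligned}
\text{shared, spherical (comparable to fuzzy } k\text{-means):} \quad & \Sigma_k = \sigma^2 I \\
\text{shared, diagonal:} \quad & \Sigma_k = \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2) \\
\text{separate, unconstrained (default):} \quad & \Sigma_k \text{ symmetric positive definite, one per component}
\end{aligned}
$$

Moving down the list adds parameters: the shared spherical form has one variance, the shared diagonal form has $d$ variances, and the default form has $d(d+1)/2$ free covariance entries per component.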

You can compute the soft cluster membership scores without computing
hard cluster assignments, using `posterior`, or as part of hard clustering,
as the third output from `cluster`.

    P2 = posterior(gm2,X); % equivalently [idx,~,P2] = cluster(gm2,X)
    [~,order] = sort(P2(:,1));
    plot(1:size(X,1),P2(order,1),'r-',1:size(X,1),P2(order,2),'b-');
    legend({'Cluster 1 Score' 'Cluster 2 Score'},'location','NW');
    ylabel('Cluster Membership Score');
    xlabel('Point Ranking');

In the previous example, fitting the mixture distribution to
data using `fitgmdist`, and clustering those data using `cluster`,
are separate steps. However, the same data are used in both steps.
You can also use the `cluster` method to assign new data points to
the clusters (mixture components) found in the original data.

Given a data set `X`, first fit a Gaussian mixture distribution. The previous code has already done that.

    gm

    gm =

    Gaussian mixture distribution with 2 components in 2 dimensions
    Component 1:
    Mixing proportion: 0.629379
    Mean:    1.0758    2.0426

    Component 2:
    Mixing proportion: 0.370621
    Mean:   -0.8292   -1.8482

You can then use `cluster` to assign each point in a new data set, `Y`, to one of the clusters defined for the original data.

    Y = [mvnrnd(mu1,sigma1,50); mvnrnd(mu2,sigma2,25)];
    idx = cluster(gm,Y);
    cluster1 = (idx == 1);
    cluster2 = (idx == 2);
    scatter(Y(cluster1,1),Y(cluster1,2),10,'r+');
    hold on
    scatter(Y(cluster2,1),Y(cluster2,2),10,'bo');
    hold off
    legend('Class 1','Class 2','Location','NW')

As with the previous example, the posterior probabilities for each point can be treated as membership scores rather than determining "hard" cluster assignments.

For `cluster` to provide meaningful results with new data, `Y` should
come from the same population as `X`, the original data used to create
the mixture distribution. In particular, the estimated mixing probabilities
for the Gaussian mixture distribution fitted to `X` are used when computing
the posterior probabilities for `Y`.
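
To see how the mixing probabilities estimated from `X` enter the calculation for `Y`, the posterior probabilities can be rebuilt by hand from the fitted parameters. The following is a minimal sketch, not part of the original example. It assumes the `NumComponents`, `ComponentProportion`, `mu`, and `Sigma` properties of the fitted `gmdistribution` object (older releases may use different property names) and a default fit with a separate, full covariance matrix per component. The names `w`, `Pmanual`, `Pbuiltin`, and `maxDiff` are illustrative.

```matlab
% Regenerate the original and new data so the sketch runs on its own.
rng default; % For reproducibility
mu1 = [1 2];   sigma1 = [3 .2; .2 2];
mu2 = [-1 -2]; sigma2 = [2 0; 0 1];
X = [mvnrnd(mu1,sigma1,200); mvnrnd(mu2,sigma2,100)];
Y = [mvnrnd(mu1,sigma1,50);  mvnrnd(mu2,sigma2,25)];
gm = fitgmdist(X,2);          % fitted to X only

% Weight each component density at the new points by the mixing
% proportion that was estimated from X.
K = gm.NumComponents;
w = zeros(size(Y,1),K);
for j = 1:K
    w(:,j) = gm.ComponentProportion(j) * mvnpdf(Y,gm.mu(j,:),gm.Sigma(:,:,j));
end

% Normalize each row so the probabilities sum to 1 over the components.
Pmanual = bsxfun(@rdivide,w,sum(w,2));

% Compare with the built-in computation; the gap should be at rounding level.
Pbuiltin = posterior(gm,Y);
maxDiff = max(abs(Pmanual(:) - Pbuiltin(:)));
```

If `Y` were drawn with different group proportions than `X`, the weights `gm.ComponentProportion` would no longer match the new population, and the posterior probabilities would be biased toward the proportions seen in `X`.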
