Gaussian mixture models are formed by combining multivariate normal density components. In Statistics and Machine Learning Toolbox™ software, use the `gmdistribution` class to fit data using an expectation maximization (EM) algorithm, which assigns posterior probabilities to each component density with respect to each observation. The fitting method uses an iterative algorithm that converges to a local optimum.

Clustering using Gaussian mixture models is sometimes considered a soft clustering method: the posterior probabilities indicate that each data point has some probability of belonging to each cluster. For more information on clustering with Gaussian mixture models, see Clustering Using Gaussian Mixture Models. This section describes how to create them.
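As a brief sketch of this soft assignment (the data and variable names here are illustrative, not part of the original example), you can fit a model with `fitgmdist` and query the posterior probabilities with the `posterior` method:

```matlab
% Sketch: soft cluster memberships from a fitted Gaussian mixture.
% Assumes Statistics and Machine Learning Toolbox; names are illustrative.
rng(1);                                % reproducible simulation
X = [mvnrnd([1 2],eye(2),500);         % sample from component 1
     mvnrnd([-3 -5],eye(2),500)];      % sample from component 2
gm = fitgmdist(X,2);                   % fit a two-component model
P = posterior(gm,X);                   % N-by-2 matrix of posteriors
% Row i of P holds the probability that observation i belongs to each
% component; every row sums to 1, which is what makes the clustering "soft".
```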

Use the `gmdistribution` constructor to create Gaussian mixture models with specified means, covariances, and mixture proportions.

First, define the means, covariances, and mixture proportions.

```
MU = [1 2;-3 -5];                     % Means
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]);  % Covariances
p = ones(1,2)/2;                      % Mixing proportions
```

Then, create an object of the `gmdistribution` class defining a two-component mixture of bivariate Gaussian distributions:

```
obj = gmdistribution(MU,SIGMA,p);
```

Display the properties of the object with the MATLAB® function `fieldnames`:

```
properties = fieldnames(obj)
```

```
properties = 
    'NumVariables'
    'DistributionName'
    'NumComponents'
    'ComponentProportion'
    'SharedCovariance'
    'NumIterations'
    'RegularizationValue'
    'NegativeLogLikelihood'
    'CovarianceType'
    'mu'
    'Sigma'
    'AIC'
    'BIC'
    'Converged'
```

The `gmdistribution` reference page describes these properties. To access the value of a property, use dot indexing. For example, access the number of variables (dimensions) of the object.

```
dimension = obj.NumVariables
```

```
dimension = 2
```

Access the distribution name.

```
name = obj.DistributionName
```

```
name = gaussian mixture distribution
```

Use the methods `pdf` and `cdf` to compute values and visualize the object:

```
figure
ezsurf(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])
```

```
figure
ezsurf(@(x,y)cdf(obj,[x y]),[-10 10],[-10 10])
```

You can also create Gaussian mixture models by fitting a parametric model with a specified number of components to data. `fitgmdist` uses the syntax `obj = fitgmdist(X,k)`, where `X` is a data matrix and `k` is the specified number of components. Choosing a suitable number of components `k` is essential for creating a useful model of the data: too few components fail to model the data accurately; too many components lead to an over-fit model with singular covariance matrices.
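When a large `k` produces ill-conditioned or singular covariance estimates, one common remedy is the `'RegularizationValue'` name-value pair of `fitgmdist`, which adds a small positive number to the diagonal of every covariance estimate. The sketch below uses illustrative simulated data:

```matlab
% Sketch: regularizing covariance estimates to avoid singularity when
% k is large relative to the amount of data. Data here is illustrative.
rng(2);
X = [mvnrnd([1 2],[2 0;0 .5],1000);
     mvnrnd([-3 -5],eye(2),1000)];
% Adding 0.01 to the diagonal of each estimated covariance matrix
% guarantees the estimates stay positive definite during EM.
gm = fitgmdist(X,5,'RegularizationValue',0.01);
```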

The following example illustrates this approach.

First, create some data from a mixture of two bivariate Gaussian distributions using the `mvnrnd` function:

```
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);
mvnrnd(MU2,SIGMA2,1000)];
figure
scatter(X(:,1),X(:,2),10,'.')
```

Next, fit a two-component Gaussian mixture model:

```
options = statset('Display','final');
obj = fitgmdist(X,2,'Options',options);
hold on
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
hold off
```

```
18 iterations, log-likelihood = -7058.35
```

Among the properties of the fit are the parameter estimates. Display the estimates for the means, covariances, and mixture proportions:

```
ComponentMeans = obj.mu
ComponentCovariances = obj.Sigma
MixtureProportions = obj.ComponentProportion
```

```
ComponentMeans =
   -2.9617   -4.9727
    0.9539    2.0261

ComponentCovariances(:,:,1) =
    1.0100    0.0059
    0.0059    0.9897

ComponentCovariances(:,:,2) =
    1.9939   -0.0092
   -0.0092    0.4981

MixtureProportions =
    0.5000    0.5000
```

The two-component model minimizes the Akaike information criterion (AIC):

```
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = fitgmdist(X,k);
    AIC(k) = obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
numComponents
```

```
numComponents = 2
```

Display the model.

```
model = obj{2}
```

```
model = 

Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:   -2.9617   -4.9727

Component 2:
Mixing proportion: 0.500000
Mean:    0.9539    2.0261
```

Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are negative log-likelihoods of the data with penalty terms for the number of estimated parameters. You can use them to determine an appropriate number of components for a model when the number of components is unspecified.
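The AIC loop above can be repeated with the `BIC` property. This sketch regenerates illustrative data the same way as the earlier simulation, since the block is meant to stand alone:

```matlab
% Sketch: model selection by BIC, mirroring the AIC loop.
rng(3);
X = [mvnrnd([1 2],[2 0;0 .5],1000);
     mvnrnd([-3 -5],eye(2),1000)];
BIC = zeros(1,4);
gm = cell(1,4);
for k = 1:4
    gm{k} = fitgmdist(X,k);
    BIC(k) = gm{k}.BIC;   % BIC penalizes parameters more heavily than AIC
end
[minBIC,numComponents] = min(BIC);
```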

Use the method `random` of the `gmdistribution` class to generate random data from a Gaussian mixture model created with `gmdistribution` or `fitgmdist`.

For example, the following specifies a `gmdistribution` object consisting of a two-component mixture of bivariate Gaussian distributions:

```
MU = [1 2;-3 -5];
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]);
p = ones(1,2)/2;
obj = gmdistribution(MU,SIGMA,p);
```

```
figure
ezcontour(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])
hold on
```

Use `random` to generate 1000 random values:

```
Y = random(obj,1000);
scatter(Y(:,1),Y(:,2),10,'.')
```
