
# Fit a Gaussian Mixture Model to Data

This example shows how to simulate data from a multivariate normal distribution, and then fit a Gaussian mixture model (GMM) to the data using fitgmdist. To create a known, or fully specified, GMM object, see docid:stats_ug.bus2n9t.

fitgmdist requires a matrix of data (X) and the number of components in the GMM (k). To create a useful GMM, you must choose k carefully. Choosing too few components produces a model that fails to capture the structure of the data (underfitting); choosing too many leads to an overfit model with singular covariance matrix estimates.
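If an overly large k causes fitgmdist to report ill-conditioned or singular covariance matrices, you can regularize the covariance estimates with the 'RegularizationValue' name-value pair. A minimal sketch (the data matrix X and the component count 5 are placeholders for illustration):

```
% Add a small positive value to the diagonal of each covariance
% estimate so that all covariance matrices stay positive definite.
gmReg = fitgmdist(X,5,'RegularizationValue',0.01);
```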

Simulate data from a mixture of two bivariate Gaussian distributions using mvnrnd.

```
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];

rng(1); % For reproducibility
X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];

figure;
scatter(X(:,1),X(:,2),10,'.')
```

Fit a two-component GMM. Plot the pdf of the fitted GMM.

```
options = statset('Display','final');
gm = fitgmdist(X,2,'Options',options);

gmPDF = @(x,y)pdf(gm,[x y]);

hold on
h = ezcontour(gmPDF,[-8 6],[-8 6]);
title('Scatter Plot and PDF Contour')
hold off
```
```
5 iterations, log-likelihood = -7105.71
```

Display the estimates for the means, covariances, and mixture proportions.

```
ComponentMeans = gm.mu
ComponentCovariances = gm.Sigma
MixtureProportions = gm.PComponents
```
```
ComponentMeans =

   -3.0377   -4.9859
    0.9812    2.0563


ComponentCovariances(:,:,1) =

    1.0132    0.0482
    0.0482    0.9796


ComponentCovariances(:,:,2) =

    1.9919    0.0127
    0.0127    0.5533


MixtureProportions =

    0.5000    0.5000
```

Fit four models to the data, each with an increasing number of components.

```
AIC = zeros(1,4);
gm = cell(1,4);
for k = 1:4
    gm{k} = fitgmdist(X,k);
    AIC(k) = gm{k}.AIC;
end
```

Display the number of components that minimizes the AIC.

```
[minAIC,numComponents] = min(AIC);
numComponents
```
```
numComponents =

     2
```

The two-component model minimizes the AIC.

Display the two-component GMM.

```
gm2 = gm{numComponents}
```
```
gm2 =

Gaussian mixture distribution with 2 components in 2 dimensions

Component 1:
Mixing proportion: 0.500000
Mean:   -3.0377   -4.9859

Component 2:
Mixing proportion: 0.500000
Mean:    0.9812    2.0563
```

Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are based on the negative log-likelihood of the data, plus a penalty term for the number of estimated parameters. You can use either criterion to determine an appropriate number of components for a model when the number of components is not known in advance.
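The model-selection loop above can also use BIC, which gmdistribution objects expose alongside AIC. A minimal sketch under the same assumptions as the earlier loop (X is the simulated data matrix):

```
% Fit candidate models and compare BIC values; BIC penalizes
% additional parameters more heavily than AIC for large samples.
BIC = zeros(1,4);
gmModels = cell(1,4);
for k = 1:4
    gmModels{k} = fitgmdist(X,k);
    BIC(k) = gmModels{k}.BIC;
end
[minBIC,bestK] = min(BIC);
```

For this well-separated two-component data set, AIC and BIC typically agree on the same choice of k.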