Gaussian mixture models (GMMs) are composed of *k* multivariate
normal density components, where *k* is a positive
integer. Each component has a *d*-dimensional mean
(*d* is a positive integer), a *d*-by-*d* covariance
matrix, and a mixing proportion. Mixing proportion *j* determines
the proportion of the population that belongs to component *j*, *j* =
1,...,*k*.
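
Written as a density, the description above corresponds to the standard mixture form below. The symbols are notation assumed here for illustration, not taken from the original text: π_j is mixing proportion *j*, μ_j and Σ_j are the mean and covariance matrix of component *j*, and N(·) denotes the *d*-dimensional normal density.

$$
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\left(x \mid \mu_j, \Sigma_j\right),
\qquad \pi_j \ge 0, \qquad \sum_{j=1}^{k} \pi_j = 1
$$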

You can fit a GMM using the Statistics and Machine Learning Toolbox™ function `fitgmdist`
by specifying *k* and by supplying *X*, an *n*-by-*d* matrix
of data. The columns of *X* correspond to predictors,
features, or attributes, and the rows correspond to observations or
examples. By default, `fitgmdist` fits full covariance
matrices that are different among components (or unshared).
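
A minimal sketch of such a fit, assuming simulated two-cluster data (the means, covariances, and sample sizes are made up for illustration). It shows the default fit next to one with a diagonal covariance shared by all components, using the `'CovarianceType'` and `'SharedCovariance'` name-value arguments.

```matlab
% Simulated data: two bivariate normal clusters (illustrative values only)
rng(1);                                    % for reproducibility
X = [mvnrnd([0 0], eye(2), 200);           % 200 observations around (0,0)
     mvnrnd([4 4], [1 0.5; 0.5 1], 200)];  % 200 observations around (4,4)

k = 2;                                     % number of mixture components

% Default fit: full covariance matrices, not shared among components
gmFull = fitgmdist(X, k);

% Alternative: one diagonal covariance shared by all components
gmDiag = fitgmdist(X, k, 'CovarianceType', 'diagonal', ...
                         'SharedCovariance', true);
```

The shared diagonal model estimates fewer covariance parameters, which can help when *n* is small relative to *d*, at the cost of a more restrictive cluster shape.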

`fitgmdist` fits GMMs to data using the iterative *Expectation-Maximization* (EM)
algorithm. Using initial values for component means, covariance matrices,
and mixing proportions, the EM algorithm proceeds using these steps.

1. For each observation, the algorithm computes posterior probabilities of component memberships. You can think of the result as an *n*-by-*k* matrix, where element (*i*,*j*) contains the posterior probability that observation *i* is from component *j*. This is the *E*-step of the EM algorithm.
2. Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the *M*-step of the EM algorithm.
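
The two steps can be written out as follows. This is the textbook form for full, unshared covariance matrices, given as a sketch rather than as the exact computation inside `fitgmdist`. The notation is assumed for illustration: x_i is observation *i*, γ_ij is the posterior probability that observation *i* is from component *j*, and π_j, μ_j, Σ_j are the mixing proportion, mean, and covariance matrix of component *j*.

$$
\text{E-step:}\qquad
\gamma_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                   {\sum_{l=1}^{k} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}
$$

$$
\text{M-step:}\qquad
n_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad
\pi_j = \frac{n_j}{n}, \qquad
\mu_j = \frac{1}{n_j} \sum_{i=1}^{n} \gamma_{ij} \, x_i, \qquad
\Sigma_j = \frac{1}{n_j} \sum_{i=1}^{n} \gamma_{ij} \, (x_i - \mu_j)(x_i - \mu_j)^{\mathsf{T}}
$$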

The algorithm iterates over these steps until convergence.
The likelihood surface is complex, and the algorithm might converge
to a local optimum. Also, the resulting local optimum might depend
on the initial conditions. `fitgmdist` has several
options for choosing initial conditions, including random component
assignments for the observations and the *k*-means++ algorithm.
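
A short sketch of how the initial conditions might be controlled, assuming simulated three-cluster data (all numeric values are illustrative). It compares *k*-means++ initialization with several replicates against a single run started from a random component assignment, both passed through the `'Start'` name-value argument.

```matlab
% Simulated data: three bivariate normal clusters (illustrative values only)
rng(2);                                    % for reproducibility
X = [mvnrnd([0 0], eye(2), 150);
     mvnrnd([5 0], eye(2), 150);
     mvnrnd([0 5], eye(2), 150)];
k = 3;
opts = statset('MaxIter', 500);            % allow more EM iterations than the default

% k-means++ initialization, repeated 5 times; the best fit is kept
gmPlus = fitgmdist(X, k, 'Start', 'plus', 'Replicates', 5, 'Options', opts);

% Random initial component assignment for each observation (single run)
initIdx = randi(k, size(X, 1), 1);
gmRand = fitgmdist(X, k, 'Start', initIdx, 'Options', opts);

% A lower negative loglikelihood indicates a better optimum on this data
fprintf('k-means++ start: %.2f\n', gmPlus.NegativeLogLikelihood);
fprintf('random start:    %.2f\n', gmRand.NegativeLogLikelihood);
```

Running several replicates is a common way to reduce the chance of settling on a poor local optimum, since each replicate starts from different initial values.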

`fitgmdist` returns a fitted `gmdistribution` model
object. The object contains properties that store the estimation results,
which include the estimated parameters, convergence information, and
information criteria (Akaike and Bayesian information criteria). You
can use dot notation to access the properties.
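
For example, the following sketch fits a model to simulated data (illustrative values only) and reads several properties with dot notation.

```matlab
% Simulated data: two bivariate normal clusters (illustrative values only)
rng(3);                                    % for reproducibility
X = [mvnrnd([0 0], eye(2), 200);
     mvnrnd([4 4], eye(2), 200)];
gm = fitgmdist(X, 2);

% Estimated parameters
mu    = gm.mu;                             % k-by-d matrix of component means
Sigma = gm.Sigma;                          % d-by-d-by-k array for full, unshared covariances
p     = gm.ComponentProportion;            % 1-by-k vector of mixing proportions

% Convergence information
fprintf('Converged: %d after %d iterations\n', gm.Converged, gm.NumIterations);

% Information criteria (smaller values indicate a better trade-off)
fprintf('AIC = %.2f, BIC = %.2f\n', gm.AIC, gm.BIC);
```

Comparing `AIC` or `BIC` across fits with different values of *k* is one common way to choose the number of components.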

Once you have a fitted GMM, you can cluster query data using it. Clustering using GMM is sometimes considered a soft clustering method. The posterior probabilities for each point indicate that each data point has some probability of belonging to each cluster. For more information on clustering with GMM, see Clustering Using Gaussian Mixture Models.
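
A minimal sketch of both views of the clustering, assuming simulated data (illustrative values only): `cluster` returns a hard assignment for each query point, and `posterior` returns the soft membership probabilities.

```matlab
% Simulated data: two bivariate normal clusters (illustrative values only)
rng(4);                                    % for reproducibility
X = [mvnrnd([0 0], eye(2), 200);
     mvnrnd([4 4], eye(2), 200)];
gm = fitgmdist(X, 2);

% Hard clustering: index of the component with the largest posterior probability
idx = cluster(gm, X);                      % n-by-1 vector of component indices

% Soft clustering: posterior probability of each component for each point
P = posterior(gm, X);                      % n-by-k matrix; each row sums to 1

% Points whose largest posterior is below 0.9 lie between the clusters
% (the 0.9 threshold is an arbitrary choice for this illustration)
ambiguous = max(P, [], 2) < 0.9;
fprintf('%d of %d points are ambiguous\n', nnz(ambiguous), size(X, 1));
```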

`cluster` | `fitgmdist` | `gmdistribution`

- Create a Gaussian Mixture Model
- Fit a Gaussian Mixture Model to Data
- Simulate Data from a Gaussian Mixture Model
