Gaussian Mixture Models

Gaussian mixture models (GMMs) are composed of k multivariate normal density components, where k is a positive integer. Each component has a d-dimensional mean (d is a positive integer), a d-by-d covariance matrix, and a mixing proportion. Mixing proportion j determines the proportion of the population that component j represents, for j = 1,...,k.
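
In standard notation, with mixing proportions pi_j, means mu_j, and covariance matrices Sigma_j, the GMM density is

    p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
    \qquad \pi_j \ge 0, \qquad \sum_{j=1}^{k} \pi_j = 1,

where \mathcal{N}(x \mid \mu_j, \Sigma_j) denotes the multivariate normal density with mean \mu_j and covariance \Sigma_j.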

You can fit a GMM using the Statistics and Machine Learning Toolbox™ function fitgmdist by specifying k and by supplying X, an n-by-d matrix of data. The columns of X correspond to predictors, features, or attributes, and the rows correspond to observations or examples. By default, fitgmdist fits full covariance matrices that can differ across components (unshared covariances).
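
For example, this sketch fits a two-component GMM to simulated data (the seed and the component parameters are illustrative assumptions, not values from this topic):

    % Simulate 1000 observations from two bivariate normal
    % distributions (illustrative parameter values).
    rng(1);  % for reproducibility
    mu1 = [1 2];    Sigma1 = [2 0; 0 1];
    mu2 = [-3 -5];  Sigma2 = [1 0.5; 0.5 1];
    X = [mvnrnd(mu1, Sigma1, 500); mvnrnd(mu2, Sigma2, 500)];

    % Fit a GMM with k = 2 components. By default, fitgmdist
    % estimates full, unshared covariance matrices.
    GMModel = fitgmdist(X, 2);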

fitgmdist fits GMMs to data using the iterative Expectation-Maximization (EM) algorithm. Starting from initial values for the component means, covariance matrices, and mixing proportions, the EM algorithm alternates between these two steps.

  1. For each observation, the algorithm computes the posterior probabilities of component memberships. You can think of the result as an n-by-k matrix, where element (i,j) contains the posterior probability that observation i is from component j (see the sketch after this list). This is the E-step of the EM algorithm.

  2. Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the M-step of the EM algorithm.
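
In the E-step, each posterior probability follows from Bayes' rule: the component density at observation i, weighted by that component's mixing proportion, normalized over all k components. After fitting, you can obtain the same kind of n-by-k posterior matrix from the fitted model with the posterior function (continuing the sketch above):

    % Posterior probabilities of component membership.
    % P is n-by-k; P(i,j) is the posterior probability that
    % observation i came from component j. Each row sums to 1.
    P = posterior(GMModel, X);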

The algorithm iterates over these steps until convergence. The likelihood surface is complex, and the algorithm can converge to a local optimum. Moreover, the local optimum it reaches can depend on the initial conditions. fitgmdist has several options for choosing initial conditions, including random component assignments for the observations and the k-means++ algorithm.
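
One common way to guard against a poor local optimum is to restart EM from several random initializations and keep the best fit. As a sketch (the number of replicates and iteration limit are arbitrary choices):

    % Fit 10 times from different initial conditions and return
    % the fit with the largest log-likelihood.
    GMModel = fitgmdist(X, 2, 'Replicates', 10, ...
        'Options', statset('MaxIter', 500));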

fitgmdist returns a fitted gmdistribution model object. The object contains properties that store the estimation results, which include the estimated parameters, convergence information, and information criteria (Akaike and Bayesian information criteria). You can use dot notation to access the properties.
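
For instance, continuing the sketch above, these properties hold the estimated parameters and fit diagnostics:

    GMModel.mu                    % k-by-d matrix of component means
    GMModel.Sigma                 % d-by-d-by-k array of covariances
    GMModel.ComponentProportion   % 1-by-k mixing proportions
    GMModel.Converged             % true if EM converged
    GMModel.AIC                   % Akaike information criterion
    GMModel.BIC                   % Bayesian information criterion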

Once you have a fitted GMM, you can use it to cluster query data. Clustering using GMMs is sometimes considered a soft clustering method, because rather than assigning each point to a single cluster, the posterior probabilities give the probability that each data point belongs to each cluster. For more information on clustering with GMMs, see Clustering Using Gaussian Mixture Models.
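
To derive a hard assignment from the soft memberships, the cluster function assigns each observation to the component with the largest posterior probability (continuing the sketch above):

    % Hard cluster labels: idx(i) is the index of the component
    % with the largest posterior probability for observation i.
    idx = cluster(GMModel, X);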
