Gaussian mixture models (GMMs) are composed of k multivariate normal density components, where k is a positive integer. Each component has a d-dimensional mean (d is a positive integer), a d-by-d covariance matrix, and a mixing proportion. Mixing proportion j is the proportion of the population made up of component j, j = 1,...,k.
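Written as a density, the description above corresponds to the standard mixture form. The symbols \pi_j, \mu_j, and \Sigma_j (mixing proportion, mean, and covariance of component j) are notation assumed here for illustration and do not appear in the text above.

```latex
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad \pi_j \ge 0, \qquad \sum_{j=1}^{k} \pi_j = 1,

\mathcal{N}(x \mid \mu_j, \Sigma_j)
  = \frac{1}{(2\pi)^{d/2} \, \lvert \Sigma_j \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j) \right).
```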
You can create a Gaussian mixture distribution object in two ways. Use gmdistribution to create a fully specified GMM object by specifying the component means, covariances, and mixing proportions. Use fitgmdist to fit a GMM object to an n-by-d data matrix X by specifying the number of mixture components k. The columns of X correspond to the predictors, features, or attributes. The rows of X correspond to the observations or examples. By default, fitgmdist fits full covariance matrices that differ among components (that is, unshared covariances).
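As a minimal sketch of both routes, the following builds a fully specified model with gmdistribution and then fits a second model with fitgmdist. The parameter values, the sample size, and the variable names (gmSpecified, gmFit) are illustrative assumptions, and the data are simulated from the specified model only to keep the example self-contained.

```matlab
% Fully specified two-component GMM in two dimensions (illustrative values).
mu = [1 2; -3 -5];                          % k-by-d matrix of component means
sigma = cat(3, [2 0; 0 0.5], [1 0; 0 1]);   % d-by-d-by-k array of covariances
p = [0.4 0.6];                              % mixing proportions
gmSpecified = gmdistribution(mu, sigma, p);

% Simulate an n-by-d data matrix and fit a GMM with k = 2 components.
rng(1);                                     % for reproducibility
X = random(gmSpecified, 500);               % rows are observations
gmFit = fitgmdist(X, 2);                    % full, unshared covariances by default
```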
fitgmdist fits GMMs to data using the iterative Expectation-Maximization (EM) algorithm. Using initial values for component means, covariance matrices, and mixing proportions, the EM algorithm proceeds using these steps.
For each observation, the algorithm computes posterior probabilities of component memberships. You can think of the result as an n-by-k matrix, where element (i,j) contains the posterior probability that observation i is from component j. This is the E-step of the EM algorithm.
Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the M-step of the EM algorithm.
The algorithm iterates over these steps until convergence. The likelihood surface is complex, and the algorithm might converge to a local optimum. Also, the resulting local optimum might depend on the initial conditions.
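The two steps above can be sketched by hand as follows. This is a simplified illustration under stated assumptions, not the internal implementation of fitgmdist: the simulated data, the crude initialization, the iteration cap, the tolerance, and all variable names are hypothetical, and the code omits the regularization and safeguards a production routine would need. It assumes mvnpdf is available and a release that supports implicit expansion (R2016b or later).

```matlab
rng(0);                                     % for reproducibility
X = [randn(200,2); randn(200,2) + 4];       % illustrative two-cluster data
[n, d] = size(X);
k = 2;

% Crude initial values (assumption): k random observations as means,
% the overall sample covariance for every component, equal proportions.
mu = X(randperm(n, k), :);                  % k-by-d means
Sigma = repmat(cov(X), [1 1 k]);            % d-by-d-by-k covariances
w = ones(1, k) / k;                         % mixing proportions

prevLogL = -Inf;
for iter = 1:200
    % E-step: posterior probability that observation i is from component j.
    dens = zeros(n, k);
    for j = 1:k
        dens(:, j) = w(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));
    end
    total = sum(dens, 2);                   % mixture density at each observation
    post = dens ./ total;                   % n-by-k matrix; each row sums to 1

    % M-step: maximum likelihood estimates weighted by the posteriors.
    nj = sum(post, 1);                      % effective observation count per component
    w = nj / n;
    for j = 1:k
        mu(j, :) = (post(:, j)' * X) / nj(j);
        Xc = X - mu(j, :);                  % data centered on the updated mean
        S = (Xc' * (post(:, j) .* Xc)) / nj(j);
        Sigma(:, :, j) = (S + S') / 2;      % remove floating-point asymmetry
    end

    % Log-likelihood of the parameters used in this E-step; stop when it stalls.
    logL = sum(log(total));
    if logL - prevLogL < 1e-6
        break
    end
    prevLogL = logL;
end
```

Rerunning the sketch with a different seed changes the initial means and can change the solution it reaches, which is the dependence on initial conditions described above.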
fitgmdist has several options for choosing initial conditions, including random component assignments for the observations and the k-means++ algorithm.
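A short sketch of how these choices might be passed through the 'Start' name-value argument of fitgmdist. The data and variable names are illustrative assumptions. The 'Replicates' call is included as one common way to reduce sensitivity to the starting point by keeping the fit with the largest likelihood.

```matlab
rng(3);                                     % for reproducibility
X = [randn(200,2); randn(200,2) + 4];       % illustrative two-cluster data
k = 2;

% k-means++ initialization of the component means.
gmPlus = fitgmdist(X, k, 'Start', 'plus');

% k observations chosen at random as the initial component means.
gmRand = fitgmdist(X, k, 'Start', 'randSample');

% Random component assignment for every observation (one integer in 1..k each).
initIdx = randi(k, size(X, 1), 1);
gmAssign = fitgmdist(X, k, 'Start', initIdx);

% Several random starts; the returned model is the one with the largest likelihood.
gmBest = fitgmdist(X, k, 'Start', 'randSample', 'Replicates', 5);
```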
fitgmdist returns a fitted gmdistribution model object. The object contains properties that store the estimation results, which include the estimated parameters, convergence information, and information criteria (Akaike and Bayesian information criteria). You can use dot notation to access the properties.
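For example, the following sketch fits a model to simulated data and reads a few properties with dot notation. The data and the left-hand variable names are illustrative assumptions; the property names are those of the gmdistribution object.

```matlab
rng(2);                                     % for reproducibility
X = [randn(200,2); randn(200,2) + 4];       % illustrative two-cluster data
gm = fitgmdist(X, 2);

estMeans = gm.mu;                           % k-by-d estimated component means
estCov = gm.Sigma;                          % d-by-d-by-k estimated covariances
estProp = gm.ComponentProportion;           % 1-by-k estimated mixing proportions

didConverge = gm.Converged;                 % true if the EM algorithm converged
numIter = gm.NumIterations;                 % number of EM iterations performed

aic = gm.AIC;                               % Akaike information criterion
bic = gm.BIC;                               % Bayesian information criterion
```

The information criteria are typically compared across fits with different k, with smaller values indicating a better trade-off between fit and complexity.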
Once you have a fitted GMM, you can use it to cluster query data. Clustering with a GMM is sometimes considered a soft clustering method, because the posterior probabilities give each data point some probability of belonging to each cluster. For more information on clustering with GMM, see Cluster Using Gaussian Mixture Models.
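A minimal sketch of hard and soft assignment for query points, using the cluster and posterior functions of the fitted object. The training data, the query points, and the variable names are illustrative assumptions.

```matlab
rng(4);                                     % for reproducibility
X = [randn(200,2); randn(200,2) + 4];       % illustrative training data
gm = fitgmdist(X, 2);

Xquery = [0 0; 4 4; 2 2];                   % hypothetical query points, one per row

% Hard assignment: index of the component with the largest posterior probability.
idx = cluster(gm, Xquery);

% Soft assignment: posterior probability of each component for each query point.
P = posterior(gm, Xquery);                  % 3-by-2 matrix; each row sums to 1
```

A point near the midpoint of the two groups, such as [2 2] here, is the kind of observation for which the posterior probabilities are most informative, because a hard label alone hides how uncertain the assignment is.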