Gaussian mixture models (GMMs) are composed of k multivariate normal density components, where k is a positive integer. Each component has a d-dimensional mean (d is a positive integer), a d-by-d covariance matrix, and a mixing proportion. Mixing proportion j determines the proportion of the population that component j accounts for, j = 1,...,k. The mixing proportions are nonnegative and sum to 1.
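Written as a density, the description above corresponds to the standard mixture form. The symbols here are notation assumed for illustration and do not appear in the original text: \pi_j is mixing proportion j, \mu_j is the d-dimensional mean, and \Sigma_j is the d-by-d covariance matrix of component j.

```latex
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad \pi_j \ge 0, \qquad \sum_{j=1}^{k} \pi_j = 1
```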
You can fit a GMM using the Statistics and Machine Learning Toolbox™ function fitgmdist by specifying k and by supplying X, an n-by-d matrix of data. The columns of X correspond to predictors, features, or attributes, and the rows correspond to observations or examples. By default, fitgmdist fits full covariance matrices that are different among components (or unshared).
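As a minimal sketch of the call described above, the following fits a two-component GMM to synthetic data. The data, the seed, and the variable names are assumptions made for illustration. The second call uses the 'CovarianceType' and 'SharedCovariance' name-value arguments to request a diagonal covariance shared by all components, in place of the default full, unshared form.

```matlab
rng(1);  % fix the seed so the sketch is reproducible

% Synthetic data (an assumption): two 2-D Gaussian clouds, n = 400, d = 2
X = [mvnrnd([0 0], eye(2), 200);
     mvnrnd([5 5], [1 0.5; 0.5 2], 200)];
k = 2;   % number of mixture components

% Default fit: full covariance matrices, one per component (unshared)
gmFull = fitgmdist(X, k);

% Alternative: diagonal covariance shared by all components
gmDiagShared = fitgmdist(X, k, ...
    'CovarianceType', 'diagonal', 'SharedCovariance', true);

sizeFull = size(gmFull.Sigma);          % d-by-d-by-k for full, unshared
sizeDiag = size(gmDiagShared.Sigma);    % 1-by-d for diagonal, shared
```

Restricting the covariance structure reduces the number of estimated parameters, which can help when n is small relative to d.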
fitgmdist fits GMMs to data using the iterative Expectation-Maximization (EM) algorithm. Using initial values for component means, covariance matrices, and mixing proportions, the EM algorithm proceeds using these steps:
For each observation, the algorithm computes posterior probabilities of component memberships. You can think of the result as an n-by-k matrix, where element (i,j) contains the posterior probability that observation i is from component j. This is the E-step of the EM algorithm.
Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the M-step of the EM algorithm.
The algorithm iterates over these steps until convergence.
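The E-step and M-step above can be sketched by hand as follows. This is an illustrative implementation under stated assumptions, not the internal code of fitgmdist: the synthetic data, the initialization (random observations as means, pooled covariance, equal proportions), and the relative log-likelihood stopping rule are all choices made for this example.

```matlab
rng(2);
X = [randn(150, 2); randn(150, 2) + 4];   % synthetic data (an assumption)
[n, d] = size(X);
k = 2;

% Initial values (assumed): k random observations as means,
% the pooled sample covariance for every component, equal proportions
mu    = X(randperm(n, k), :);
Sigma = repmat(cov(X), [1 1 k]);
p     = ones(1, k) / k;

maxIter = 200;  tol = 1e-6;  prevLL = -Inf;
for iter = 1:maxIter
    % E-step: n-by-k matrix of posterior membership probabilities
    dens = zeros(n, k);
    for j = 1:k
        dens(:, j) = p(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));
    end
    ll   = sum(log(sum(dens, 2)));   % log-likelihood at current parameters
    post = dens ./ sum(dens, 2);     % each row sums to 1

    % M-step: weighted maximum likelihood estimates
    nj = sum(post, 1);               % effective number of points per component
    for j = 1:k
        mu(j, :) = (post(:, j)' * X) / nj(j);
        Xc = X - mu(j, :);
        S  = (Xc' * (Xc .* post(:, j))) / nj(j);
        Sigma(:, :, j) = (S + S') / 2;   % enforce exact symmetry
    end
    p = nj / n;

    % Stop when the log-likelihood improvement becomes negligible
    if ll - prevLL < tol * abs(ll)
        break
    end
    prevLL = ll;
end
```

The sketch omits safeguards that a production fit needs, such as regularizing covariance matrices that become ill conditioned.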
The likelihood surface is complex, and the algorithm might converge to a local optimum. Also, the resulting local optimum might depend on the initial conditions. fitgmdist has several options for choosing initial conditions, including random component assignments for the observations and the k-means++ algorithm.
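The sketch below shows a few ways to control the initial conditions through the 'Start' name-value argument, and how 'Replicates' can reduce sensitivity to any single starting point. The data and variable names are assumptions for illustration.

```matlab
rng(3);
X = [mvnrnd([0 0], eye(2), 200); mvnrnd([4 4], eye(2), 200)];  % synthetic data
k = 2;

% k-means++ initialization of the component means
gmPlus = fitgmdist(X, k, 'Start', 'plus');

% k randomly selected observations as the initial component means
gmRand = fitgmdist(X, k, 'Start', 'randSample');

% Explicit initial component assignment: one index in 1..k per observation
idx0  = randi(k, size(X, 1), 1);
gmIdx = fitgmdist(X, k, 'Start', idx0);

% Repeat the fit from 10 random starts and keep the fit
% with the largest log-likelihood
gmBest = fitgmdist(X, k, 'Start', 'randSample', 'Replicates', 10);
```

Running several replicates costs more computation but makes it less likely that the returned fit is a poor local optimum.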
fitgmdist returns a fitted gmdistribution model object. The object contains properties that store the estimation results, which include the estimated parameters, convergence information, and information criteria (Akaike and Bayesian information criteria). You can use dot notation to access the properties.
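For example, the following sketch fits a model to synthetic data (an assumption for illustration) and reads several properties of the returned gmdistribution object with dot notation.

```matlab
rng(4);
X  = [mvnrnd([0 0], eye(2), 200); mvnrnd([4 4], eye(2), 200)];  % synthetic data
gm = fitgmdist(X, 2);

% Estimated parameters
compMeans = gm.mu;                    % k-by-d matrix of component means
compCovs  = gm.Sigma;                 % d-by-d-by-k array for full, unshared fits
mixProps  = gm.ComponentProportion;   % 1-by-k vector of mixing proportions

% Convergence information
didConverge = gm.Converged;           % true if the EM iterations converged
numIter     = gm.NumIterations;
negLogLik   = gm.NegativeLogLikelihood;

% Information criteria (smaller values indicate a better trade-off)
aic = gm.AIC;
bic = gm.BIC;
```

Comparing AIC or BIC across fits with different values of k is one common way to choose the number of components.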
Once you have a fitted GMM, you can cluster query data using it. Clustering using a GMM is sometimes considered a soft clustering method, because the posterior probabilities give each data point some probability of belonging to each cluster. For more information on clustering with GMM, see Clustering Using Gaussian Mixture Models.
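As a brief sketch of hard versus soft assignment, the following uses the cluster and posterior functions on a fitted model. The training data and the query points in Xnew are hypothetical values chosen for illustration.

```matlab
rng(5);
X  = [mvnrnd([0 0], eye(2), 200); mvnrnd([4 4], eye(2), 200)];  % synthetic data
gm = fitgmdist(X, 2);

% Hypothetical query points: one near each cloud and one in between
Xnew = [0 0; 4 4; 2 2];

% Hard clustering: index of the component with the largest posterior probability
idx = cluster(gm, Xnew);

% Soft clustering: one row per query point, one column per component
P = posterior(gm, Xnew);
```

Points that lie between components, such as the third query point, tend to have posterior probabilities that are split more evenly than points near a component mean.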