obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)
obj = gmdistribution.fit(X,k) uses an Expectation-Maximization (EM) algorithm to construct an object obj of the gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data.
gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.
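As a small illustrative sketch of this behavior (the data here are arbitrary, not from the examples below), a row containing NaN is simply dropped before fitting:

```matlab
% Illustrative sketch: rows of X containing NaN are treated as missing
% data and are ignored by the fit.
X = randn(100,2);               % 100 observations in 2 dimensions
X(5,1) = NaN;                   % mark one observation as missing
obj = gmdistribution.fit(X,2);  % fits using the 99 complete rows only
```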
obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.
Parameter  Value

'Start'  Method used to choose initial component parameters. One of the following: 'randSample' (the default), which selects k observations from X at random as initial component means; a structure array with initial values for the fields mu, Sigma, and PComponents; or a vector of length n containing an initial component index for each observation.

'Replicates'  A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'randSample' start method. The default is 1.

'CovType'  'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'.

'SharedCov'  Logical true if all the covariance matrices are restricted to be the same (a pooled estimate); logical false otherwise. The default is false.

'Regularize'  A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0.

'Options'  Options structure for the iterative EM algorithm, as created by statset.
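As a sketch of how these parameters combine in a single call (the specific values here are illustrative, not recommendations), several options can be passed at once as parameter/value pairs:

```matlab
% Illustrative sketch: restart EM five times with diagonal, lightly
% regularized covariances; the best of the five solutions (largest
% likelihood) is returned.
obj = gmdistribution.fit(X,2, ...
    'Replicates',5, ...
    'CovType','diagonal', ...
    'Regularize',1e-5, ...
    'Options',statset('MaxIter',500));
```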
In some cases, gmdistribution may converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix.
The following issues may result in an ill-conditioned covariance matrix:
The number of dimensions of your data is relatively high and there are not enough observations.
Some of the features (variables) of your data are highly correlated.
Some or all of the features are discrete.
You tried to fit the data to too many components.
In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:
Preprocess your data to remove correlated features.
Set 'SharedCov' to true to use an equal covariance matrix for every component.
Set 'CovType' to 'diagonal'.
Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
Try another set of initial values.
In other cases, gmdistribution may pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Trying another set of initial values may avoid this issue without altering your data or model.
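One way to try a different set of initial values (assuming the 'Start' parameter accepts a vector of initial component indices, one per observation) is to supply an explicit random initial partition:

```matlab
% Hypothetical sketch: assign each observation a random initial component
% index (1 or 2) and start EM from that partition instead of the default
% starting method.
idx = randi(2,size(X,1),1);                 % random component index per row
obj = gmdistribution.fit(X,2,'Start',idx);
```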
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
MU1 = [1 2]; SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5]; SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];

scatter(X(:,1),X(:,2),10,'.')
hold on
Next, fit a two-component Gaussian mixture model:
options = statset('Display','final');
obj = gmdistribution.fit(X,2,'Options',options);

10 iterations, log-likelihood = -7046.78

h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
Among the properties of the fit are the parameter estimates:
ComponentMeans = obj.mu
ComponentMeans =
    0.9391    2.0322
   -2.9823   -4.9737

ComponentCovariances = obj.Sigma
ComponentCovariances(:,:,1) =
    1.7786    0.0528
    0.0528    0.5312
ComponentCovariances(:,:,2) =
    1.0491    0.0150
    0.0150    0.9816

MixtureProportions = obj.PComponents
MixtureProportions =
    0.5000    0.5000
The Akaike information criterion (AIC) is minimized by the two-component model:
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = gmdistribution.fit(X,k);
    AIC(k) = obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
numComponents

numComponents =
     2

model = obj{2}

model =
Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:    0.9391    2.0322
Component 2:
Mixing proportion: 0.500000
Mean:   -2.9823   -4.9737
Both the Akaike and Bayes information criteria are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.
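The same model-selection loop can be run against the BIC property of the fitted objects (an analogous sketch to the AIC example above):

```matlab
% Sketch: choose the number of components by minimizing the Bayes
% information criterion (BIC) instead of AIC.
BIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = gmdistribution.fit(X,k);
    BIC(k) = obj{k}.BIC;
end
[minBIC,numComponents] = min(BIC);
```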