(Not Recommended) Gaussian mixture parameter estimates
gmdistribution.fit is not recommended. Use fitgmdist instead.
obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)
obj = gmdistribution.fit(X,k) uses an Expectation-Maximization (EM) algorithm to construct an object obj of the gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-m matrix X, where n is the number of observations and m is the dimension of the data.
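Each EM iteration alternates between computing the posterior responsibility of every component for every observation (E-step) and re-estimating the mixture weights, means, and covariances from those responsibilities (M-step). The following is a minimal NumPy sketch of that loop, not the toolbox's implementation; the function name em_gmm, the fixed iteration count, and the random-sample initialization are illustrative assumptions:

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """Fit a k-component Gaussian mixture to n-by-m data X by EM (sketch)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Initialize means from k randomly chosen observations, in the spirit
    # of a random-sample start; covariances start at the overall data
    # covariance, and mixing weights start equal.
    mu = X[rng.choice(n, size=k, replace=False)]
    sigma = np.tile(np.cov(X.T), (k, 1, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            mahal = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(sigma[j]), diff)
            norm = np.sqrt((2 * np.pi) ** m * np.linalg.det(sigma[j]))
            resp[:, j] = w[j] * np.exp(-0.5 * mahal) / norm
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
    return w, mu, sigma
```

Production implementations work in log space and stop when the log-likelihood converges rather than running a fixed number of iterations.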
gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.
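Because rows containing NaN are excluded, the effect is the same as filtering the data before fitting; a one-line NumPy equivalent of that row filter (illustrative, not the toolbox's internal code):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],   # row with missing data: excluded
              [4.0, 5.0]])
X_clean = X[~np.isnan(X).any(axis=1)]   # keep only complete rows
# X_clean is [[1., 2.], [4., 5.]]
```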
obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.
Parameter  Value

'Start'  Method used to choose initial component parameters. One of the following:
'randSample', to select k observations from X at random as initial component means (the default).
A structure array S containing initial values in the fields mu, Sigma, and PComponents.
A vector of length n containing an initial guess of the component index for each point.

'Replicates'  A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of initial parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'Start' parameter to be 'randSample'. The default is 1.

'CovType'  'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'.

'SharedCov'  Logical true if all the covariance matrices are restricted to be the same (pooled estimate); logical false otherwise. The default is false.

'Regularize'  A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0.

'Options'  Options structure for the iterative EM algorithm, as created by statset.
In some cases, gmdistribution may converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix.
The following issues may result in an ill-conditioned covariance matrix:
The number of dimensions of your data is relatively high and there are not enough observations.
Some of the features (variables) of your data are highly correlated.
Some or all of the features are discrete.
You tried to fit the data to too many components.
In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:
Preprocess your data to remove correlated features.
Set 'SharedCov' to true to use an equal covariance matrix for every component.
Set 'CovType' to 'diagonal'.
Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
Try another set of initial values.
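To see why the 'Regularize' ridge helps, note that adding a small positive number to the diagonal shifts every eigenvalue of the covariance estimate up by that amount, so a singular estimate becomes positive-definite. A NumPy sketch with deliberately collinear features (the data and ridge value are illustrative):

```python
import numpy as np

# Two perfectly correlated features produce a singular covariance matrix.
x = np.arange(10.0)
X = np.column_stack([x, 2.0 * x])
S = np.cov(X.T)                    # rank-deficient: not invertible

S_reg = S + 1e-6 * np.eye(2)       # 'Regularize'-style diagonal ridge
np.linalg.cholesky(S_reg)          # succeeds: S_reg is positive-definite
```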
In other cases, gmdistribution may pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Trying another set of initial values may avoid this issue without altering your data or model.
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [3 5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on
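For comparison, the same two-component draw can be sketched in NumPy, where multivariate_normal plays the role of mvnrnd (the generator seed is an added assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
MU1, SIGMA1 = [1, 2], [[2, 0], [0, 0.5]]
MU2, SIGMA2 = [3, 5], [[1, 0], [0, 1]]
X = np.vstack([rng.multivariate_normal(MU1, SIGMA1, 1000),
               rng.multivariate_normal(MU2, SIGMA2, 1000)])
```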
Next, fit a two-component Gaussian mixture model:
options = statset('Display','final');
obj = gmdistribution.fit(X,2,'Options',options);

10 iterations, log-likelihood = -7046.78

h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
Among the properties of the fit are the parameter estimates:
ComponentMeans = obj.mu
ComponentMeans =
    0.9391    2.0322
    2.9823    4.9737

ComponentCovariances = obj.Sigma
ComponentCovariances(:,:,1) =
    1.7786    0.0528
    0.0528    0.5312
ComponentCovariances(:,:,2) =
    1.0491    0.0150
    0.0150    0.9816

MixtureProportions = obj.PComponents
MixtureProportions =
    0.5000    0.5000
The Akaike information is minimized by the two-component model:
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = gmdistribution.fit(X,k);
    AIC(k) = obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
numComponents

numComponents =
     2

model = obj{2}

model =
Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:    0.9391    2.0322
Component 2:
Mixing proportion: 0.500000
Mean:    2.9823    4.9737
Both the Akaike and Bayes information are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.
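Using the standard definitions AIC = 2p - 2 log L and BIC = p log(n) - 2 log L, where p is the number of estimated parameters, n the number of observations, and log L the maximized log-likelihood, the penalized scores can be computed directly. A small worked computation for the two-component fit above; the parameter count here is our own bookkeeping, not a toolbox output:

```python
import math

logL = -7046.78   # maximized log-likelihood (value from the fit above)
n = 2000          # number of observations
# Free parameters of a 2-component, 2-D mixture with full covariances:
# two 2-D means, two symmetric 2-by-2 covariances (3 free entries each),
# and one free mixing proportion.
p = 2 * 2 + 2 * 3 + 1   # = 11

AIC = 2 * p - 2 * logL
BIC = p * math.log(n) - 2 * logL
```

The BIC penalty grows with n (log(2000) > 2), so for this sample BIC penalizes extra components more heavily than AIC does.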