Gaussian mixture parameter estimates
obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)
obj = gmdistribution.fit(X,k) uses an Expectation Maximization (EM) algorithm to construct an object obj of the gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data.
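To make the EM iteration concrete, the following is a minimal sketch for a one-dimensional, two-component mixture written in base MATLAB. It is not the gmdistribution.fit implementation: the synthetic data, the initial guesses, and the absolute convergence test on the log-likelihood are assumptions chosen only for illustration.

```matlab
% Minimal 1-D, two-component EM sketch (illustrative only).
rng(1);                                    % reproducible synthetic data
x = [randn(500,1) - 3; 0.5*randn(500,1) + 2];
n = numel(x);
k = 2;

% Initial guesses (hypothetical choices for this sketch)
mu = [min(x) max(x)];                      % 1-by-k component means
s2 = [var(x) var(x)];                      % 1-by-k component variances
p  = [0.5 0.5];                            % 1-by-k mixing proportions

prevLL = -Inf;
for iter = 1:100
    % E-step: weighted component densities (n-by-k)
    dens = zeros(n,k);
    for j = 1:k
        dens(:,j) = p(j) * exp(-(x - mu(j)).^2 / (2*s2(j))) / sqrt(2*pi*s2(j));
    end
    total = sum(dens,2);
    ll = sum(log(total));                  % log-likelihood at current parameters
    r = dens ./ repmat(total,1,k);         % posterior probabilities per point

    % M-step: re-estimate parameters from the weighted data
    nk = sum(r,1);
    for j = 1:k
        mu(j) = sum(r(:,j).*x) / nk(j);
        s2(j) = sum(r(:,j).*(x - mu(j)).^2) / nk(j);
    end
    p = nk / n;

    % Stop when the log-likelihood no longer improves appreciably
    if ll - prevLL < 1e-6
        break
    end
    prevLL = ll;
end
```

Each pass alternates between computing the posterior probability that each point belongs to each component and updating the means, variances, and mixing proportions from those probabilities. For well-separated data such as this, the estimated means should settle near the values used to generate the two groups.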
gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.
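The effect of this rule can be previewed before fitting. The sketch below uses a small made-up matrix and base MATLAB only; the variable names are hypothetical.

```matlab
% Preview which rows would be used when X contains NaN values.
X = [ 1.0  2.0
      NaN  0.5
     -3.0 -5.0
      0.2  NaN
      0.9  2.1];

hasMissing = any(isnan(X),2);    % true for rows with at least one NaN
Xused = X(~hasMissing,:);        % rows that would enter the fit

fprintf('%d of %d rows retained\n', size(Xused,1), size(X,1));
```

Here two of the five rows contain a NaN, so three rows remain.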
obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.
| Parameter | Value |
|---|---|
| 'Start' | Method used to choose initial component parameters. The options include 'randSample', the start method required when 'Replicates' is greater than 1. |
| 'Replicates' | A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of initial parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'randSample' start method. The default is 1. |
| 'CovType' | 'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'. |
| 'SharedCov' | Logical true if all the covariance matrices are restricted to be the same (pooled estimate); logical false otherwise. |
| 'Regularize' | A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0. |
| 'Options' | Options structure for the iterative EM algorithm, as created by statset. gmdistribution.fit uses the parameters 'Display' with a default value of 'off', 'MaxIter' with a default value of 100, and 'TolFun' with a default value of 1e-6. |
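
As an illustration of how these parameters combine in one call, the sketch below passes several of them together. It assumes the Statistics Toolbox is available; the data are synthetic and the specific values (5 replicates, a regularization of 1e-6, 200 iterations) are arbitrary choices for the example, not recommendations.

```matlab
% Illustrative call using several parameter/value pairs.
rng(0);                                           % reproducible synthetic data
X = [mvnrnd([1 2],[2 0; 0 .5],200); mvnrnd([-3 -5],eye(2),200)];

options = statset('Display','off','MaxIter',200,'TolFun',1e-8);

obj = gmdistribution.fit(X,2, ...
    'Start','randSample', ...      % required because Replicates > 1
    'Replicates',5, ...            % keep the fit with the largest likelihood
    'CovType','full', ...
    'SharedCov',false, ...
    'Regularize',1e-6, ...
    'Options',options);

disp(obj.Converged)                % true if the returned fit converged
```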
In some cases, gmdistribution may converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix.
The following issues may result in an ill-conditioned covariance matrix:
- The number of dimensions of your data is relatively high and there are not enough observations.
- Some of the features (variables) of your data are highly correlated.
- Some or all of the features are discrete.
- You tried to fit the data to too many components.
In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:
- Pre-process your data to remove correlated features.
- Set 'SharedCov' to true to use the same covariance matrix for every component.
- Set 'CovType' to 'diagonal'.
- Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
- Try another set of initial values.
In other cases gmdistribution may pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Trying another set of initial values may avoid this issue without altering your data or model.
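The precautions above might be tried as in the following sketch, which fits data with two strongly correlated features in three ways. It assumes the Statistics Toolbox; the data, the noise level, and the regularization value of 1e-4 are made up for illustration, and which option is appropriate depends on your data.

```matlab
% Synthetic two-group data with two strongly correlated features.
rng(0);
z = [randn(150,1) - 3; randn(150,1) + 3];
X = [z, z + 0.01*randn(300,1)];

% Option A: add a small positive number to each covariance diagonal
objReg = gmdistribution.fit(X,2,'Regularize',1e-4);

% Option B: restrict the covariance matrices to be diagonal
objDiag = gmdistribution.fit(X,2,'CovType','diagonal');

% Option C: share one (pooled) covariance matrix across components
objShared = gmdistribution.fit(X,2,'SharedCov',true,'Regularize',1e-4);

% Compare the fits, for example by their information criteria
fprintf('AIC: reg = %.1f, diag = %.1f, shared = %.1f\n', ...
    objReg.AIC, objDiag.AIC, objShared.AIC);
```

Because the initial values are chosen at random, rerunning a fit (or using 'Replicates') is also a simple way to try another set of initial values.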
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on

Next, fit a two-component Gaussian mixture model:
options = statset('Display','final');
obj = gmdistribution.fit(X,2,'Options',options);
10 iterations, log-likelihood = -7046.78
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);

Among the properties of the fit are the parameter estimates:
ComponentMeans = obj.mu
ComponentMeans =
0.9391 2.0322
-2.9823 -4.9737
ComponentCovariances = obj.Sigma
ComponentCovariances(:,:,1) =
1.7786 -0.0528
-0.0528 0.5312
ComponentCovariances(:,:,2) =
1.0491 -0.0150
-0.0150 0.9816
MixtureProportions = obj.PComponents
MixtureProportions =
0.5000 0.5000
The Akaike information is minimized by the two-component model:
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = gmdistribution.fit(X,k);
    AIC(k) = obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
numComponents
numComponents =
2
model = obj{2}
model =
Gaussian mixture distribution
with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean: 0.9391 2.0322
Component 2:
Mixing proportion: 0.500000
Mean: -2.9823 -4.9737
Both the Akaike and Bayes information criteria are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.
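As a worked illustration of the penalty terms, the sketch below recomputes both criteria from the log-likelihood reported in the example above, using the standard definitions AIC = -2 logL + 2m and BIC = -2 logL + m log(n). The parameter count m assumes an unconstrained model ('CovType' of 'full' and no shared covariance); treat it as an assumption rather than a statement of how the toolbox counts parameters internally.

```matlab
% Recompute AIC and BIC from the reported log-likelihood (base MATLAB only).
logL = -7046.78;      % log-likelihood printed for the two-component fit
n = 2000;             % number of observations
k = 2;                % number of components
d = 2;                % number of dimensions

% Free parameters: (k-1) mixing proportions, k*d means,
% and k*d*(d+1)/2 distinct covariance entries
m = (k-1) + k*d + k*d*(d+1)/2;

aic = -2*logL + 2*m;
bic = -2*logL + m*log(n);

fprintf('m = %d, AIC = %.2f, BIC = %.2f\n', m, aic, bic);
```

With these inputs m is 11 and the AIC is 14115.56. Because the printed log-likelihood is rounded, values stored in the AIC and BIC properties of a fitted object may differ slightly in the last digits.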
© 1984-2012 The MathWorks, Inc.