Statistics Toolbox™
obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)
obj = gmdistribution.fit(X,k) uses the Expectation Maximization (EM) algorithm to construct an object obj of the @gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-d matrix X, where n is the number of observations and d is the dimension of the data.
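For orientation, the density being fitted can be written in the standard form of a Gaussian mixture. The symbols below are notation introduced here, not taken from the toolbox: the mixing proportions, means, and covariances are assumed to correspond to the PComponents, mu, and Sigma properties shown in the example further down.

```latex
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\left(x \mid \mu_j, \Sigma_j\right),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1,
```

where $x$ is a $d$-dimensional observation, $\pi_j$ is the mixing proportion of component $j$, and $\mathcal{N}(x \mid \mu_j, \Sigma_j)$ is the multivariate normal density with mean $\mu_j$ and covariance $\Sigma_j$.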
gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.
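As far as which observations enter the fit, the row-wise exclusion described above should be comparable to dropping incomplete rows beforehand. A minimal sketch with a made-up matrix (the variable names completeRows and Xclean are hypothetical):

```matlab
% Small illustrative matrix with missing values in rows 2 and 4.
X = [1 2; NaN 3; 4 5; 6 NaN];

% Keep only the rows that contain no NaN values.
completeRows = ~any(isnan(X),2);
Xclean = X(completeRows,:);
```

Filtering explicitly like this can be useful when the number of observations actually used needs to be known before fitting.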
obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.
| Parameter | Value |
|---|---|
| 'Start' | Method used to choose initial component parameters. One option is 'randSample', which is required when 'Replicates' is larger than 1. |
| 'Replicates' | A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of initial parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'randSample' start method. The default is 1. |
| 'CovType' | 'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'. |
| 'SharedCov' | Logical true if all the covariance matrices are restricted to be the same (pooled estimate); logical false otherwise. The default is false. |
| 'Regularize' | A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0. |
| 'Options' | Options structure for the iterative EM algorithm, as created by statset. gmdistribution.fit uses the parameters 'Display' with a default value of 'off', 'MaxIter' with a default value of 100, and 'TolFun' with a default value of 1e-6. |
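As a rough illustration of how these parameters combine, the sketch below fits a model with several random restarts, diagonal covariances, a small regularization term, and a custom options structure. The data and the specific values (5 replicates, 1e-5, 500 iterations) are arbitrary choices made for this example, not recommendations.

```matlab
% Illustrative data: two well-separated bivariate Gaussian clusters.
X = [mvnrnd([1 2],[2 0; 0 .5],500); mvnrnd([-3 -5],eye(2),500)];

% EM options: report the final iteration and allow more iterations
% than the default of 100.
options = statset('Display','final','MaxIter',500,'TolFun',1e-6);

% Repeat EM from 5 random starts and keep the fit with the largest
% likelihood. 'Regularize' adds a small value to the diagonal of each
% covariance estimate.
obj = gmdistribution.fit(X,2, ...
    'Start','randSample', ...
    'Replicates',5, ...
    'CovType','diagonal', ...
    'SharedCov',false, ...
    'Regularize',1e-5, ...
    'Options',options);
```

Multiple replicates cost proportionally more time, but they reduce the chance that a poor random start leads EM to a weak local maximum.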
[1] McLachlan, G., and D. Peel. Finite Mixture Models. Hoboken, NJ: John Wiley & Sons, Inc., 2000.
Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on

Next, fit a two-component Gaussian mixture model:
options = statset('Display','final');
obj = gmdistribution.fit(X,2,'Options',options);
10 iterations, log-likelihood = -7046.78
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);

Among the properties of the fit are the parameter estimates:
ComponentMeans = obj.mu
ComponentMeans =
0.9391 2.0322
-2.9823 -4.9737
ComponentCovariances = obj.Sigma
ComponentCovariances(:,:,1) =
1.7786 -0.0528
-0.0528 0.5312
ComponentCovariances(:,:,2) =
1.0491 -0.0150
-0.0150 0.9816
MixtureProportions = obj.PComponents
MixtureProportions =
0.5000 0.5000
The Akaike information is minimized by the two-component model:
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
obj{k} = gmdistribution.fit(X,k);
AIC(k)= obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
numComponents
numComponents =
2
model = obj{2}
model =
Gaussian mixture distribution
with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean: 0.9391 2.0322
Component 2:
Mixing proportion: 0.500000
Mean: -2.9823 -4.9737
Both the Akaike and Bayes information criteria (AIC and BIC) are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.
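In their standard forms, the two criteria are AIC = 2·NlogL + 2m and BIC = 2·NlogL + m·log(n), where NlogL is the negative log-likelihood, m the number of estimated parameters, and n the number of observations. The sketch below repeats the model-selection loop from the example with both criteria side by side. It assumes the fitted object exposes a BIC property alongside AIC; the names maxK, fitk, bestByAIC, and bestByBIC are hypothetical.

```matlab
% Illustrative data, generated as in the example above.
X = [mvnrnd([1 2],[2 0; 0 .5],1000); mvnrnd([-3 -5],eye(2),1000)];

maxK = 4;                 % largest number of components to try (arbitrary)
AIC = zeros(1,maxK);
BIC = zeros(1,maxK);
for k = 1:maxK
    fitk = gmdistribution.fit(X,k);
    AIC(k) = fitk.AIC;    % penalty grows with the number of parameters
    BIC(k) = fitk.BIC;    % penalty also grows with log(n)
end

% Smaller values indicate a better trade-off between fit and complexity.
[minAIC,bestByAIC] = min(AIC);
[minBIC,bestByBIC] = min(BIC);
```

Because the BIC penalty scales with log(n), it tends to favor models with fewer components than AIC does on large data sets, so the two criteria need not agree.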
© 1984-2008 The MathWorks, Inc.