This is the standard EM algorithm for Gaussian mixture models (GMMs), as presented in Chapter 9 of Bishop's book "Pattern Recognition and Machine Learning", with one small exception: a uniform distribution is added to the mixture to pick up background noise/speckle, i.e. data points that one would not want to associate with any cluster.
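To illustrate how the uniform term enters the E-step, here is a minimal 1-D sketch in Python (not the author's MATLAB code; the function and parameter names are hypothetical). It assumes the background has constant density 1/(hi - lo) over an assumed data range [lo, hi]. Each point's responsibilities are its weighted component densities normalised to sum to one, and the last entry belongs to the noise term.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of the 1-D normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances, noise_weight, lo, hi):
    """E-step for a 1-D Gaussian mixture plus a uniform background term.

    weights/means/variances describe the k Gaussian components; noise_weight
    is the mixing weight of the (assumed) uniform density 1/(hi - lo) on
    [lo, hi]. Returns one row per point with k + 1 responsibilities; the last
    entry is the background, i.e. the "(k+1)th cluster".
    """
    uniform_density = 1.0 / (hi - lo)
    rows = []
    for x in xs:
        terms = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        terms.append(noise_weight * uniform_density)  # background term
        total = sum(terms)  # always > 0 because of the uniform term
        rows.append([t / total for t in terms])
    return rows
```

A point far from every Gaussian ends up with almost all of its responsibility on the last (noise) entry, so labelling each point by its largest responsibility sends it to the background cluster.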
NOTE: This function requires the MATLAB Statistics Toolbox and, for plotting the ellipses, the function error_ellipse, available from http://www.mathworks.com/matlabcentral/fileexchange/4705. It also requires MATLAB 7.9 (R2009b) or later.
For a demo, simply run GM_EM();
Plotting is done automatically for 1-D and 2-D data with five or fewer mixture components.
Usage:
% GM_EM - fit a Gaussian mixture model to N points located in n-dimensional space.
% GM_EM(X,k) - fit a GMM to X, where X is N x n and k is the number of
% clusters. Algorithm follows steps outlined in Bishop
% (2009) 'Pattern Recognition and Machine Learning', Chapter 9.
% Optional inputs
% bn_noise - allow for a uniform background noise term ('T' or 'F',
% default 'T'). If 'T', points attributed to the noise term are
% labelled as the (k+1)th cluster
% reps - number of repetitions with different initial conditions
% (default = 10). Note: only the best fit (in a likelihood sense) is
% returned
% max_iters - maximum number of EM iterations (default = 100)
% tol - convergence tolerance (default = 0.01)
% Outputs
% idx - classification/labelling of data in X
% mu - GM centres
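The reps, max_iters and tol options can be pictured with the following 1-D Python sketch (hypothetical names, not the author's code). It runs EM with a uniform background several times from random starts and keeps the run with the highest log-likelihood. Two further assumptions: tol is treated as an absolute change in log-likelihood, and the initial split of 90% cluster weight versus 10% noise weight is arbitrary.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of the 1-D normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_once(xs, k, max_iters, tol, rng):
    """One EM run from a random start; returns (loglik, weights, means, variances, noise)."""
    n = len(xs)
    lo, hi = min(xs), max(xs)            # assumes the data are not all identical
    uniform_density = 1.0 / (hi - lo)    # background density over the data range
    mean0 = sum(xs) / n
    var0 = sum((x - mean0) ** 2 for x in xs) / n
    means = rng.sample(xs, k)            # k data points as initial centres
    variances = [var0] * k
    weights = [0.9 / k] * k              # hypothetical start: 90% clusters, 10% noise
    noise = 0.1
    prev_ll = ll = -math.inf
    for _ in range(max_iters):
        # E-step: responsibilities of the k Gaussians plus the uniform term
        resp, ll = [], 0.0
        for x in xs:
            terms = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            terms.append(noise * uniform_density)
            total = sum(terms)           # > 0 thanks to the uniform term
            ll += math.log(total)
            resp.append([t / total for t in terms])
        if abs(ll - prev_ll) < tol:      # log-likelihood has stopped improving
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj < 1e-10:               # empty component: keep old mean/variance
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse onto one point
        noise = sum(r[k] for r in resp) / n
    return ll, weights, means, variances, noise


def fit_gmm(xs, k, reps=10, max_iters=100, tol=0.01, seed=0):
    """Run EM `reps` times and keep the fit with the highest log-likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(reps):
        result = fit_once(xs, k, max_iters, tol, rng)
        if best is None or result[0] > best[0]:
            best = result
    return best


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0, 1) for _ in range(100)]
    data += [data_rng.gauss(10, 1) for _ in range(100)]
    data += [data_rng.uniform(-20, 30) for _ in range(10)]  # background outliers
    ll, weights, means, variances, noise = fit_gmm(data, k=2)
    print(sorted(means), noise)
```

Keeping only the best of several restarts guards against EM converging to a poor local optimum from an unlucky initialisation.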
Thank you! A very nice contribution.
I used your program on a feature vector with 20 000 samples and tried to make it faster. By replacing the matrix product with a vectorized implementation that avoids the diag function, I achieved a 40x speedup.
Current matrix-product implementation:
tot_sum = (X'-repmat(mu(:,j),1,N)) * diag(gamma_znk(:,j)) * (X'-repmat(mu(:,j),1,N))';
Vectorized implementation:
tot_sum = bsxfun(@times, X'-repmat(mu(:,j),1,N), gamma_znk(:,j)') * (X'-repmat(mu(:,j),1,N))';
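Why the two versions agree: right-multiplying the n x N centred data by diag(g) only scales column i by g(i). The dense N x N diagonal matrix therefore never has to be formed, and for N = 20 000 that matrix has 4 x 10^8 entries (about 3.2 GB in double precision). The small pure-Python sketch below uses hypothetical helper names and checks the identity on toy data; the column scaling plays the role of bsxfun(@times, D, g').

```python
def transpose(a):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain product of a (p x q) and a (q x r) list-of-rows matrix."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def weighted_scatter_diag(d, g):
    """D * diag(g) * D', building the dense N x N diagonal matrix explicitly."""
    n_pts = len(g)
    w = [[g[a] if a == b else 0.0 for b in range(n_pts)] for a in range(n_pts)]
    return matmul(matmul(d, w), transpose(d))


def weighted_scatter_scaled(d, g):
    """Same result, scaling column i of D by g[i] instead of forming diag(g)."""
    dg = [[v * gi for v, gi in zip(row, g)] for row in d]
    return matmul(dg, transpose(d))
```

Here d stands for the centred data X' - mu(:,j) (one row per dimension, one column per point) and g for the responsibilities gamma_znk(:,j).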
I'm trying to run the code, but I keep getting this warning:
'Warning: chol failed, algorithm abandoned'
because the line cholcov(Sigma(:,:,j),0); always fails at the 2nd iteration (bn_noise='T') or the 3rd iteration (bn_noise='F').
FYI, I have no NaN values in my data, and I get consistent results with kmeans() and emgm() [the submission that inspired this one]. In fact, no matter what data I feed into the function (e.g. a square matrix, rand(m,n), ...), this step always fails.
Any insight on this?
Does the input have to be square?
If my input data is not square, e.g. 200x10, what should I do?
An "unknown" cluster is exactly what we have been looking for. Thanks a lot.
Made the help file more readable.