Code covered by the BSD License

Rating: 5.0 (3 ratings) | 18 downloads (last 30 days) | File size: 3.07 KB | File ID: #36721 | Version: 1.1

EM algorithm for Gaussian mixture model with background noise

by Andrew

16 May 2012

Standard EM algorithm to fit a GMM with the (optional) consideration of background noise.

File Information
Description

This is the standard EM algorithm for GMMs, as presented in Chapter 9 of Bishop's book "Pattern Recognition and Machine Learning", with one small exception: a uniform distribution is added to the mixture to pick up background noise/speckle, i.e. data points that one would not want to associate with any cluster.
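As a rough sketch of what adding a uniform component means, the mixture density and the E-step responsibilities can be written as below. The notation is an assumption made for illustration: V stands for the volume of the region over which the uniform term is spread (for example the bounding box of the data), and the noise component is given index k+1 to match the (k+1)th cluster mentioned in the usage notes. The submission's code may define the uniform term differently.

```latex
\begin{align}
p(x_n) &= \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)
          + \pi_{k+1} \, \frac{1}{V},
          \qquad \sum_{j=1}^{k+1} \pi_j = 1 \\
\gamma_{nj} &= \frac{\pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}{p(x_n)},
          \qquad j = 1, \dots, k \\
\gamma_{n,k+1} &= \frac{\pi_{k+1} / V}{p(x_n)}
\end{align}
```

Under this form, a point far from every Gaussian has small values of all the Gaussian terms, so most of its responsibility falls on the uniform component and it does not pull the means and covariances of the real clusters.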

NOTE: This function requires the MATLAB Statistics Toolbox and, for plotting the ellipses, the function error_ellipse, available from http://www.mathworks.com/matlabcentral/fileexchange/4705. It also requires MATLAB 7.9 (R2009b) or later.

For a demo, simply run GM_EM();
Plotting is provided automatically for 1D/2D cases with 5 or fewer GMs.

Usage:

% GM_EM - fit a Gaussian mixture model to N points located in n-dimensional space.
% GM_EM(X,k) - fit a GMM to X, where X is N x n and k is the number of
%              clusters. Algorithm follows steps outlined in Bishop
%              (2009) 'Pattern Recognition and Machine Learning', Chapter 9.
%
% Optional inputs
%   bn_noise  - allow for uniform background noise term ('T' or 'F',
%               default 'T'). If 'T', relevant classification uses the
%               (k+1)th cluster
%   reps      - number of repetitions with different initial conditions
%               (default = 10). Note: only the best fit (in a likelihood
%               sense) is returned.
%   max_iters - maximum iteration number for EM algorithm (default = 100)
%   tol       - tolerance value (default = 0.01)
%
% Outputs
%   idx       - classification/labelling of data in X
%   mu        - GM centres
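A minimal usage sketch follows, built only from the help text above. The synthetic data, the variable names and the output order [idx, mu] are assumptions for illustration; the call is guarded so the script still runs when GM_EM.m is not on the path. Note that X is N x n (one row per point), so it does not need to be square.

```matlab
% Hypothetical example: two 2-D clusters plus uniform background speckle.
rng(0);                                        % reproducible synthetic data
A = bsxfun(@plus, 0.5*randn(100,2), [-2 0]);   % cluster around (-2, 0)
B = bsxfun(@plus, 0.5*randn(100,2), [ 2 0]);   % cluster around ( 2, 0)
noise = 8*rand(20,2) - 4;                      % uniform points on [-4, 4]^2
X = [A; B; noise];                             % N x n = 220 x 2
k = 2;                                         % number of Gaussian clusters

if exist('GM_EM', 'file') == 2
    % Assumed output order (idx, then mu), following the help text.
    [idx, mu] = GM_EM(X, k);
    % With bn_noise = 'T' (the default), label k+1 marks background noise.
    is_noise = (idx == k + 1);
    fprintf('%d of %d points labelled as noise\n', nnz(is_noise), size(X,1));
else
    disp('GM_EM.m not found on the path; only the test data was generated.');
end
```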

Acknowledgements

Em Algorithm For Gaussian Mixture Model (Em Gmm) inspired this file.

Required Products: Statistics and Machine Learning Toolbox, MATLAB
MATLAB release: MATLAB 7.9 (R2009b)

02 Feb 2016 Anders Ueland

Thank you! A very nice contribution.

I used your program on a feature vector with 20 000 samples and I tried to make it faster. By replacing the matrix product by a vectorized implementation, avoiding the diag function, I achieved a speedup of a factor of 40.

Current matrix product implementation:
% tot_sum = (X'-repmat(mu(:,j),1,N)) * diag(gamma_znk(:,j)) * (X'-repmat(mu(:,j),1,N))';

Suggested implementation:
% tot_sum = bsxfun(@times, X'-repmat(mu(:,j),1,N), gamma_znk(:,j)') * (X'-repmat(mu(:,j),1,N))';
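The two expressions above can be checked against each other on random data. The sketch below is an illustration only: the sizes and random inputs are made up, and the variable names X, mu, gamma_znk, N and j are reused from the comment rather than taken from the submission's source.

```matlab
% Compare the diag-based and bsxfun-based weighted scatter matrices.
rng(1);
N = 500;  n = 3;  j = 1;          % illustrative sizes, not from the submission
X = randn(N, n);                  % data, one row per point
mu = randn(n, 2);                 % component means, one column per component
gamma_znk = rand(N, 2);           % responsibilities, one column per component

D = X' - repmat(mu(:,j), 1, N);   % n x N matrix of centred data

% Original form: builds an N x N diagonal matrix before multiplying.
tot_sum_diag = D * diag(gamma_znk(:,j)) * D';

% Suggested form: scales each column of D by its weight directly.
tot_sum_bsx = bsxfun(@times, D, gamma_znk(:,j)') * D';

max_abs_diff = max(abs(tot_sum_diag(:) - tot_sum_bsx(:)));
fprintf('max abs difference: %g\n', max_abs_diff);
```

The diag form materialises an N x N matrix, so its memory use and multiplication cost grow with N^2, while the bsxfun form only touches n x N arrays. That is consistent with a large speedup for N = 20 000, though the exact factor will depend on the machine and the data.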

08 Aug 2014 David Provencher

I'm trying to run the code, but I keep getting this warning :

'Warning: chol failed, algorithm abandoned';

because the cholcov(Sigma(:,:,j),0); line always fails at the 2nd iteration (bn_noise='T') or 3rd iteration (bn_noise='F').

FYI, I have no NaN values in my data, and I get coherent results with kmeans() and emgm() [the submission that inspired this one]. Actually, no matter what data I feed into the function (e.g. square matrix, rand(m,n), ...) this step always fails.

Any insight on this?
Thanks,
David

01 Oct 2012 Jin Wang

The input has to be square, right?
If my input data is not square, like 200x10, what should I do?
Thanks!

16 May 2012 peter

an "unknown" cluster, this is what we have been looking for. thanks a lot.