version 1.0.0.0 (9.77 KB) by
Patrick Tsui

An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture.

Although EM algorithm for Gaussian mixture (EM_GM) learning is well known, 3 major MATLAB EM_GM codes are found on the web. However, they either have errors or not easy to incorporate into other MATLAB codes. Therefore, I decide to write my own EM_GM and share it. My EM_GM is designed as a single file function (i.e. all sub functions are included in the same file) for convenience and portability.

Detail descriptions of all inputs and outputs are included in the file. EM_GM can be controlled to plot 1D or 2D problems and display CPU time used as well as number of iterations.

Example:

X = zeros(600,2);

X(1:200,:) = normrnd(0,1,200,2);

X(201:400,:) = normrnd(0,2,200,2);

X(401:600,:) = normrnd(0,3,200,2);

[W,M,V,L] = EM_GM(X,3,[],[],1,[])

Patrick Tsui (2020). EM_GM (https://www.mathworks.com/matlabcentral/fileexchange/8636-em_gm), MATLAB Central File Exchange. Retrieved .

Created with
R13

Compatible with any release

**Inspired:**
Plot_GM, Fast EM_GM, Background Removal using Gaussian Model

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

Nikita Letovyang liLeigh GaitherCan you please explain how to use this application? We are having difficulties getting it to run. We have tried many different approaches, please send an example with calling data from a csv file 10x2.

DongshengDongshengGood code!!! It works very well.

However, I found a bug. In Matlab R2013

for k=5:15

[W,M,V,L] = EM_GM(Data',k,0.001,1000,1,[]);

end

For certain k, it came out with error:

Attempted to access maxy(1); index out of bounds because numel(maxy)=0.

I noticed that some value in M were set as Nan.

However, for the same data in Matlab 7.1 it works perfectly fine.

MaxaonThanks

Moran ArtziHi,

Thank you so much for this tool

Is it possible to save clusters membership?

Thanks

Moran

mHi

thank you so much for your care Yashodhan.

I did so, but with transpose of X, same error occurred!!

Yashodhan AthavaleHello,

I think the error is popping up because you are processing X with dimensions 4601*57. I would suggest try taking the transpose of X, which might solve the problem.

Let me know if it works.

Regards

mHi, thanks for code, but I have a problem. the result of executing this code for my dataset isn't good! estimated W for first class is greater than 0.99 while real value is about 0.60!

so I want determine the Init input for your code,I have 2 classes with 57 features, this is my entries for Init : EM_GM(X,2,[],[],[],[[0.6 0.4],Me,Co]) ____ where X is 4601*57, Me is 57*2 , Co is 57*57*2.

but this error is shown on matlab command window without any output: "??? Error using ==> horzcat

CAT arguments dimensions are not consistent."

help me please immediately.

thanks

Da xingThank you!

Alexander FarleyI've written a mixture of Gaussians background subtraction algorithm using this script and it seems to work properly.

DongshengThx for your code. But I double checked that your likelihood calculation is wrong!

Sam DaYou did a good job writing it but it has bugs in it. It complains a lot about positive definiteness of matrices on data which produce good results with other similar code.

Petr ZapletalWhat is the time comlexity of your implementation?

PatrickWhen i run I get this error, Is there something else that i need to do before running this. Am running matlab 7.8.0 R2009a

>> X = zeros(600,2);

>> X(1:200,:) = normrnd(0,1,200,2);

>> X(201:400,:) = normrnd(0,2,200,2);

X(401:600,:) = normrnd(0,3,200,2);

>> [W,M,V,L] = EM_GM(X,3,[],[],1,[])

??? Cell contents reference from a non-cell array object.

Error in ==> getdim at 13

d = getdim(carray{i});

Error in ==> rand at 12

d = getdim(c);

Error in ==> randsample at 95

x(ceil(n * rand(1,k-sumx))) = 1; % sample w/replacement

Error in ==> kmeans at 321

Xsubset = X(randsample(n,floor(.1*n)),:);

Error in ==> EM_GM>Init_EM at 261

[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 74

[W,M,V] = Init_EM(X,k); L = 0;

Yashodhan AthavaleI tried modifying the code a little bit, by including the following lines of code,

function [W,M,V] = Init_EM(X,k)

[n,d] = size(X);

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off'); % Ci(nx1) - cluster indeices; C(k,d) - cluster centroid (i.e. mean)

while sum(isnan(C))>0,

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off');

end

M = C';

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Use this routine if the cluster matrix is of variable length and contains

% Scan the cluster index matrix Ci for NaN values, and remove them

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

temp1=Ci(:,1); % Scan the entire matrix column

temp2=isnan(temp1); % Find the rows with NaN values

index_active=find(temp2~=1); % Find the rows with actual values

Ci_new=temp1(index_active); %#ok<FNDSB>

[r,c]=size(Ci_new); % Determine the size of the newly scanned matrix.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Vp = repmat(struct('count',0,'X',zeros(r,c)),1,k);

for i=1:r % Separate cluster points

Vp(Ci_new(i)).count = Vp(Ci_new(i)).count + 1;

Vp(Ci_new(i)).X(Vp(Ci_new(i)).count,:) = X(i,:);

end

V = zeros(d,d,k);

for i=1:k,

W (i) = Vp(i).count/r;

% Find the covariance ignoring the NaNs

V(:,:,i) = nancov(Vp(i).X(1:Vp(i).count,:));

end

%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Init_EM %%%%

%%%%%%%%%%%%%%%%%%%%%%%%

I worked, giving me the estimates. But still I am getting some warnings with "Empty rows of data".

RoyFantastic piece of work !!

I understand EM method can be used for treatment of missing data. please suggest how to use this function for filling the missing dataset. thanks.

Yashodhan AthavaleI have a multidimensional file in Microsoft Excel. I imported the sheet into X, and executed [W,M,V,L] = EM_GM(X,3,[],[],1,[]);

I am getting the following errors,

[W,M,V,L] = EM_GM(X,1,[],[],1,[])

Warning: Ignoring rows of X with missing data.

> In kmeans at 127

In EM_GM>Init_EM at 261

In EM_GM at 74

??? Attempted to access C(1,2); index out of bounds because numel(C)=1.

Error in ==> kmeans>distfun at 722

D(:,i) = D(:,i) + (X(:,j) - C(i,j)).^2;

Error in ==> kmeans at 329

D = distfun(X, C, distance, 0);

Error in ==> EM_GM>Init_EM at 261

[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 74

[W,M,V] = Init_EM(X,k); L = 0;

Please Help....!!! thanks in advance..... :)

gd j,it's very helpful thanks

Adel Fakihit might be also any other implementation of netlab. type edit kmeans to find the culprit one.

Shawn_lj QingshengWhen I run [W,M,V,L] = EM_GM(X,3,[],[],1,[]); on matlab command as the example, the error happened, as below:

??? Error using ==> kmeans.kmeans

Too many input arguments.

Error in ==> EM_GM>Init_EM at 305

[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 63

[W,M,V] = Init_EM(X,k); L = 0;

Please help,thanks

chang YLthis is good.

Jason Leewe can speed up this piece of code by preallocating the memory of W

in init_EM

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_Init %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Init_EM(X,k)

[n,d] = size(X);

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off'); % Ci(nx1) - cluster indeices; C(k,d) - cluster centroid (i.e. mean)

while sum(isnan(C))>0,

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off');

end

M = C';

Vp = repmat(struct('count',0,'X',zeros(n,d)),1,k);

for i=1:n, % Separate cluster points

Vp(Ci(i)).count = Vp(Ci(i)).count + 1;

Vp(Ci(i)).X(Vp(Ci(i)).count,:) = X(i,:);

end

V = zeros(d,d,k);

% Preallocating W

W = zeros(1,k);

for i=1:k,

W(i) = Vp(i).count/n;

V(:,:,i) = cov(Vp(i).X(1:Vp(i).count,:));

end

%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Init_EM %%%%

%%%%%%%%%%%%%%%%%%%%%%%%

Cihat EldenizThis code computes the E-step and the M-step correctly. It is also readable and neat for the user. Nevertheless, there is a problem and an improvement to make I would like to report. The problem is that, the likelihood calculation is incorrect although the outputs of both E and M steps are correct. I have checked the reference to which the code is attributed. In fact, in that reference, the probability density function (pdf) is a single gaussian rather than a mixture. This is why there is a confusion leading to miscalculations. The actual likelihood function should be computed as [in LaTeX format]:

L = \sum^{n}_{i=1} \log(\sum^{k}_{j=1}W(j)*\mathcal{N}(X(:,i)^{T},M(:,j),V(:,:,j)))

The improvement is that there are some loops which can be replaced by matrix multiplications or some other matrix-related operations. Furthermore, the likelihood computation can easily be achieved inside the E-step.

In conclusion, the following code is the one modified with respect to the aforementioned points:

function [W,M,V,L,E] = EM_GM(X,k,ltol,maxiter,pflag,Init)

% Inputs:

% X(n,d) - input data, n=number of observations, d=dimension of variable

% k - maximum number of Gaussian components allowed

% ltol - percentage of the log likelihood difference between 2 iterations ([] for none)

% maxiter - maximum number of iteration allowed ([] for none)

% pflag - 1 for plotting GM for 1D or 2D cases only, 0 otherwise ([] for none)

% Init - structure of initial W, M, V: Init.W, Init.M, Init.V ([] for none)

%

% Ouputs:

% W(1,k) - estimated weights of GM

% M(d,k) - estimated mean vectors of GM

% V(d,d,k) - estimated covariance matrices of GM

% L - log likelihood of estimates

%

%%%% Validate inputs %%%%

if nargin <= 1,

disp('EM_GM must have at least 2 inputs: X,k!/n')

return

elseif nargin == 2,

ltol = 0.1; maxiter = 1000; pflag = 0; Init = [];

err_X = Verify_X(X);

err_k = Verify_k(k);

if err_X | err_k, return; end

elseif nargin == 3,

maxiter = 1000; pflag = 0; Init = [];

err_X = Verify_X(X);

err_k = Verify_k(k);

[ltol,err_ltol] = Verify_ltol(ltol);

if err_X | err_k | err_ltol, return; end

elseif nargin == 4,

pflag = 0; Init = [];

err_X = Verify_X(X);

err_k = Verify_k(k);

[ltol,err_ltol] = Verify_ltol(ltol);

[maxiter,err_maxiter] = Verify_maxiter(maxiter);

if err_X | err_k | err_ltol | err_maxiter, return; end

elseif nargin == 5,

Init = [];

err_X = Verify_X(X);

err_k = Verify_k(k);

[ltol,err_ltol] = Verify_ltol(ltol);

[maxiter,err_maxiter] = Verify_maxiter(maxiter);

[pflag,err_pflag] = Verify_pflag(pflag);

if err_X | err_k | err_ltol | err_maxiter | err_pflag, return; end

elseif nargin == 6,

err_X = Verify_X(X);

err_k = Verify_k(k);

[ltol,err_ltol] = Verify_ltol(ltol);

[maxiter,err_maxiter] = Verify_maxiter(maxiter);

[pflag,err_pflag] = Verify_pflag(pflag);

[Init,err_Init]=Verify_Init(Init);

if err_X | err_k | err_ltol | err_maxiter | err_pflag | err_Init, return; end

else

disp('EM_GM must have 2 to 6 inputs!');

return

end

%%%% Initialize W, M, V,L %%%%

t = cputime;

if isempty(Init),

[W,M,V] = Init_EM(X,k); L = 0;

else

W = Init.W;

M = Init.M;

V = Init.V;

end

[E,Ln] = Expectation(X,k,W,M,V); % Initialize log likelihood and compute expectation

Lo = 2*Ln;

%%%% EM algorithm %%%%

niter = 0;

while (abs(100*(Ln-Lo)/Lo)>ltol) & (niter<=maxiter),

%**************************************************************************

% For the first loop, expectation is computed above (cf. Line 69). For the

% next loops, it is computed in their preceding loops. Therefore, the

% following line [Line 80] is omitted.

%**************************************************************************

% E = Expectation(X,k,W,M,V); % E-step

[W,M,V] = Maximization(X,k,E); % M-step

Lo = Ln;

[E,Ln] = Expectation(X,k,W,M,V); % Log-likelihood and expectation computation

niter = niter + 1;

LnValues(niter) = Ln;

disp(['Iteration # = ' num2str(niter) ', Ln = ' num2str(Ln) ', Stopping Criterion = ' num2str(abs(100*(Ln-Lo)/Lo)) ', Tolerance = ' num2str(ltol)])

end

L = Ln;

theTime = num2str(fix(clock));

theTime(theTime==' ') = '_';

save(['EM_CostFunction_' theTime '_MoreLoopsRemoved.mat'],'LnValues')

%%%% Plot 1D or 2D %%%%

if pflag==1,

[n,d] = size(X);

if d>2,

disp('Can only plot 1 or 2 dimensional applications!/n');

else

Plot_GM(X,k,W,M,V);

end

elapsed_time = sprintf('CPU time used for EM_GM: %5.2fs',cputime-t);

disp(elapsed_time);

disp(sprintf('Number of iterations: %d',niter-1));

end

%%%%%%%%%%%%%%%%%%%%%%

%%%% End of EM_GM %%%%

%%%%%%%%%%%%%%%%%%%%%%

function [E,L] = Expectation(X,k,W,M,V)

[n,d] = size(X);

a = (2*pi)^(0.5*d);

S = zeros(1,k);

iV = zeros(d,d,k);

for j=1:k,

if V(:,:,j)==zeros(d,d), V(:,:,j)=ones(d,d)*eps; end

S(j) = sqrt(det(V(:,:,j)));

iV(:,:,j) = inv(V(:,:,j));

end

%*************************************************************************

%*************************** MODIFICATION: LOOP REMOVAL *****************

%*************************************************************************

E = zeros(n,k);

% for i=1:n,

% for j=1:k,

% dXM = X(i,:)'-M(:,j);

% pl = exp(-0.5*dXM'*iV(:,:,j)*dXM)/(a*S(j));

% E(i,j) = W(j)*pl;

% end

% E(i,:) = E(i,:)/sum(E(i,:));

% end

chunkSize = 1000;

howManyFullChunks = fix(n/chunkSize);

numberOfRemainingVectors = n - howManyFullChunks*chunkSize;

pdfValueVector = zeros(n,1);

for j = 1:k

for chunkCounter = 1:howManyFullChunks

modificationRange = (chunkCounter-1)*chunkSize+1:chunkCounter*chunkSize;

meanSubtractedChunk = X(modificationRange,:)'-repmat(M(:,j),1,chunkSize);

pdfValueVectorToExp = diag(-0.5*meanSubtractedChunk'*iV(:,:,j)*meanSubtractedChunk);

pdfValueVector(modificationRange) = exp(pdfValueVectorToExp)/(a*S(j));

end

modificationRange = howManyFullChunks*chunkSize+1:n;

meanSubtractedChunk = X(modificationRange,:)'-repmat(M(:,j),1,length(modificationRange));

pdfValueVectorToExp = diag(-0.5*meanSubtractedChunk'*iV(:,:,j)*meanSubtractedChunk);

pdfValueVector(modificationRange) = exp(pdfValueVectorToExp)/(a*S(j));

E(:,j) = W(j)*pdfValueVector';

end

sumInColumnDirection = sum(E,2);

divisor = repmat(sumInColumnDirection,1,k);

L = sum(log(sum(E,2)));

E = E./divisor;

%**************************************************************************

%*************************** MODIFICATION ENDS HERE ***********************

%**************************************************************************

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Expectation %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Maximization(X,k,E)

[n,d] = size(X);

V = zeros(d,d,k);

%*************************************************************************

%*************************** MODIFICATION: LOOP REMOVAL *****************

%*************************************************************************

% W = zeros(1,k); M = zeros(d,k);

% for i=1:k, % Compute weights

% for j=1:n,

% W(i) = W(i) + E(j,i);

% M(:,i) = M(:,i) + E(j,i)*X(j,:)';

% end

% M(:,i) = M(:,i)/W(i);

% end

W = sum(E,1);

M = X'*E;

M = M./repmat(W,d,1);

%**************************************************************************

%*************************** MODIFICATION ENDS HERE ***********************

%**************************************************************************

for i=1:k,

for j=1:n,

dXM = X(j,:)'-M(:,i);

V(:,:,i) = V(:,:,i) + E(j,i)*dXM*dXM';

end

V(:,:,i) = V(:,:,i)/W(i);

end

W = W/n;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Maximization %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%**************************************************************************

%************************ LIKELIHOOD REMOVED COMPLETELY *******************

%**************************************************************************

% function L = Likelihood(X,k,W,M,V)

% Compute L based on K. V. Mardia, "Multivariate Analysis", Academic Press, 1979, PP. 96-97

% % to enchance computational speed

% [n,d] = size(X);

% U = mean(X)';

% S = cov(X);

% L = 0;

% for i=1:k,

% iV = inv(V(:,:,i));

% L = L + W(i)*(-0.5*n*log(det(2*pi*V(:,:,i))) ...

% -0.5*(n-1)*(trace(iV*S)+(U-M(:,i))'*iV*(U-M(:,i))));

% end

%**************************************************************************

%*************************** MODIFICATION ENDS HERE ***********************

%**************************************************************************

%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Likelihood %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%

function err_X = Verify_X(X)

err_X = 1;

[n,d] = size(X);

if n<d,

disp('Input data must be n x d!/n');

return

end

err_X = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_X %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%

function err_k = Verify_k(k)

err_k = 1;

if ~isnumeric(k) | ~isreal(k) | k<1,

disp('k must be a real integer >= 1!/n');

return

end

err_k = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_k %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%

function [ltol,err_ltol] = Verify_ltol(ltol)

err_ltol = 1;

if isempty(ltol),

ltol = 1e-2;

elseif ~isreal(ltol) | ltol<=0,

disp('ltol must be a positive real number!');

return

end

err_ltol = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_ltol %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [maxiter,err_maxiter] = Verify_maxiter(maxiter)

err_maxiter = 1;

if isempty(maxiter),

maxiter = 1000;

elseif ~isreal(maxiter) | maxiter<=0,

disp('ltol must be a positive real number!');

return

end

err_maxiter = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_maxiter %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [pflag,err_pflag] = Verify_pflag(pflag)

err_pflag = 1;

if isempty(pflag),

pflag = 0;

elseif pflag~=0 & pflag~=1,

disp('Plot flag must be either 0 or 1!/n');

return

end

err_pflag = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_pflag %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [Init,err_Init] = Verify_Init(Init)

err_Init = 1;

if isempty(Init),

% Do nothing;

elseif isstruct(Init),

[Wd,Wk] = size(Init.W);

[Md,Mk] = size(Init.M);

[Vd1,Vd2,Vk] = size(Init.V);

if Wk~=Mk | Wk~=Vk | Mk~=Vk,

disp('k in Init.W(1,k), Init.M(d,k) and Init.V(d,d,k) must equal!/n')

return

end

if Md~=Vd1 | Md~=Vd2 | Vd1~=Vd2,

disp('d in Init.W(1,k), Init.M(d,k) and Init.V(d,d,k) must equal!/n')

return

end

else

disp('Init must be a structure: W(1,k), M(d,k), V(d,d,k) or []!');

return

end

err_Init = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Verify_Init %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Init_EM(X,k)

[n,d] = size(X);

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off'); % Ci(nx1) - cluster indeices; C(k,d) - cluster centroid (i.e. mean)

while sum(isnan(C))>0,

[Ci,C] = kmeans(X,k,'Start','cluster', ...

'Maxiter',100, ...

'EmptyAction','drop', ...

'Display','off');

end

M = C';

Vp = repmat(struct('count',0,'X',zeros(n,d)),1,k);

for i=1:n, % Separate cluster points

Vp(Ci(i)).count = Vp(Ci(i)).count + 1;

Vp(Ci(i)).X(Vp(Ci(i)).count,:) = X(i,:);

end

V = zeros(d,d,k);

for i=1:k,

W(i) = Vp(i).count/n;

V(:,:,i) = cov(Vp(i).X(1:Vp(i).count,:));

end

%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Init_EM %%%%

%%%%%%%%%%%%%%%%%%%%%%%%

function Plot_GM(X,k,W,M,V)

[n,d] = size(X);

if d>2,

disp('Can only plot 1 or 2 dimensional applications!/n');

return

end

S = zeros(d,k);

R1 = zeros(d,k);

R2 = zeros(d,k);

for i=1:k, % Determine plot range as 4 x standard deviations

S(:,i) = sqrt(diag(V(:,:,i)));

R1(:,i) = M(:,i)-4*S(:,i);

R2(:,i) = M(:,i)+4*S(:,i);

end

Rmin = min(min(R1));

Rmax = max(max(R2));

R = [Rmin:0.001*(Rmax-Rmin):Rmax];

% clf, hold on

figure; hold on

if d==1,

Q = zeros(size(R));

for i=1:k,

P = W(i)*normpdf(R,M(:,i),sqrt(V(:,:,i)));

Q = Q + P;

plot(R,P,'r-'); grid on,

end

plot(R,Q,'k-');

xlabel('X');

ylabel('Probability density');

else % d==2

% plot(X(:,1),X(:,2),'r.');

plot(X(1:200,1),X(1:200,2),'r.');

plot(X(201:400,1),X(201:400,2),'g.');

plot(X(401:600,1),X(401:600,2),'b.');

for i=1:k,

Plot_Std_Ellipse(M(:,i),V(:,:,i));

end

xlabel('1^{st} dimension');

ylabel('2^{nd} dimension');

axis([Rmin Rmax Rmin Rmax])

end

title('Gaussian Mixture estimated by EM');

%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Plot_GM %%%%

%%%%%%%%%%%%%%%%%%%%%%%%

function Plot_Std_Ellipse(M,V)

[Ev,D] = eig(V);

d = length(M);

if V(:,:)==zeros(d,d),

V(:,:) = ones(d,d)*eps;

end

iV = inv(V);

% Find the larger projection

P = [1,0;0,0]; % X-axis projection operator

P1 = P * 2*sqrt(D(1,1)) * Ev(:,1);

P2 = P * 2*sqrt(D(2,2)) * Ev(:,2);

if abs(P1(1)) >= abs(P2(1)),

Plen = P1(1);

else

Plen = P2(1);

end

count = 1;

step = 0.001*Plen;

Contour1 = zeros(2001,2);

Contour2 = zeros(2001,2);

for x = -Plen:step:Plen,

a = iV(2,2);

b = x * (iV(1,2)+iV(2,1));

c = (x^2) * iV(1,1) - 1;

Root1 = (-b + sqrt(b^2 - 4*a*c))/(2*a);

Root2 = (-b - sqrt(b^2 - 4*a*c))/(2*a);

if isreal(Root1),

Contour1(count,:) = [x,Root1] + M';

Contour2(count,:) = [x,Root2] + M';

count = count + 1;

end

end

Contour1 = Contour1(1:count-1,:);

Contour2 = [Contour1(1,:);Contour2(1:count-1,:);Contour1(count-1,:)];

plot(M(1),M(2),'k+');

plot(Contour1(:,1),Contour1(:,2),'k-');

plot(Contour2(:,1),Contour2(:,2),'k-');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%% End of Plot_Std_Ellipse %%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

David LiuSeems like the log likelihood sometime decreases, which should not happen. Perhaps the formula for computing the log likelihood is incorrect?!

E FontaineThanks for this well written piece of code

Vaibhav Bediais there any way i could find where the gaussians fit in the curve and not have the mean on the x-axis

prabu chandranNice code. Its very easy to use - no variations required and the usuage of the function given makes the whole process of integration of the function simpler. Thank you.

Javier VicenteNash Borgeshere's an approximation:

http://nlp.cs.byu.edu/mediawiki/index.php/Log_Domain_Computations

Nash BorgesNoticed something that may be incorrect. When using multiple mixtures, you cannot compute the log likelihood of all of the data under each mixture and then sum them. You need to do the sum before going into the log domain.

Phong VoI found some error prones : the code can't deal when covariance matrix becomes singular.

P-Marc JodoinSuper great! can't be better!

Dima DamenThank you... well documented, easy to use and flexible interface

P. BaruahThanks to author of this code! it was very helpful in understanding the EM process.

Nick van RodijnenThe algorithm converts very well to a good solution and is reasonable stable if applied multiple times to the same data. It is even faster than the greedy EM algorithm. My compliments

Martin HembergExcellent script, well documented and easy to use.

sujeesh kumarJoan keesreally great

Pete SantagoI tried the one from the classification toolbox mentioned in the Duda, Hart, and Stork text. It was impossible to figure out.

This one worked right off. Thanks.

Gang LiangFirst, thanks for sharing the code with the community.

The Likelihood function implemented is problematic. The short cut to compute the likelihood function of a Gaussian mixture model is not correct even though it might work well in many cases...

Mikel RodriguezMr. Tsui's implementation does indeed resolve several errors found in several implementations that I have reviewed.

Ed LawsonOutstanding. Works just as expected.

I and I, JuhI thougth I'd had to writte something like this my-self. I would then have spend a lot of time, and it would never have been as perfect.

Thank you so much !!!

Zhenhai WangTerrific job, Patrick. Thanks a lot for sharing

wajih ouertaniReally Compact and professionally written, extremely useful . Thanks for sharing !

glinda maIt's very useful for me. Thanks a lot.

Jean-Christophe TerrillonThis is the first time that I could find a stand alone function for Gaussian mixture estimation.Compact and professionally written, extremely useful. It saved my day at a critical time in my research. Thank you Patrick!

Yan YangThanks for sharing!! I am really glad to have found this EM matlab code. I've tested it over randomly generated data in Matlab and this function works well. It runs a little bit slow when the data size gets larger and when there is no initialization for the parameters, but I think that is because of the iterative nature of the algorithm, not the implementation. Thanks again.

Michael BoedigheimerAs the author suggests, this routine estimates the parameters for a mixture of multivariate normal distributions (ala Aaron D'Souza's article from USC, "Using EM To Estimate A Probablity[sic] Density With A Mixture of Gaussians"). Runs quickly and produces reasonable answers.

Wish that the functions were factored into separate files so I could use the nice ellipse plotting and other functions as stand-alones. But that's nitpicking.

Mba Emmanuel IkechukwuIt sounds very interesting and vry resourseful