Search Comments and Ratings

Date | File | Comment by | Comment | Rating
26 Oct 2014 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Subhrodip

14 Apr 2014 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Dongsheng

13 Apr 2014 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Dongsheng

Good code!!! It works very well.

However, I found a bug. In MATLAB R2013:

for k = 5:15
    [W,M,V,L] = EM_GM(Data',k,0.001,1000,1,[]);
end

For certain values of k, it fails with the error:
Attempted to access maxy(1); index out of bounds because numel(maxy)=0.

I noticed that some values in M were set to NaN.

However, for the same data in MATLAB 7.1 it works perfectly fine.
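
A possible workaround, sketched here under the assumption that the NaNs appear in the returned estimates before plotting (this is not the author's fix; pflag = 0 turns the plot off):

for k = 5:15
    [W,M,V,L] = EM_GM(Data',k,0.001,1000,0,[]);   % pflag = 0: skip plotting
    if any(isnan(M(:))) || any(isnan(W(:)))
        warning('k = %d produced NaN estimates; skipping.', k);
        continue
    end
    % ... use W, M, V, L for this k here ...
end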

28 Mar 2014 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Maxaon

Thanks

24 Jan 2013 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Moran Artzi

Hi,
Thank you so much for this tool
Is it possible to save the cluster memberships?

Thanks
Moran
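
A minimal sketch of how memberships could be recovered from the fitted parameters (this is not part of the package; it assumes the Statistics Toolbox's mvnpdf, X of size n-by-d, and the EM_GM outputs W, M, V):

[n, d] = size(X);
k = numel(W);
R = zeros(n, k);                          % unnormalized posterior responsibilities
for j = 1:k
    R(:,j) = W(j) * mvnpdf(X, M(:,j)', V(:,:,j));
end
[~, membership] = max(R, [], 2);          % membership(i) = most probable component for X(i,:)

The modified EM_GM listed further down this page also returns the responsibility matrix E directly, in which case [~, membership] = max(E, [], 2) gives the same result.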

17 Jun 2012 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui m

Hi,
thank you so much for your help, Yashodhan.

I did so, but with the transpose of X the same error occurred!

17 Jun 2012 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Yashodhan Athavale

Hello,

I think the error is popping up because you are processing X with dimensions 4601*57. I would suggest trying the transpose of X, which might solve the problem.

Let me know if it works.
Regards

16 Jun 2012 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui m

Hi, thanks for the code, but I have a problem: the result of running it on my dataset isn't good! The estimated W for the first class is greater than 0.99, while the real value is about 0.60!

So I want to specify the Init input for your code. I have 2 classes with 57 features, and this is my entry for Init: EM_GM(X,2,[],[],[],[[0.6 0.4],Me,Co]), where X is 4601*57, Me is 57*2, and Co is 57*57*2.
But this error is shown in the MATLAB command window, without any output: "??? Error using ==> horzcat
CAT arguments dimensions are not consistent."

Please help me as soon as possible.
Thanks
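
Per the function's own help text, Init must be a structure with fields W(1,k), M(d,k) and V(d,d,k), not a bracketed concatenation, which is what triggers the horzcat error. A sketch for the sizes given above (Me is 57*2, Co is 57*57*2):

Init.W = [0.6 0.4];    % 1-by-2 initial weights
Init.M = Me;           % 57-by-2 initial means
Init.V = Co;           % 57-by-57-by-2 initial covariances
[W,M,V,L] = EM_GM(X, 2, [], [], [], Init);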

11 Jun 2012 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Da xing

Thank you!

04 Dec 2011 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Alexander Farley

I've written a mixture of Gaussians background subtraction algorithm using this script and it seems to work properly.

24 Nov 2011 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Dongsheng

Thanks for your code, but I have double-checked and your likelihood calculation is wrong!

15 Nov 2011 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Sam Da

You did a good job writing it but it has bugs in it. It complains a lot about positive definiteness of matrices on data which produce good results with other similar code.

12 Jun 2011 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Petr Zapletal

What is the time complexity of your implementation?

06 Feb 2011 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui CT

Hi Patrick,

I'm using your EM_GM_fast code. Could you please tell me whether there is an option that permits estimating Gaussian mixtures with diagonal covariance matrices? Thank you.

Cong-Thanh Do
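
The listing on this page does not show such an option, but a common workaround, sketched here assuming access to the point in the M-step where V is updated, is to zero the off-diagonal entries after each update (this is the maximum-likelihood M-step for a diagonal-covariance mixture):

for j = 1:k
    V(:,:,j) = diag(diag(V(:,:,j)));   % keep only the per-dimension variances
end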

21 Sep 2010 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Patrick

When I run it I get this error. Is there something else that I need to do before running it? I am running MATLAB 7.8.0 (R2009a).

>> X = zeros(600,2);
>> X(1:200,:) = normrnd(0,1,200,2);
>> X(201:400,:) = normrnd(0,2,200,2);
>> X(401:600,:) = normrnd(0,3,200,2);
>> [W,M,V,L] = EM_GM(X,3,[],[],1,[])
??? Cell contents reference from a non-cell array object.

Error in ==> getdim at 13
d = getdim(carray{i});

Error in ==> rand at 12
d = getdim(c);

Error in ==> randsample at 95
x(ceil(n * rand(1,k-sumx))) = 1; % sample w/replacement

Error in ==> kmeans at 321
Xsubset = X(randsample(n,floor(.1*n)),:);

Error in ==> EM_GM>Init_EM at 261
[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 74
[W,M,V] = Init_EM(X,k); L = 0;
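
The trace suggests a user-defined rand (and possibly kmeans) is shadowing the builtin: MATLAB's own rand never calls a getdim function. A sketch of how to diagnose this:

which rand -all      % lists every rand on the path; the first entry is the one that runs
which kmeans -all    % same check for kmeans
% If a third-party file shadows the builtin, removing its folder from the
% path should fix the error, e.g. (hypothetical path):
% rmpath('C:\toolboxes\offending_toolbox')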

25 Jan 2010 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Yashodhan Athavale

I tried modifying the code a little by including the following lines of code:

function [W,M,V] = Init_EM(X,k)
[n,d] = size(X);
[Ci,C] = kmeans(X,k,'Start','cluster', ...
                'Maxiter',100, ...
                'EmptyAction','drop', ...
                'Display','off'); % Ci(nx1) - cluster indices; C(k,d) - cluster centroids (i.e. means)
while sum(isnan(C))>0,
    [Ci,C] = kmeans(X,k,'Start','cluster', ...
                    'Maxiter',100, ...
                    'EmptyAction','drop', ...
                    'Display','off');
end
M = C';

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Use this routine if the cluster matrix is of variable length and contains
% NaN values: scan the cluster index matrix Ci for NaN values and remove them
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
temp1 = Ci(:,1);                % Scan the entire matrix column
temp2 = isnan(temp1);           % Find the rows with NaN values
index_active = find(temp2~=1);  % Find the rows with actual values
Ci_new = temp1(index_active);   %#ok<FNDSB>
[r,c] = size(Ci_new);           % Determine the size of the newly scanned matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Vp = repmat(struct('count',0,'X',zeros(r,c)),1,k);
for i=1:r, % Separate cluster points
    Vp(Ci_new(i)).count = Vp(Ci_new(i)).count + 1;
    Vp(Ci_new(i)).X(Vp(Ci_new(i)).count,:) = X(i,:);
end

V = zeros(d,d,k);
for i=1:k,
    W(i) = Vp(i).count/r;
    % Find the covariance, ignoring the NaNs
    V(:,:,i) = nancov(Vp(i).X(1:Vp(i).count,:));
end
%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Init_EM %%%%
%%%%%%%%%%%%%%%%%%%%%%%%

It worked, giving me the estimates. But I am still getting some warnings about "Empty rows of data".

25 Jan 2010 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Roy

Fantastic piece of work!!

I understand the EM method can be used for the treatment of missing data. Please suggest how to use this function for filling in a dataset with missing values. Thanks.
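
One standard approach, sketched here (it is not a feature of this package): fit the mixture on the complete rows, then impute each missing entry by the conditional Gaussian mean under the most responsible component. It assumes mvnpdf from the Statistics Toolbox and a user-chosen k:

complete = ~any(isnan(X), 2);                    % rows with no missing values
[W, M, V, L] = EM_GM(X(complete,:), k, [], [], 0, []);
Ximp = X;
for i = find(~complete)'
    o = ~isnan(X(i,:));                          % observed dimensions of row i
    m = ~o;                                      % missing dimensions of row i
    best = 1; bestp = -Inf;                      % most responsible component on the observed part
    for j = 1:numel(W)
        p = log(W(j)) + log(mvnpdf(X(i,o), M(o,j)', V(o,o,j)));
        if p > bestp, bestp = p; best = j; end
    end
    % conditional mean: mu_m + (x_o - mu_o) * inv(V_oo) * V_om
    Ximp(i,m) = M(m,best)' + (X(i,o) - M(o,best)') / V(o,o,best) * V(o,m,best);
end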

07 Jul 2009 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Yashodhan Athavale

I have a multidimensional file in Microsoft Excel. I imported the sheet into X, and executed [W,M,V,L] = EM_GM(X,3,[],[],1,[]);

I am getting the following errors,

[W,M,V,L] = EM_GM(X,1,[],[],1,[])
Warning: Ignoring rows of X with missing data.
> In kmeans at 127
In EM_GM>Init_EM at 261
In EM_GM at 74
??? Attempted to access C(1,2); index out of bounds because numel(C)=1.

Error in ==> kmeans>distfun at 722
D(:,i) = D(:,i) + (X(:,j) - C(i,j)).^2;

Error in ==> kmeans at 329
D = distfun(X, C, distance, 0);

Error in ==> EM_GM>Init_EM at 261
[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 74
[W,M,V] = Init_EM(X,k); L = 0;

Please help!!! Thanks in advance :)
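
A sketch of a workaround: the warning shows kmeans dropping rows of X with missing values, after which the centroid matrix C degenerates, so remove incomplete rows before calling EM_GM.

X = X(~any(isnan(X), 2), :);            % keep only complete observations
[W,M,V,L] = EM_GM(X, 3, [], [], 1, []);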

06 Mar 2009 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui gd j,

It's very helpful, thanks.

05 Jan 2009 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Adel Fakih

It might also be any other implementation, e.g. Netlab's. Type edit kmeans to find the culprit.

29 Dec 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui Shawn_lj Qingsheng

When I run [W,M,V,L] = EM_GM(X,3,[],[],1,[]); at the MATLAB command line as in the example, the following error occurred:

??? Error using ==> kmeans.kmeans
Too many input arguments.

Error in ==> EM_GM>Init_EM at 305
[Ci,C] = kmeans(X,k,'Start','cluster', ...

Error in ==> EM_GM at 63
[W,M,V] = Init_EM(X,k); L = 0;

Please help, thanks.

16 Oct 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

this is good.

31 Aug 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

We can speed up this piece of code by preallocating the memory for W in Init_EM:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_Init %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Init_EM(X,k)
[n,d] = size(X);
[Ci,C] = kmeans(X,k,'Start','cluster', ...
                'Maxiter',100, ...
                'EmptyAction','drop', ...
                'Display','off'); % Ci(nx1) - cluster indices; C(k,d) - cluster centroids (i.e. means)
while sum(isnan(C))>0,
    [Ci,C] = kmeans(X,k,'Start','cluster', ...
                    'Maxiter',100, ...
                    'EmptyAction','drop', ...
                    'Display','off');
end
M = C';
Vp = repmat(struct('count',0,'X',zeros(n,d)),1,k);
for i=1:n, % Separate cluster points
    Vp(Ci(i)).count = Vp(Ci(i)).count + 1;
    Vp(Ci(i)).X(Vp(Ci(i)).count,:) = X(i,:);
end
V = zeros(d,d,k);

% Preallocating W
W = zeros(1,k);

for i=1:k,
    W(i) = Vp(i).count/n;
    V(:,:,i) = cov(Vp(i).X(1:Vp(i).count,:));
end
%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Init_EM %%%%
%%%%%%%%%%%%%%%%%%%%%%%%

29 May 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

This code computes the E-step and the M-step correctly. It is also readable and neat for the user. Nevertheless, there are a problem and an improvement I would like to report. The problem is that the likelihood calculation is incorrect, although the outputs of both the E and M steps are correct. I have checked the reference to which the code is attributed: in that reference, the probability density function (pdf) is a single Gaussian rather than a mixture, and this confusion is what leads to the miscalculation. The actual likelihood function should be computed as (in LaTeX format):

L = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{k} W(j)\,\mathcal{N}(X(i,:)^{T}; M(:,j), V(:,:,j)) \right)

The improvement is that there are some loops which can be replaced by matrix multiplications or other matrix-related operations. Furthermore, the likelihood computation can easily be carried out inside the E-step.

In conclusion, the following code incorporates the modifications described above:

function [W,M,V,L,E] = EM_GM(X,k,ltol,maxiter,pflag,Init)
% Inputs:
% X(n,d) - input data, n=number of observations, d=dimension of variable
% k - maximum number of Gaussian components allowed
% ltol - percentage of the log likelihood difference between 2 iterations ([] for none)
% maxiter - maximum number of iteration allowed ([] for none)
% pflag - 1 for plotting GM for 1D or 2D cases only, 0 otherwise ([] for none)
% Init - structure of initial W, M, V: Init.W, Init.M, Init.V ([] for none)
%
% Outputs:
% W(1,k) - estimated weights of GM
% M(d,k) - estimated mean vectors of GM
% V(d,d,k) - estimated covariance matrices of GM
% L - log likelihood of estimates
%

%%%% Validate inputs %%%%
if nargin <= 1,
disp('EM_GM must have at least 2 inputs: X,k!')
return
elseif nargin == 2,
ltol = 0.1; maxiter = 1000; pflag = 0; Init = [];
err_X = Verify_X(X);
err_k = Verify_k(k);
if err_X | err_k, return; end
elseif nargin == 3,
maxiter = 1000; pflag = 0; Init = [];
err_X = Verify_X(X);
err_k = Verify_k(k);
[ltol,err_ltol] = Verify_ltol(ltol);
if err_X | err_k | err_ltol, return; end
elseif nargin == 4,
pflag = 0; Init = [];
err_X = Verify_X(X);
err_k = Verify_k(k);
[ltol,err_ltol] = Verify_ltol(ltol);
[maxiter,err_maxiter] = Verify_maxiter(maxiter);
if err_X | err_k | err_ltol | err_maxiter, return; end
elseif nargin == 5,
Init = [];
err_X = Verify_X(X);
err_k = Verify_k(k);
[ltol,err_ltol] = Verify_ltol(ltol);
[maxiter,err_maxiter] = Verify_maxiter(maxiter);
[pflag,err_pflag] = Verify_pflag(pflag);
if err_X | err_k | err_ltol | err_maxiter | err_pflag, return; end
elseif nargin == 6,
err_X = Verify_X(X);
err_k = Verify_k(k);
[ltol,err_ltol] = Verify_ltol(ltol);
[maxiter,err_maxiter] = Verify_maxiter(maxiter);
[pflag,err_pflag] = Verify_pflag(pflag);
[Init,err_Init]=Verify_Init(Init);
if err_X | err_k | err_ltol | err_maxiter | err_pflag | err_Init, return; end
else
disp('EM_GM must have 2 to 6 inputs!');
return
end

%%%% Initialize W, M, V,L %%%%
t = cputime;
if isempty(Init),
[W,M,V] = Init_EM(X,k); L = 0;
else
W = Init.W;
M = Init.M;
V = Init.V;
end
[E,Ln] = Expectation(X,k,W,M,V); % Initialize log likelihood and compute expectation
Lo = 2*Ln;

%%%% EM algorithm %%%%
niter = 0;
while (abs(100*(Ln-Lo)/Lo)>ltol) & (niter<=maxiter),
%**************************************************************************
% For the first loop, expectation is computed above (cf. Line 69). For the
% next loops, it is computed in their preceding loops. Therefore, the
% following line [Line 80] is omitted.
%**************************************************************************
% E = Expectation(X,k,W,M,V); % E-step

[W,M,V] = Maximization(X,k,E); % M-step
Lo = Ln;
[E,Ln] = Expectation(X,k,W,M,V); % Log-likelihood and expectation computation
niter = niter + 1;
LnValues(niter) = Ln;
disp(['Iteration # = ' num2str(niter) ', Ln = ' num2str(Ln) ', Stopping Criterion = ' num2str(abs(100*(Ln-Lo)/Lo)) ', Tolerance = ' num2str(ltol)])
end
L = Ln;
theTime = num2str(fix(clock));
theTime(theTime==' ') = '_';
save(['EM_CostFunction_' theTime '_MoreLoopsRemoved.mat'],'LnValues')

%%%% Plot 1D or 2D %%%%
if pflag==1,
[n,d] = size(X);
if d>2,
disp('Can only plot 1 or 2 dimensional applications!');
else
Plot_GM(X,k,W,M,V);
end
elapsed_time = sprintf('CPU time used for EM_GM: %5.2fs',cputime-t);
disp(elapsed_time);
disp(sprintf('Number of iterations: %d',niter-1));
end
%%%%%%%%%%%%%%%%%%%%%%
%%%% End of EM_GM %%%%
%%%%%%%%%%%%%%%%%%%%%%

function [E,L] = Expectation(X,k,W,M,V)
[n,d] = size(X);
a = (2*pi)^(0.5*d);
S = zeros(1,k);
iV = zeros(d,d,k);
for j=1:k,
if V(:,:,j)==zeros(d,d), V(:,:,j)=ones(d,d)*eps; end
S(j) = sqrt(det(V(:,:,j)));
iV(:,:,j) = inv(V(:,:,j));
end
%*************************************************************************
%*************************** MODIFICATION: LOOP REMOVAL *****************
%*************************************************************************
E = zeros(n,k);
% for i=1:n,
% for j=1:k,
% dXM = X(i,:)'-M(:,j);
% pl = exp(-0.5*dXM'*iV(:,:,j)*dXM)/(a*S(j));
% E(i,j) = W(j)*pl;
% end
% E(i,:) = E(i,:)/sum(E(i,:));
% end
chunkSize = 1000;
howManyFullChunks = fix(n/chunkSize);
numberOfRemainingVectors = n - howManyFullChunks*chunkSize;
pdfValueVector = zeros(n,1);
for j = 1:k
for chunkCounter = 1:howManyFullChunks
modificationRange = (chunkCounter-1)*chunkSize+1:chunkCounter*chunkSize;
meanSubtractedChunk = X(modificationRange,:)'-repmat(M(:,j),1,chunkSize);
pdfValueVectorToExp = diag(-0.5*meanSubtractedChunk'*iV(:,:,j)*meanSubtractedChunk);
pdfValueVector(modificationRange) = exp(pdfValueVectorToExp)/(a*S(j));
end
modificationRange = howManyFullChunks*chunkSize+1:n;
meanSubtractedChunk = X(modificationRange,:)'-repmat(M(:,j),1,length(modificationRange));
pdfValueVectorToExp = diag(-0.5*meanSubtractedChunk'*iV(:,:,j)*meanSubtractedChunk);
pdfValueVector(modificationRange) = exp(pdfValueVectorToExp)/(a*S(j));
E(:,j) = W(j)*pdfValueVector';
end
sumInColumnDirection = sum(E,2);
divisor = repmat(sumInColumnDirection,1,k);
L = sum(log(sum(E,2)));
E = E./divisor;
%**************************************************************************
%*************************** MODIFICATION ENDS HERE ***********************
%**************************************************************************

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Expectation %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Maximization(X,k,E)
[n,d] = size(X);
V = zeros(d,d,k);
%*************************************************************************
%*************************** MODIFICATION: LOOP REMOVAL *****************
%*************************************************************************
% W = zeros(1,k); M = zeros(d,k);
% for i=1:k, % Compute weights
% for j=1:n,
% W(i) = W(i) + E(j,i);
% M(:,i) = M(:,i) + E(j,i)*X(j,:)';
% end
% M(:,i) = M(:,i)/W(i);
% end
W = sum(E,1);
M = X'*E;
M = M./repmat(W,d,1);
%**************************************************************************
%*************************** MODIFICATION ENDS HERE ***********************
%**************************************************************************
for i=1:k,
for j=1:n,
dXM = X(j,:)'-M(:,i);
V(:,:,i) = V(:,:,i) + E(j,i)*dXM*dXM';
end
V(:,:,i) = V(:,:,i)/W(i);
end
W = W/n;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Maximization %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%**************************************************************************
%************************ LIKELIHOOD REMOVED COMPLETELY *******************
%**************************************************************************
% function L = Likelihood(X,k,W,M,V)
% Compute L based on K. V. Mardia, "Multivariate Analysis", Academic Press, 1979, PP. 96-97
% % to enhance computational speed
% [n,d] = size(X);
% U = mean(X)';
% S = cov(X);
% L = 0;
% for i=1:k,
% iV = inv(V(:,:,i));
% L = L + W(i)*(-0.5*n*log(det(2*pi*V(:,:,i))) ...
% -0.5*(n-1)*(trace(iV*S)+(U-M(:,i))'*iV*(U-M(:,i))));
% end
%**************************************************************************
%*************************** MODIFICATION ENDS HERE ***********************
%**************************************************************************
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Likelihood %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%

function err_X = Verify_X(X)
err_X = 1;
[n,d] = size(X);
if n<d,
disp('Input data must be n x d!');
return
end
err_X = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_X %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%

function err_k = Verify_k(k)
err_k = 1;
if ~isnumeric(k) | ~isreal(k) | k<1,
disp('k must be a real integer >= 1!');
return
end
err_k = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_k %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%

function [ltol,err_ltol] = Verify_ltol(ltol)
err_ltol = 1;
if isempty(ltol),
ltol = 1e-2;
elseif ~isreal(ltol) | ltol<=0,
disp('ltol must be a positive real number!');
return
end
err_ltol = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_ltol %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [maxiter,err_maxiter] = Verify_maxiter(maxiter)
err_maxiter = 1;
if isempty(maxiter),
maxiter = 1000;
elseif ~isreal(maxiter) | maxiter<=0,
disp('maxiter must be a positive real number!');
return
end
err_maxiter = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_maxiter %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [pflag,err_pflag] = Verify_pflag(pflag)
err_pflag = 1;
if isempty(pflag),
pflag = 0;
elseif pflag~=0 & pflag~=1,
disp('Plot flag must be either 0 or 1!');
return
end
err_pflag = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_pflag %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [Init,err_Init] = Verify_Init(Init)
err_Init = 1;
if isempty(Init),
% Do nothing;
elseif isstruct(Init),
[Wd,Wk] = size(Init.W);
[Md,Mk] = size(Init.M);
[Vd1,Vd2,Vk] = size(Init.V);
if Wk~=Mk | Wk~=Vk | Mk~=Vk,
disp('k in Init.W(1,k), Init.M(d,k) and Init.V(d,d,k) must be equal!')
return
end
if Md~=Vd1 | Md~=Vd2 | Vd1~=Vd2,
disp('d in Init.M(d,k) and Init.V(d,d,k) must be equal!')
return
end
else
disp('Init must be a structure: W(1,k), M(d,k), V(d,d,k) or []!');
return
end
err_Init = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Verify_Init %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [W,M,V] = Init_EM(X,k)
[n,d] = size(X);
[Ci,C] = kmeans(X,k,'Start','cluster', ...
'Maxiter',100, ...
'EmptyAction','drop', ...
'Display','off'); % Ci(nx1) - cluster indices; C(k,d) - cluster centroid (i.e. mean)
while sum(isnan(C))>0,
[Ci,C] = kmeans(X,k,'Start','cluster', ...
'Maxiter',100, ...
'EmptyAction','drop', ...
'Display','off');
end
M = C';
Vp = repmat(struct('count',0,'X',zeros(n,d)),1,k);
for i=1:n, % Separate cluster points
Vp(Ci(i)).count = Vp(Ci(i)).count + 1;
Vp(Ci(i)).X(Vp(Ci(i)).count,:) = X(i,:);
end
V = zeros(d,d,k);
for i=1:k,
W(i) = Vp(i).count/n;
V(:,:,i) = cov(Vp(i).X(1:Vp(i).count,:));
end
%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Init_EM %%%%
%%%%%%%%%%%%%%%%%%%%%%%%

function Plot_GM(X,k,W,M,V)
[n,d] = size(X);
if d>2,
disp('Can only plot 1 or 2 dimensional applications!');
return
end
S = zeros(d,k);
R1 = zeros(d,k);
R2 = zeros(d,k);
for i=1:k, % Determine plot range as 4 x standard deviations
S(:,i) = sqrt(diag(V(:,:,i)));
R1(:,i) = M(:,i)-4*S(:,i);
R2(:,i) = M(:,i)+4*S(:,i);
end
Rmin = min(min(R1));
Rmax = max(max(R2));
R = [Rmin:0.001*(Rmax-Rmin):Rmax];
% clf, hold on
figure; hold on
if d==1,
Q = zeros(size(R));
for i=1:k,
P = W(i)*normpdf(R,M(:,i),sqrt(V(:,:,i)));
Q = Q + P;
plot(R,P,'r-'); grid on,
end
plot(R,Q,'k-');
xlabel('X');
ylabel('Probability density');
else % d==2
% plot(X(:,1),X(:,2),'r.');
plot(X(1:200,1),X(1:200,2),'r.');
plot(X(201:400,1),X(201:400,2),'g.');
plot(X(401:600,1),X(401:600,2),'b.');
for i=1:k,
Plot_Std_Ellipse(M(:,i),V(:,:,i));
end
xlabel('1^{st} dimension');
ylabel('2^{nd} dimension');
axis([Rmin Rmax Rmin Rmax])
end
title('Gaussian Mixture estimated by EM');
%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Plot_GM %%%%
%%%%%%%%%%%%%%%%%%%%%%%%

function Plot_Std_Ellipse(M,V)
[Ev,D] = eig(V);
d = length(M);
if V(:,:)==zeros(d,d),
V(:,:) = ones(d,d)*eps;
end
iV = inv(V);
% Find the larger projection
P = [1,0;0,0]; % X-axis projection operator
P1 = P * 2*sqrt(D(1,1)) * Ev(:,1);
P2 = P * 2*sqrt(D(2,2)) * Ev(:,2);
if abs(P1(1)) >= abs(P2(1)),
Plen = P1(1);
else
Plen = P2(1);
end
count = 1;
step = 0.001*Plen;
Contour1 = zeros(2001,2);
Contour2 = zeros(2001,2);
for x = -Plen:step:Plen,
a = iV(2,2);
b = x * (iV(1,2)+iV(2,1));
c = (x^2) * iV(1,1) - 1;
Root1 = (-b + sqrt(b^2 - 4*a*c))/(2*a);
Root2 = (-b - sqrt(b^2 - 4*a*c))/(2*a);
if isreal(Root1),
Contour1(count,:) = [x,Root1] + M';
Contour2(count,:) = [x,Root2] + M';
count = count + 1;
end
end
Contour1 = Contour1(1:count-1,:);
Contour2 = [Contour1(1,:);Contour2(1:count-1,:);Contour1(count-1,:)];
plot(M(1),M(2),'k+');
plot(Contour1(:,1),Contour1(:,2),'k-');
plot(Contour2(:,1),Contour2(:,2),'k-');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End of Plot_Std_Ellipse %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

20 Apr 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Seems like the log likelihood sometimes decreases, which should not happen. Perhaps the formula for computing the log likelihood is incorrect?!
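
One way to see this, as a sketch (it assumes a fixed Init structure so that successive runs are deterministic, and user-supplied X and k): cap the iterations at increasing values and record the returned L; for a correctly computed likelihood the sequence should be nondecreasing.

Ls = zeros(1, 20);
for t = 1:20
    % very small ltol so the iteration cap, not the tolerance, stops EM
    [W, M, V, Ls(t)] = EM_GM(X, k, 1e-12, t, 0, Init);
end
plot(Ls), xlabel('iteration cap'), ylabel('returned log likelihood')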

29 Jan 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Thanks for this well written piece of code

23 Jan 2008 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Is there any way I could find where the Gaussians fit in the curve, and not have the mean on the x-axis?

07 Dec 2007 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Thank you!

06 Nov 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Nice code. It's very easy to use: no modifications are required, and the usage of the function as given makes the whole process of integrating it simpler. Thank you.

30 Oct 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

10 Oct 2007 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Noticed something that may be incorrect. When using multiple mixtures, you cannot compute the log likelihood of all of the data under each mixture and then just sum them. You need to do the sum before going into the log domain. Or use an approximation: http://nlp.cs.byu.edu/mediawiki/index.php/Log_Domain_Computations

10 Oct 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Here's an approximation:
http://nlp.cs.byu.edu/mediawiki/index.php/Log_Domain_Computations
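
A sketch of the log-domain computation the link describes: with logp(i,j) = log(W(j)) plus the log of the j-th Gaussian density at point i (logp is a hypothetical n-by-k intermediate), the log likelihood is obtained stably by factoring the row maximum out of the log-sum:

m = max(logp, [], 2);                                         % per-point max of the log terms
L = sum(m + log(sum(exp(logp - repmat(m, 1, size(logp,2))), 2)));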

10 Oct 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Noticed something that may be incorrect. When using multiple mixtures, you cannot compute the log likelihood of all of the data under each mixture and then sum them. You need to do the sum before going into the log domain.

07 Jun 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

I found a weakness: the code can't cope when the covariance matrix becomes singular.

06 Jun 2007 Plot_GM Generate ellipse and landscape plots of Gaussian mixture. Author: Patrick Tsui

I use your code in a Bayes classifier.
With my data, the log likelihood is NaN and the covariance matrix is singular (det(Cov) = 0),
so I can't proceed further.
Please show me how to deal with this problem.
Thank you
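
A common workaround, sketched here (it is not the author's fix): add a small ridge to each covariance after the M-step so it stays positive definite. lambda is a hypothetical knob to tune against the scale of the data; k, d and V are as inside the code.

lambda = 1e-6;                          % regularization strength
for j = 1:k
    V(:,:,j) = V(:,:,j) + lambda*eye(d);
end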

05 Jun 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Super great! can't be better!

17 May 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Thank you... well documented, easy to use and flexible interface

15 May 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Thanks to the author of this code! It was very helpful in understanding the EM process.

12 May 2007 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

20 Apr 2007 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

This implementation, although very fast and useful, unfortunately doesn't detect and prevent the collapse of a mixture component, which results in divisions by zero if the number of initial components selected is suboptimal.
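
A sketch of one possible guard (not part of this package): after each M-step, re-seed any component whose weight has collapsed below a floor before it causes a division by zero. wfloor is a hypothetical threshold; n, k, X, W, M, V are as inside the EM loop.

wfloor = 1e-4;
for j = find(W < wfloor)
    W(j) = 1/k;                         % reset the weight
    M(:,j) = X(ceil(n*rand), :)';       % re-seed the mean at a random data point
    V(:,:,j) = cov(X);                  % reset the covariance to the global one
end
W = W / sum(W);                         % renormalize the weights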

05 Apr 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

The algorithm converges very well to a good solution and is reasonably stable if applied multiple times to the same data. It is even faster than the greedy EM algorithm. My compliments

19 Mar 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Excellent script, well documented and easy to use.

15 Mar 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

06 Mar 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

really great

01 Mar 2007 Fast EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

15 Feb 2007 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

I tried the one from the classification toolbox mentioned in the Duda, Hart, and Stork text. It was impossible to figure out.

This one worked right off. Thanks.

08 Dec 2006 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

First, thanks for sharing the code with the community.

The likelihood function implemented is problematic. The shortcut used to compute the likelihood of a Gaussian mixture model is not correct, even though it may work well in many cases...

31 Aug 2006 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Mr. Tsui's implementation does indeed resolve several errors found in several implementations that I have reviewed.

08 Aug 2006 Plot_GM Generate ellipse and landscape plots of Gaussian mixture. Author: Patrick Tsui

Works great!
I wanted to display the ellipses superimposed on a 2D image, so I just modified the code a bit to pass a figure handle as an input parameter. Also, having the color of the ellipse as an optional input parameter would be useful.

08 May 2006 EM_GM An expectation maximization algorithm for learning a multi-dimensional Gaussian mixture. Author: Patrick Tsui

Outstanding. Works just as expected.
