version 1.16.0.1 (4.71 KB) by
Mo Chen

EM algorithm for Gaussian mixture.

This package fits Gaussian mixture model (GMM) by expectation maximization (EM) algorithm.It works on data set of arbitrary dimensions.

Several techniques are applied to improve numerical stability, such as computing probability in logarithm domain to avoid float number underflow which often occurs when computing probability of high dimensional data.

The code is also carefully tuned to be efficient by utilizing vertorization and matrix factorization.

This algorithm is widely used. The detail can be found in the great textbook "Pattern Recognition and Machine Learning" or the wiki page

http://en.wikipedia.org/wiki/Expectation-maximization_algorithm

This function is robust and efficient yet the code structure is organized so that it is easy to read. Please try following code for a demo:

close all; clear;

d = 2;

k = 3;

n = 500;

[X,label] = mixGaussRnd(d,k,n);

plotClass(X,label);

m = floor(n/2);

X1 = X(:,1:m);

X2 = X(:,(m+1):end);

% train

[z1,model,llh] = mixGaussEm(X1,k);

figure;

plot(llh);

figure;

plotClass(X1,z1);

% predict

z2 = mixGaussPred(X2,model);

figure;

plotClass(X2,z2);

Besides using EM to fit GMM, I highly recommend you to try another submission of mine: Variational Bayesian Inference for Gaussian Mixture Model

(http://www.mathworks.com/matlabcentral/fileexchange/35362-variational-bayesian-inference-for-gaussian-mixture-model) which performs Bayesian inference on GMM. It has the advantage that the number of mixture components can be automatically identified by the algorithm.

Upon request, I also provide a prediction function for out-of-sample inference.

This function is now a part of the PRML toolbox (http://www.mathworks.com/matlabcentral/fileexchange/55826-pattern-recognition-and-machine-learning-toolbox)

For anyone who wonders how to finish his homework, DONT send email to me.

Mo Chen (2021). EM Algorithm for Gaussian Mixture Model (EM GMM) (https://www.mathworks.com/matlabcentral/fileexchange/26184-em-algorithm-for-gaussian-mixture-model-em-gmm), MATLAB Central File Exchange. Retrieved .

Created with
R2009b

Compatible with any release

**Inspired by:**
Variational Bayesian Inference for Gaussian Mixture Model, Pattern Recognition and Machine Learning Toolbox

**Inspired:**
GMMVb_SB(X), Gaussian mixture model parameter estimation with prior hyper parameters, Dirichlet Process Gaussian Mixture Model, Variational Bayesian Inference for Gaussian Mixture Model, EM algorithm for Gaussian mixture model with background noise

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

Jan HeineThere is a mistake for calculating the prediction in this code: the inputs of your function "mixGaussPred" are swapped. First comes the model, then X2.

SamRecep Doga SiyliJ. Alex Leeoops my mistake...i don't know how to delete the comment, sorry...

tying xiaoQingYang DaiThe code get an unsatisfied result. Maybe the dataset should be responsible.

liangku guoWELL

Wenwen ZhouChaocrixusmay I know how is the mu generated?

mu(:,i) = gaussRnd(mu0,beta0*Sigma(:,:,i))

is it randomly generated to create some sample data?

polo sepianYI-CHEN CHANGhi, I have a 240*320 matrix, I want to know how to input it?

Md Mehedi Hassanhi! i have 20x2 matrix data. how do i utilize this code and stop generating random data?

James TooDhruv ShahBahmanchengjuan gongYi WangThanks for your good work. It is very useful for me.

xiaoli liuDoes anyone know that why the loglikelihood is llh and the function of T?

T = logsumexp(R,2);

llh = sum(T)/n; % loglikelihood

R = exp(bsxfun(@minus,R,T));

Thank you so much~

xiaoli liuSorry! My interface showed 5 stars yesterday. My network was wrong yesterday.

xiaoli liuMatt Jxiaoli liu, why are you giving such a poor, 1-star rating to code that you are thankful for???

xiaoli liuTo Mo Chen

Thanks for your great work. While there are some codes I don't understand.

T = logsumexp(R,2);

llh = sum(T)/n; % loglikelihood

R = exp(bsxfun(@minus,R,T));

Especially the loglikelihood and the function of T.

Thank you so much !

zhuang yuGreat work Mo chen. It saves me lots of time

Wei-Jie ChenGood work thanks！

I have a question, what is the following function? Is there some reference? I could not understand why to do this.

function y = loggausspdf(X, mu, Sigma)

d = size(X,1);

X = bsxfun(@minus,X,mu);

[U,p]= chol(Sigma);

if p ~= 0

error('ERROR: Sigma is not PD.');

end

Q = U'\X;

q = dot(Q,Q,1); % quadratic term (M distance)

c = d*log(2*pi)+2*sum(log(diag(U))); % normalization constant

y = -(c+q)/2;

ERIC LIGurmeet SinghTo Mo Chen

How can we use this code for training GMM from image patches?

Faris Nafiahrania bensaiedMo Chen@ Bernardo Noronha,

the latest update of this function use a new syntax of Matlab R2016b. That's why you get a error due to your Matlab is old. And yeah, replace the logsumexp in this release with a older one should work.

Bernardo NoronhaOk I saw what the error was, the toolbox probably has an outdated version. I replaced it in my toolbox for the one available here. Other than that, looks good :)

Bernardo NoronhaError:

Error using -

Matrix dimensions must agree.

Error in logsumexp (line 14)

s = a+log(sum(exp(X-a),dim)); % TODO: use log1p

Error in mixGaussEm>expectation (line 54)

T = logsumexp(R,2);

Error in mixGaussEm (line 21)

[R, llh(iter)] = expectation(X,model);

Bernardo NoronhaCould you help me out with the error below? Sorry, accidentally submitted it without explaining more.

I downloaded the toolbox and included it in my MATLAB work folder.

Thanks a lot!

Jiyu TianPanpandadCan you help me with the EM for BMM （beta mixture model）? It's my homework,and this weekend is deadline.

Many Thanks!

Manjushree AithalI am facing a huge problem to find gamrnd function. Can you guide me with the same?

Thanks!

lsvihJenniferI am struggling to find the parameter phi in your code, so the variable for prior probability. How is it defined?

Otherwise great work!

Azam Afzaal TahirHello,

Can we apply this algorithm on set of images ?

yy zhuSuwoong HeoSTUTI DEVARSHI[~,x] = histc(r,[0;p/p(end)]);

this statement is showing error how should i get it balanced ....reply me pls

STUTI DEVARSHIi'm getting this error Error: File: G:\dissertation mtech\EmGm\EmGm\mixGaussRnd.m Line: 62 Column: 3

Expression or statement is incorrect--possibly unbalanced (, {, or [. can anybody suggest something

shao shaoAnyone knows how to modify this code to consider weights to the samples. I my case, I have biased samples, so I would like to give weights to each sample in order to fit unbiased model.

Nitisha RajgureHi ... I find this file is very usefull to clear the concepts. Thanks.... Nitisha

Muhammad Javvad ur RehmanPallawi Pallawihttp://www.mathworks.com/matlabcentral/fileexchange/53349-joint-learning-of-multiple-regressors-for-single-image-super-resolution/content/SISR/train/step_2_MoE_model_training/GMM/emgm.m

Pallawi Pallawi@@ najah G can u tell me how to use this code please.

Pallawi PallawiHie sir,

I have an image of 466616*16*1 dimension where I have decided to take my cluster size as 16 .I want to use this code can someone help me on this please.

How to run this code?

What Inputs do I need to give?

CodyDoes anyone know how I could incorporate sample weights into the EM algorithm so that samples with higher weights more heavily influence the EM?

Ankit Singhnajah GCan you please provide an example of initializing using a structure that has mu and sigma. The code looks for this:

if isstruct(init) % initialize with a model

R = expectation(X,init);

I wish to know how to initialize values in this init structure. Thanks so much for this implementation. It really is a blessing

najah Gexcellent

DiogoAnyone knows how to modify this code to consider weights to the samples. I my case, I have biased samples, so I would like to give weights to each sample in order to fit unbiased model.

Matt ChengThx Chen, I add some code for 1-D data by following your code:

Change the code below otherwise in switch structure as

follows:

for i = 1:c

idc = label==i;

plot(X(1,label==i),['.' color(i)],'MarkerSize',15,'LineWidth',1.2);

hold on

end

Hope it is helpful!

Bashar HaddadEasy to be used and efficient Implementation Thx

Arthur CurryIf I had a feature vector with 6 features,the data matrix would be like 6x100 for a 10x10 block image. When trying to use spread,it can't take such dimensional data. Any way to visualize the clusters ?

CKanellasDoes this script work for 1-D cases?

SachinVery easy to implement. Once you get the vector set in d x n, and specify init, algorithm runs flawlessly. Thank you! I expected the output to be better than k-means but for some reason it is not and it may very well be my data set or I may not be using the emgm as it should.

shawinits very nice code

Putu Ananta Wijayamay you give flowchart of this program?

because when i read the code, this is not the same as any algorithm that I read.

TQ

ZhixinRomainAntonmitilmaIs there an exact R version of this implementation? I have found many R implementations of EM for GMM but none of them are as fast as this one.

mitilmaFiziCan you please provide an example of initializing using a structure that has mu and sigma. The code looks for this:

if isstruct(init) % initialize with a model

R = expectation(X,init);

I wish to know how to initialize values in this init structure. Thanks so much for this implementation. It really is a blessing

nurullah atesthanks.but I dont understand some code.

can you answer this What is bellow codes' mathematical mean ?

y = loggausspdf(X, mu, Sigma)

[U,p]= chol(Sigma);

Q = U'\X;

q = dot(Q,Q,1);

c = d*log(2*pi)+2*sum(log(diag(U)));

RahulJudythanks a lot. after finding many materials , finally i find it.

Quan WangIt helps me a lot!

It would be better if you include a "compute_pdf_from_GMM" file, which I have to write myself.

mutahEM for Gaussian mixture: running ...

??? Input argument "X" is undefined.

Error in ==> emalgorithm at 8

R =initialization(X,init);

zjutHi，chen，can I define the number of cluster by myself?

vxxxhi chen

how to see the plot of pdf for this function

Mo ChenHi, cjain, you have function call mu in path. It is your problem to solve, not mine.

cjainhi, i m finding following error:1.Error: File: emgm.m Line: 77 Column: 33

"mu" previously appeared to be used as a function or command,

conflicting with its use here as the name of a variable.

A possible cause of this error is that you forgot to initialize the

variable, or you have initialized it implicitly using load or eval.

2.Input argument "X" is undefined.

Error in ==> emgm_1 at 8

R = initialization(X,init);

plz plz resolve it

cjainPChoppalaHi, thanks for the code; well written.

Can you help me out with a simple query? When we specify the number of Gaussians to (say 2), can we find the weight of each Gaussian component, (i.e weight of all samples that have label=1 and weight of all samples that have label=2)?

Chandra ShekharFantastic code.In fact i am getting following error when i execute in MATLAB 2009a.

??? Error: File: emgm.m Line: 9 Column: 3

Expression or statement is incorrect--possibly unbalanced (, {, or

[.

Please tell me any one how to correct it.

siyamhi is there anyway to set the covariance matrix to diagonal in this code?

chenHi,

I wanna ask what does this eye(d)*1e-6.

You said this is for numerical stability.

Could you explain a little bit?

cyklucifer

hgyfghbubbasYang LiuTom HallFantastic. Does a much better job at fitting than the built-in Signal Processing gaussian mixtures function, which commonly fits an obviously bimodal dist with a unimodal function.

Bayarbadrakh BaramsaiNikolay S.Those missing the Statistics Toolbox and getting an error:

"??? Undefined function or method 'randsample' for input arguments of type 'double'." can use the following code as a substitute for randsample function.

function y = randsample(n, k)

y=round(1+ (n-1)*rand(k, 1) );

Mo (Michael), thanks for the submission, but a few comments I have:

1) You should have mentioned that Statistics Toolbox is needed.

2) when applied following command: label = emgm(x, 10);

where size(x)= 2 84480 , it did not converge in 500 interations, (which took about 2 minutes), as opposed to k_means by Yi Cao, which worked juts fine...

label = emgm(x, 3); worked fine btw...

Alexzalayetomarouane ayechHi Michael,

I want to know wheither there is a theoretical proof for the technique you have used in logsumexp to avoid numerical underflow ?

Bogdan DzyubakSimple to use, fast, and doesn't crash.

CongExcellent Work! Thank you !

fatemehello,I want to apply emgm on adult dataset,which it's attributes are both categorical and numerical,I tried to apply clustering on data saved in dataset and in cell array,but this data types are not defined for emgm. can emgm be used for string array?pleas help me.tnx

keerthihello sir, we are using em algorithm for detecting resampling (tampering of images). for this we need to get the fourier transform of the probability map. how can we modify this code for the above purpose. kindly help.

my mail id : keerthi2412@gmail.com

shamlai need to apply gmm to iris dataset and obtain 3 clusters.i need to display the (index of datapoints)datapoints in each cluster.please help me.

shamlazheng zhouHi Michael,

I have a 65*100matrix,can I use this code to get the two-dimension GMM,in which the mu sigma and weight are two dimension.(z=f(x,y),f is the function for GMM)

Mo ChenFor all the questions about how to use is for image segmentation:

You have to organize the image into a matrix where each column is the feature of a pixel(say rgb)

Adili neilaHi Michael,

how can I use your code for images?

thanks

Prasathwe doing project on statistical pixel intensity segmentation of clsm images..

we need coding for gaussian mixture, normal distribution, poisson...

plss mail coding to tis mail id vinodhinybtech@gmail.com

Mo ChenHi Andreas, that function is in statistics toolbox. It random sample k integers between 1 to n.

AndreasHi Michael,

A small question: the randsample function called at line 44 seemingly does not exist, as I get an error. I am running R2011b. Are other, non-native, files required to run emgm? Thanks.

Mo ChenNicolas:

I dont see problem

Mo ChenFor any one having question about changing result:

Please read wiki page. EM is only finding local minimal, which means the result depend on initialization.

AISSA BOULMERKAThank You for this Excellent Work,

is there any paper that may help to understand the program?

thanks.

MarkFor people getting different results on each run, this is due to the use of psuedorandom number generators in initialization. Try setting the psuedorandom number seed:

http://www.mathworks.com/help/techdoc/ref/rng.html

NicolasHi, I try with 1 D array, and I have this problem

>> label = emgm(a,1);

EM for Gaussian mixture: running ...

Converged in 2 steps.

>> spread(a,label);

??? Error using ==> spread at 33

ERROR: only support data of 2D or 3D.

How I can solve it

thanks

MichaelVery easy to use and fast, but like some of the above posters, I am getting different results every time I run it on the same data.

AmishFixed seed for random generator and got the same plot. This is a very useful utility. Many Thanks.

AmishSame question as Ting:

"converging steps are changing for the same data"

Must have to do with the latest Matlab release. I am using R2011b.

Michael, can you confirm?

AmishHONGJINGTingDear Chen,

When I using EM to analysis my data, the result is always changing, and converge step is changing too, is there any way to make it stable?

Thank you

dattatrayI have a small Question,

suppose i have modeled a data.

it has 100 vectors each having 36 features.

so my input is 36 x 100

now i have given 5 clusters,model is trained now.

now i have set of 5 mu(36 x 1) and 5 sigma(36 x 36 x 5).

lets us say i have a unknown vector x of size(36 x 1)

now how to find out,in which cluster this particular vector fits..?

for each cluster we have only mu,and sigma,and for a single vector matlab gives sigle value of mean,and cov()

can you help me in this..?

if it would have been k-means,its easy to calculate euclidean distance with cluster centroid,which ever is minimum,that is answer..

minni sharmaTo chen

can u send me a code for image fusion using EM algorithm please.

thanking you

Mo ChenTo Venkat R,

This code uses general form ofthe multivariable Gaussian distribution, not the one in your comment, which is simply the 1d special case.

You cannot arbitrayly add a parameter there. You have to ensure the density function is actually a valid density function (means it has to integrate to 1). Otherwise, EM does not work.

Mo ChenTo Brian,

"Furthermore since you draw the centers from the points themselves, there will always be at least 1 point in each cluster, making even the intended code pointless."

If the initialized k centers are very close, after one iteration of the EM, there will be only one cluster there.

This piece of code simply prevent this from happening. It ensure that there is no more than two initialized centers belong to one cluster.

Venkat RDear Chen,

Very good and fast implementation.

I guess the normal distribution you are using is exp( -(x-mu)^2/2*sigma^2 )/sqrt(2*pi*sigma^2)

In that case, if I were to slightly modify the sigma by w*sigma(or mu by w*mu), where 'w' is another design parameter, Can you help me which functions I need to change to utilize your code.

Thanking you very much.

Anathea PepperlBrianFound this pointless piece of code in the initialization:

while k ~= unique(label)

idx = randsample(n,k);

m = X(:,idx);

[~,label] = max(bsxfun(@minus,m'*X,sum(m.^2,1)'/2),[],1);

end

Unless I am missing something, I'm assuming you were trying to make sure at least one point is assigned to each cluster? Well, this just checks if at least 1 point is assigned to the kth cluster. E.g. try:

k=5;

if k ~= unique([5;5;5;5])

disp('Bad!');

else

disp('OK');

end

it will say that label assignment is OK.

Furthermore since you draw the centers from the points themselves, there will always be at least 1 point in each cluster, making even the intended code pointless.

You may want to use another strategy to ensure centers are chosen that take more than a single point for instance.

Another common initialization strategy is to partition the points randomly into k clusters.

Nehak = 1;

X = imread('image.png');

label = emgm(X,3);

spread(X,label);

Please tell me how to fix the errors listed below:

EM for Gaussian mixture: running ...

??? Error using ==> mtimes

Integer data types are not fully supported for this operation.

At least one operand must be a scalar.

Error in ==> emgm>initialization at 46

[dum,label] = max(bsxfun(@minus,m'*X,sum(m.^2,1)'/2),[],1);

Error in ==> emgm at 8

R = initialization(X,init);

Error in ==> Untitled at 2

label = emgm(X,3);

Enrique MartíMo ChenSilvina,

Hi, I will upload a new version. Please try it and tell me if it still happens.

SilvinaDear Michael,

I'm trying to use your code on images (using reshape to give them a vector structure) and I'm getting the following error message:

Error using ==> loggausspdf at 10

ERROR: sigma is not SPD.

Interestingly, after calling the command many times, the function eventually works.

Any feedback on this issue will be greatly appreciated!

Michael DavisA couple of minor bugs:

1. I came across the same problem as Nofil Barlas above when the size of the input vector is [ N 1 ]. Reshape to [ 1 N ] and it works.

2. If you tell it to find only 1 mixture, it keeps going until it runs out of memory. The code should either disallow an init parameter of 1, or else have a short function to handle this trivial case.

Otherwise, great, very useful! Thanks.

Daniel ZoranNevineMichael,

The email address in the file bounced. Please send me your address so that I can email you the data file.

Thanks.

NevineMichael,

I am getting the error:

??? Error using ==> loggausspdf at 10

ERROR: sigma is not SPD.

I am using matlab Release R2010a.

The input data X is 24x57600 with 2 clusters.

labels = emgm(X, 2);

I will send you the data via email.

Thanks.

Giang Lei see now, I have tried with 2009a and earlier version and it gave me error when i increased number of clusters. Work fine with 2009b although it is not converge.

I am very thankful for your reply.

Mo ChenNot happened here. which version of matlab are your using?

Giang LeHi Michael,

Thanks for a quick reply. Here is the problem, I am try to clustering 11208 samples to 128 with dim is 14.

> x = rand(14,11208);

>[est_label,model] = emgm(x,128);

EM for Gaussian mixture: running ... ??? Error using ==> loggausspdf at 7

ERROR: sigma is not SPD.

Error in ==> emgm>expectation at 65

R(:,i) = loggausspdf(X,mu(:,i),Sigma(:,:,i));

Error in ==> emgm at 16

[R, llh(t)] = expectation(X,model);

I have tried to increase the sigma0 but the problem is still there.

Mo ChenGiang Le,

How does it happen? The function can hardly produce a non positive definite sigma.

However, if it does, you may try to change the sigma0 in line 76 to be a larger value.

Giang LeCan you please let me know how to fix the ERROR: sigma is not SPD?

dattatrayThank you Very much sir..!!

Mo ChenHi, Nofil Barlas,

Maybe you forget clear your memory before load the data.

Mo ChenHi dattatray,

Take a look at the comment in the code of

http://www.mathworks.com/matlabcentral/fileexchange/24616-kmeans-clustering

You may get the idea.

Mo ChenHi, Patrick

Sigma(:,:,1) is the covariance matrix of the first gaussian mixture component.

dattatrayCan You please tell me..about initialization which you have made.

generating random values is fine..

but i havent understood

use of maxVal=bsxfun(@minus,m'*X, sum(m.^2,1)'/2 )

[dum,label] = max(bsxfun(@minus,m'*X,sum(m.^2,1)'/2));

can you please tell me that...?

code is really wonderful,

but if i could get any theory material regarding functions you have written,especially for expectation,maximization,and log gaussian pdf..

please mail me on d.dattatray@gmail.com

thank you in advance...

Nofil BarlasQuick question. I ran the code and what the error:

>> load data;

label = emgm(x,3);

scatterd(x,label);

EM for Gaussian mixture: running ... ??? Error using ==> randsample at 117

K must be less than or equal to N for sampling without replacement.

Error in ==> emgm>initialization at 36

idx = randsample(n,k);

Error in ==> emgm at 9

R = initialization(X,init);

Thanks.

PatrickApologize for the following simple question. What exactly does the sigma data mean from the example given? The first Sigma (1,1) is the sigma for the first estimated cluster and the second sigma (2,2) is for the first estimated cluster on the second row ??

Could you please clarify? Thanks.

>> model.Sigma

ans(:,:,1) =

0.7227 0.8439

0.8439 1.8252

ans(:,:,2) =

0.2629 -0.1116

-0.1116 0.2411

ans(:,:,3) =

0.4209 -0.0600

-0.0600 0.0967

Tianfan XUEMo ChenCan you send me your data via email?

Daniil KocharovSorry for asking such a silly question:

I got an error trying to use 1d data.

Error using ==> loggausspdf at 7 ERROR: sigma is not SPD.

Error in ==> emgmm>expectation at 68

R(i,:) = loggausspdf(X,mu(:,i),Sigma(:,:,i));

I think that's because Sigma is:

Sigma(:,:,1) = NaN

Sigma(:,:,2) = NaN

Sigma(:,:,3) = 7.0826e-005

But why is it NaN I cannot understand, or is there anything else wrong?

Thanks.

Mo ChenBefore you give any bad rating, you should really notice that this function require MATLAB 7.9 (2009b).

It use a new feature of matlab.

upgrade your matlab, or you can modify all

[~,x]=fun();

to

[dum,x]=fun();

GayathriProduces the following error with the above steps.

label = emgmm(x,3);

??? Error: File: emgmm.m Line: 21 Column: 7

Expression or statement is incorrect--possibly unbalanced (, {, or [.