PCA and ICA Package

version 2.1 (381 KB) by

Implements Principal Component Analysis (PCA) and Independent Component Analysis (ICA)

4.73171
57 Ratings

404 Downloads

This package contains functions that implement Principal Component Analysis (PCA) and Independent Component Analysis (ICA), along with multiple examples that demonstrate their use.
In PCA, multi-dimensional data is projected onto the singular vectors corresponding to a few of its largest singular values. Such an operation effectively decomposes the input signal into orthogonal components in the directions of largest variance in the data. As a result, PCA is often used in dimensionality reduction applications, where performing PCA yields a low-dimensional representation of data that can be reversed to closely reconstruct the original data.
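The PCA projection and its reversal can be sketched with MATLAB's built-in svd. This is a minimal illustration on synthetic data, not the package's own implementation; the variable names (Z, r, Zpca, Zr) and the synthetic basis B are assumptions chosen for the example.

```matlab
% Synthetic data: d = 5 signals, n = 200 samples, lying near a 2-D subspace
d = 5; n = 200; r = 2;
B = randn(d, r);                           % hypothetical basis of the subspace
Z = B * randn(r, n) + 0.01 * randn(d, n);  % low-rank data plus small noise

% Center the data (subtract the mean of each row)
mu = mean(Z, 2);
Zc = Z - repmat(mu, 1, n);

% Project onto the r left-singular vectors with the largest singular values
[U, S, ~] = svd(Zc, 'econ');
Ur = U(:, 1:r);           % d x r orthonormal basis
Zpca = Ur' * Zc;          % r x n low-dimensional representation

% Reverse the projection to approximately reconstruct the data
Zr = Ur * Zpca + repmat(mu, 1, n);
relErr = norm(Z - Zr, 'fro') / norm(Z, 'fro');
```

Because the synthetic data is close to rank 2, relErr should come out small; with real data the error depends on how quickly the singular values decay.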
In ICA, multi-dimensional data is decomposed into components that are maximally independent in an appropriate sense (kurtosis and negentropy, in this package). ICA differs from PCA in that the low-dimensional signals do not necessarily correspond to the directions of maximum variance; rather, the ICA components have maximal statistical independence. In practice, ICA can often uncover disjoint underlying trends in multi-dimensional data.

Comments and Ratings (141)

Brian Moore

@Binh: The variable Znoisy in the first half of the demo_ICA.m code is poorly named- there's really no "noise" at all in the demo; the estimated components still have the same "noise" in them.

The standard FastICA algorithm cannot handle noise on its own; modifications are needed. I haven't implemented these changes myself, but visit the webpage below to see how the ICA experts do it:

http://research.ics.aalto.fi/ica/noisyica/

Binh Nguyen

Dear,
In the code, you added noise before mixing independent sources.
I tried to add noise after the mixing, the code does not work well.
Do you have any suggestion for this problem?
Thank you.

afira fatima

Does that mean this formula (Zr = T \ W' * Zica + repmat(mu,1,n)) would be used to exactly reconstruct the signals? And is pinv(W) the function for finding the inverse?

Brian Moore

@afira: W is r x d. You can only exactly "unmix" the signals when r == d, in which case W is square (and invertible). When r < d, you can't exactly "unmix" the signals, but the best you can do is use pinv(W). Note that W is a special matrix---it has orthonormal rows---so pinv(W) = W'. That's why I wrote

Zr = T \ W' * Zica + repmat(mu,1,n)

instead of the equivalent formula

Zr = inv(T) * pinv(W) * Zica + repmat(mu,1,n)
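The claim that pinv(W) = W' for a matrix with orthonormal rows can be checked numerically. The sketch below builds a hypothetical W from a QR factorization rather than from the package's output, so it only illustrates the linear algebra.

```matlab
% Build a hypothetical r x d matrix with orthonormal rows (r < d)
r = 3; d = 7;
[Q, ~] = qr(randn(d, r), 0);   % Q is d x r with orthonormal columns
W = Q';                        % r x d with orthonormal rows

% For such a matrix the pseudoinverse equals the transpose
errPinv = norm(pinv(W) - W');

% W * W' is the identity, but W' * W is only a projection when r < d,
% which is why the reconstruction is approximate in that case
errRows = norm(W * W' - eye(r));
errCols = norm(W' * W - eye(d));
```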

afira fatima

@brian Do we have to take the inverse of W ourselves? If yes, then please guide me, because the W matrix is not a square matrix.

afira fatima

thankyou sir . @brian

Inas Yassine

Brian Moore

@afira: Sorry, I renamed myICA to fastICA in a recent update. Please read the documentation for fastICA to learn about the outputs. For example, the documentation says that

[Zica, W, T, mu] = fastICA(Z,r);
Zr = T \ W' * Zica + repmat(mu,1,n)

is the r-dimensional ICA approximation of your input data.

afira fatima

@brian And how can we reconstruct the original signal from the extracted independent components?

afira fatima

@brian One more question, please. In your previous comments you said that the mixing matrix is 'A'? And I cannot find the myICA help. There is no function named myICA; instead there is a fastICA function. Are both the same?

Brian Moore

@Afira: The mixing matrix is W, so the "unmixing" matrix would be W^(-1). The fastICA code returns

Zica = W * Zcw

where Zica are the ICA components, W is the mixing matrix, and Zcw is the centered and whitened version of the input data.

afira fatima

hello brian,
can you please tell me which variables you used for the mixing and unmixing matrices?

Brian Moore

@David: Sorry for the delay. Hopefully you saw that I answered essentially the same question a few questions below. I'll repeat the response here:

ICA has inherent scaling and rotation ambiguity, so the input data has to be whitened (uncorrelated, unit variance) to obtain a unique solution.

However, as per "help myICA", if you do:

[Zica, A, T] = myICA(Z,r);
W = T \ pinv(A);

then W is a matrix such that (up to a constant offset)

Z ≈ W * Zica

As such, you can interpret norm(W(:,k))^2 as the "power" of the kth independent component, because, when norm(W(:,k))^2 is large, Zica(k,:) has a large contribution to Z.

I did a quick search on the topic and found the paper below, which proposes a similar procedure:

http://doc.utwente.nl/64279/1/27_17_hendrikse.pdf

Hope this helps,
Brian

Brian Moore

@Jeffrey: Thanks for the tip! Fixed in 2.1

Brian Moore

@Binh: The permute() calls are there so I can perform all the necessary computations without any for loops.

In the code, there are three relevant dimensions: r, d, and n. Some arrays are d x n, some are r x n, and sometimes I need to compute an r x d array by computing the average of n matrices which are themselves outer products of r x 1 and d x 1 vectors.

For example, suppose X is r x n, Y is d x n, then the r x d matrix (say Z) I want can be computed as

Xp = permute(X, [1, 3, 2]); % r x 1 x n
Yp = permute(Y, [3, 1, 2]); % 1 x d x n
Z = mean(bsxfun(@times, Xp, Yp), 3); % r x d

Hope this helps!
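The loop-free average of outer products described in the comment above can be checked against an explicit loop and against a plain matrix product. The sizes and random inputs below are arbitrary stand-ins.

```matlab
r = 3; d = 4; n = 50;
X = randn(r, n);
Y = randn(d, n);

% Loop-free average of outer products, as in the comment above
Xp = permute(X, [1, 3, 2]);            % r x 1 x n
Yp = permute(Y, [3, 1, 2]);            % 1 x d x n
Z = mean(bsxfun(@times, Xp, Yp), 3);   % r x d

% Reference: explicit loop over the n outer products
Zloop = zeros(r, d);
for k = 1:n
    Zloop = Zloop + X(:, k) * Y(:, k)';
end
Zloop = Zloop / n;

% The same quantity as a single matrix product
Zmat = (X * Y') / n;
```

For this plain average the matrix product is the shortest form; the broadcast form remains handy when the per-sample terms need elementwise processing before averaging.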

Brian Moore

@mas saud: This sounds like a path problem. Make sure the directory where you put kICA.m is on your MATLAB path, or that you are in that directory when you call the function.

Binh Nguyen

Dear Mr. Brian,

May I ask what the purpose of the "Permute" calls at lines 99 and 111 is?

Thank you.

Works great, but you may want to consider replacing the wavread() function calls with audioread(). wavread was removed in R2015b, so calling it in that or any later version results in an error.

mas saud

I am trying to use ICA for image, I first decompose the image to 2D and use
[Zica, W, T, mu] = kICA(g,3);
Undefined function 'kICA' for input arguments of type 'int8'.

Undefined function 'kICA' for input arguments of type 'double'.

Any help is appreciated.

chao Lv

@ Brian Moore. Dear Mr.Thanks for your software. It's great!
I tested it but I don't know how to analyse harmonic by it.
Thank you very much!

David Bacci

@ Brian Moore. Dear Mr. Moore, many thanks for your software. I have a couple of questions regarding the output. Is there a way to restore the original scaling of the signals? I mean, during the algorithm the data is centered and whitened, and the principal components will all have zero mean and 1 std. Is there no way to restore their original values? The second question regards the variance associated with each principal component. Is there a way to calculate it like in the PCA algorithm? Thanks for your attention. David

David Bacci

@ Brian Moore

Sir, I am working on a project which requires separation of RGB channels using ICA of video images. Being a fresher in this coding field, I need your help to provide me with suitable code that can work for our project. It would be of much benefit to get it.

Asad Malik

@Brian. Please ignore my previous message. I was looking at the version of this code from 2015. The 2017 version does indeed have Gp = (1 - u^2) exp(-u^2/2). Thanks for your help.

Asad Malik

@Brian: Many thanks for clarifying this. My mistake, I was interpreting g as G and not as the derivative of G. This does lead to another question though:

In your code, Gp=Sk*G. Doesn't this make it (u^2) exp(-u^2/2), not (1 - u^2) exp(-u^2/2)?

Brian Moore

@Asad: Thanks for your interest in my code. I believe it's correct as written. You are correct that the variable G in my code contains g(w^Tx) from the paper, and the variable Gp contains g'(w^Tx). The use of capital letters in my code (chosen since the variables are matrices) might be confusing because there is also a function G(.) in the paper, which is the antiderivative of g. The Hyvarinen paper recommends setting

G(u) = -exp(-u^2/2)

as a robust approximation of negentropy, and so the actual quantities used in the FastICA algorithm would be

g(u) = G'(u) = u exp(-u^2/2)
g'(u) = G''(u) = (1 - u^2) exp(-u^2/2)

Not sure why using g(u) = -exp(-u^2/2) works better for your data---the choice of G() is a heuristic, so performance may vary in practice. Hope this helps
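The two derivative formulas quoted above can be verified numerically with central finite differences. This sketch only checks the calculus; the function handles G, g, and gp are illustrative names, not the variables used inside the package.

```matlab
% Contrast function and its derivatives, as given in the comment above
G  = @(u) -exp(-u.^2 / 2);
g  = @(u) u .* exp(-u.^2 / 2);              % claimed G'(u)
gp = @(u) (1 - u.^2) .* exp(-u.^2 / 2);     % claimed G''(u)

% Central finite differences on a grid
u = linspace(-3, 3, 61);
h = 1e-5;
gNum  = (G(u + h) - G(u - h)) / (2 * h);    % numerical G'(u)
gpNum = (g(u + h) - g(u - h)) / (2 * h);    % numerical g'(u)

errG  = max(abs(gNum - g(u)));
errGp = max(abs(gpNum - gp(u)));
```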

Asad Malik

Asad Malik

Hi Brian,

Many thanks for sharing this code. It's immensely useful. I have a question:

For the estimation of negentropy, you use G = Sk .* exp(-0.5 * Sk.^2). Based on my current understanding of Hyvarinen & Oja (2000), I think it should be G =-exp(-0.5 * Sk.^2), if G refers to g(w'x) in the paper (page 423). For the test data I'm using, I actually get better results with this as well. Could you please explain if G refers to g(w'x), and if so, why is it computed this way?

Many thanks.

Rene Jaros

afira fatima

sir,
how can I use this FastICA algorithm to separate the EOG signal from EEG?

Rajat Singh

Sir,
I am thinking of using this code on EMG data to extract muscle synergies. Do I just have to input the data into your code to get a W matrix, which will be the muscle synergies?

sahana kp

@Brian moore
I need to find the ICA of EHG wavelet components. How can this code be applied to my signal? A response would be greatly appreciated.

Minkyu

thanks!

Hi, does this code require multiple programs to play at once? May I know which one is for which? And how do I do multiple programs in MATLAB?

nellie

Can this code be applied to 4D data, such as fMRI? thx

Brian Moore

@Pär: Thanks for your interest in my code! ICA has inherent scaling and rotation ambiguity, so the input data has to be whitened (uncorrelated, unit variance) to obtain a unique solution.

However, as per "help myICA", if you do:

[Zica, A, T] = myICA(Z,r);
W = T \ pinv(A);

then W is a matrix such that (up to a constant offset)

Z ≈ W * Zica

As such, you can interpret norm(W(:,k))^2 as the "power" of the kth independent component, because, when norm(W(:,k))^2 is large, Zica(k,:) has a large contribution to Z.

I did a quick search on the topic and found the paper below, which proposes a similar procedure:

http://doc.utwente.nl/64279/1/27_17_hendrikse.pdf

Hope this helps,
Brian

Pär Nyström

Thanks Brian for sharing your work!!! Works excellent on my test data! Just one question; the amplitude for each component is scaled to variance=1, but is it possible to estimate the relative amplitude between components or the absolute amplitude?

Nathan Zhang

Thank you for your excellent work.

Jason

@Brian: I've read the paper and understand the basic concept. I am applying your code to systematically test ICA on non-Gaussian random variables. I thought you might know some limitation off the top of your head. I've mixed uniform, exponential, gamma, Rayleigh, and lognormal random variables. Now that I've spent a little more time on it, lognormal doesn't seem any more troublesome than anything else. It seems to depend on the parameters of the distributions, which can adjust their similarity to Gaussian. Like you wrote, that's probably the issue.

super

Brian Moore

@Jason: My code implements the FastICA algorithm from (link below) with Gaussian negentropy. In other words, it produces output components that are as "far from Gaussian" as possible. As you may know, this choice of objective is justified by the Central Limit Theorem -- the more independent signals you mix together, the more Gaussian-like the result -- so decomposing into non-Gaussian components is a reasonable way to reverse the mixing.

From this intuition, I think FastICA is failing because the log-normal distribution is too "Gaussian-like".

It sounds like you're just experimenting, but if you're really interested in recovering Gaussian-like signals, perhaps the Hyvarinen paper or some references therein propose modifications to tackle this.

http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf
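The Central Limit Theorem intuition above can be illustrated by comparing the excess kurtosis of a single non-Gaussian source with that of a sum of several such sources. This is a toy sketch using uniform sources and a hand-written kurtosis estimate; it is not how the package measures non-Gaussianity.

```matlab
% Excess kurtosis (zero for a Gaussian), computed without any toolbox
exkurt = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;

n = 100000;
S = rand(8, n) - 0.5;        % 8 independent uniform sources

kSingle = exkurt(S(1, :));   % near -1.2 for a uniform variable
kMix    = exkurt(sum(S, 1)); % sum of 8 sources, much closer to 0
```

The closer a source's distribution already is to Gaussian, the smaller this gap becomes, which is consistent with the separation difficulties discussed in this thread.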

Jason

I wrote a script to generate two statistically independent components and mix them to synthesize a two-channel signal. I used the synthesized signal with your myICA code recently. For most cases, myICA computes components that match those used to synthesize the signal, but not in all cases. For example, when I use a lognormal and a uniform distribution for the two components, it fails. For most other cases, it works very well. Can you comment on the limitations of your code?

Brian Moore

@Long B: It seems that you're doing the same thing as the previous commentator, JANANI A. That is:

It sounds like you're passing in a 1 x n scalar signal. ICA doesn't work like that. You have to pass in a d x n matrix of signals (plural), and then ICA will separate those into r <= d components

Long B

Hi Brian,

It's my first time using ICA in MATLAB. I used the Zica = myICA function to decompose the matrix that holds the signal from the mixture. But it turns out that all the decomposed components in the result are the same. I'm wondering if there is something wrong with my operation. I'm looking forward to your reply. Thanks in advance.

Janani A

Thanks for your reply Sir.

Brian Moore

@JANANI A: It sounds like you're passing in a 1 x n scalar signal. ICA doesn't work like that. You have to pass in a d x n matrix of signals (plural), and then ICA will separate those into r <= d components

Janani A

Sir,

Thanks a lot for your reply.
I am trying to analyse my signal using ICA and want to separate the independent sources present in the signal. According to the algorithm, the input parameters are Z and r where r is the desired number of independent components. If I give values more than 1 for 'r', I'm getting the same values for all the ICs! Suppose r=3, output matrix consist of three rows,the ICs, I find that the 2nd and 3rd row values are the same as the 1st row! Can you please tell me where I am going wrong?

Thanks in advance!

Brian Moore

@JANANI A: Type "help myICA" and read the "Outputs" section to see how to (approximately, when r < d) reconstruct the input signal

Janani A

Dear Brian,

Thanks so much for the package.
I have a question on the reconstruction of the original signal. The output we get are the independent components. How do we reconstruct the signal from the obtained ICs?
Any help in this regard would be greatly appreciated.
Thank You.

Brian Moore

@Mohammed: I see the confusion now: you're thinking of F and D as the eigenvectors and eigenvalues, respectively, of the sample covariance W * W'. For me, U and S are the left-singular vectors and singular values, respectively, of the raw weights W. The two methods are completely equivalent- the eigenvalues of the sample covariance are the *squared* singular values of the raw weights, so there's no square root needed in my formulation.

Here's proof:

% Random weights
W = randn(30,70);

% Method 1
[U, S, ~] = svd(W);
W1 = U * diag(1 ./ diag(S)) * U' * W;

% Method 2
[F, D] = eig(W * W');
W2 = F * diag(1 ./ sqrt(diag(D))) * F' * W;

% Check equivalence
err = norm(W1 - W2) %#ok

@Brian: Thank you very much for your answer.
I think my question was not clear.
My question was about symmetric decorrelation which is based on the following formula:
W = [F*(D^(-1/2))*F']*W
So, I was expecting the following command in the code:
W = U * diag(1 ./ sqrt(diag(S))) * U' * W;
instead of this one:
W = U * diag(1 ./ diag(S)) * U' * W;
Thanks again for your help.
Mohammad

Brian Moore

@mohammad: It's correct as written. Here's proof:

% Random data
W = randn(3,7);

% Whiten
[U, S, ~] = svd(W,'econ');
W = U * diag(1 ./ diag(S)) * U' * W;

% Verify that covariance is identity
covW = W * W' %#ok

Hi Brian,
Thank you very much for uploading the invaluable code.
As you used symmetric decorrelation (Eq. 45), don't we need to take the square root of the diagonal matrix?

W = U * diag(1 ./ diag(S)) * U' * W;

Thanks again

Brian Moore

@Vivin: it's up to you to figure out how to import your data into the correct format.

All you have to do is load the data into an m x n matrix containing n (synchronized) samples from each of your m channels and then pass that matrix into myICA.
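One way to build the m x n input matrix from channels of different lengths is to truncate them to a common length before stacking. The sketch below uses two synthetic channels as stand-ins for loaded data, and it assumes both channels start at the same instant and share a sampling rate; if they do not, they would need to be aligned or resampled first.

```matlab
% Two hypothetical channels of different lengths (stand-ins for loaded data)
ch1 = sin(2 * pi * (0:999) / 50);        % 1000 samples
ch2 = cos(2 * pi * (0:1199) / 80);       % 1200 samples

% Truncate to the common length so the samples line up one-to-one
n = min(numel(ch1), numel(ch2));

% Stack as rows: m = 2 channels, n samples each
Z = [ch1(1:n); ch2(1:n)];
```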

Dear Brian,

Could you kindly answer this question if possible. Suppose i have the values of the signal in a text file. Let's say it is like an array of integers of a large length. Is it possible that i can apply your algorithm towards this signal.

Also if i have 2 channels of different length of wave files. would i be able to retrieve a signal from the output?

I have attached a copy of my code to retrieve the original signal from a text file and its corresponding histogram. and i am able to retrieve a signal from it.

%%
clear all;
clc;

x=fopen('C:\Users\pc\Documents\Limoges\My Biblio\CoolTerm Capture 2016-05-18 14-52-33.asc','r');
sig=fscanf(x,'%i');
sig=sig*3.3/1024/1100;
freq=9600/16;
time=1/freq;
n=length(sig);
t=(0:(n-1))*time;
f=fft(sig,n);
fff=-freq/2:freq/(n-1):freq/2;
plot(t,sig);
plot(fff,fftshift(f));

Alger

thank you for share

Dear Brian,

Thank you very much for your quick response. Yes, I would hope that ECG would be independent from EEG and EMG. But the problem is that they are all sometimes mixed together. My main objective is to remove the noise produced during movement; this noise is highly unpredictable and is caused by the wires. So if I were to take two sources from 2 different pairs of wires, I would get 3 signals (one from ECG and 2 independent noises induced by each pair of wiring). The goal would be to somehow get the data set into MATLAB and process it live. But I am unable to find any resources.

Brian Moore

@Vivin: You can certainly run ICA on your multiple channel ECG data and see what the output looks like. I have no background in medical applications, but if the ECG signal is truly independent of the other signals (EEG, EMG) that are presumably mixed into each channel, then you might see a group separation.

Type "help myICA" to see the syntax of my ICA implementation.

For removing 50Hz noise, it's probably best to use a more classical filtering method, right? Something like this:

http://www.mathworks.com/help/dsp/ug/removing-high-frequency-noise-from-an-ecg-signal.html

If you find that ICA works well on a test dataset, you might want to Google "online independent component analysis" to learn about how people modify ICA to handle streaming data.

Note: I have no experience with online ICA, and, in particular, my PCA and ICA Package doesn't support it.
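As one example of the classical filtering approach mentioned above for 50 Hz noise, a second-order IIR notch can be built with the core filter function alone. This is a sketch under assumptions: the sampling rate fs, the pole radius rho, and the test signals are all invented for illustration, and it is not taken from the linked MathWorks example.

```matlab
fs = 500;                 % assumed sampling rate in Hz
f0 = 50;                  % mains frequency to remove
rho = 0.98;               % pole radius; closer to 1 gives a narrower notch
w0 = 2 * pi * f0 / fs;

% Second-order IIR notch: zeros on the unit circle at +/- w0,
% poles just inside the unit circle at the same angle
b = [1, -2 * cos(w0), 1];
a = [1, -2 * rho * cos(w0), rho^2];
b = b * sum(a) / sum(b);  % normalize to unit gain at DC

t = (0:4999) / fs;
hum  = sin(2 * pi * f0 * t);    % 50 Hz interference
slow = sin(2 * pi * 2 * t);     % stand-in for the signal of interest
y = filter(b, a, slow + hum);   % filtered mixture

% Residual hum after the start-up transient has died out
yHum = filter(b, a, hum);
residual = max(abs(yHum(2001:end)));
```

A narrower notch (rho closer to 1) distorts neighboring frequencies less but takes longer to settle after start-up.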

Dear Brian,

I am trying to use your code to separate ECG signals from the corresponding noise sources (50 Hz line noise, EEG signals, EMG signals, and movement of the body). Is it possible to use your algorithm to make this separation and filter out the ECG signal alone?

I have another question. Is it possible to apply your code for real time processing. for example, i am collecting ecg data from a patient using a measuring device with some multiple channels, would it be able to separate the sources and give me the individual outputs?

Okay, thank you very much for the immediate response. I think I can continue working with your information ;)

Brian Moore

@Dominik: The pre-whitening just multiplies your data by a particular matrix so that the sample covariance of the "whitened" data is the identity matrix. I can't say much without seeing your data, but the sample covariance of your data is probably (nearly) rank-deficient, i.e., it has one or more very small eigenvalues. Whitening divides by the square roots of those eigenvalues, which inflates the amplitudes.

Make sure you pass in the data as an m x n matrix where m = #signals and n = # samples.
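The amplitude blow-up caused by a nearly rank-deficient covariance can be reproduced on synthetic data. The sketch below whitens two almost identical channels via an eigendecomposition; it illustrates the effect in general and is not the package's own whitening code.

```matlab
% Two nearly identical channels give a nearly rank-deficient covariance
n = 5000;
s = randn(1, n);
Z = [s; s + 1e-4 * randn(1, n)];

% Center, then whiten via the eigendecomposition of the sample covariance
Zc = Z - repmat(mean(Z, 2), 1, n);
C = (Zc * Zc') / n;
C = (C + C') / 2;                    % enforce exact symmetry
[F, D] = eig(C);
T = diag(1 ./ sqrt(diag(D))) * F';   % whitening matrix
Zw = T * Zc;

covW = (Zw * Zw') / n;               % identity, up to round-off
gain = norm(T);                      % about 1/sqrt(smallest eigenvalue)
```

Here gain comes out in the thousands, so whitened components can be orders of magnitude larger than the inputs even though their covariance is the identity.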

Hi Brian,

I'm about to separate EMG recordings into their original signals. But the independent components have a much greater amplitude (e+005) than the mixed signals. I guess this is because of the preprocessing, i.e. the whitening. Can you tell me why the whitening changes the magnitude to such an extent?

Greetings, Dominik

Hi Brian,
Thank you very much for your work and for making it available.
I would like to ask whether it is possible to separate instrumental tracks in songs using ICA, and what the best way is to separate vocals and music in an audio signal.
Thanks in advance...

Brian Moore

@Andrew: You can certainly use ICA for image analysis, but it sounds like you're embarking on some kind of research project, so I think it's in your best interest to learn how to do it yourself. ICA is a classical technique and there's tons of literature on the web, just a Google search away. If you decide to use my ICA code, feel free to read the function documentation, which will tell you all you need to know about how to use it.

Andrew

Hi Brian.
I would like to ask you if it is possible to use your code for image analysis.
What I want to do is analyze a group of medical images, taken at the same position of the body, at different time points.
I'm trying to analyze these images and to separate the background from the original image. After that I would like to reduce the motion of that part of the body caused by breathing.
Thanks

Brian Moore

@Rajkumar: Say you have d "mixed speech" waveforms, each with n samples, and you're trying to recover the r underlying speech waveforms. Put the mixed speech waveforms into a d x n matrix Z and execute the following

Zica = myICA(Z,r);
Zpca = myPCA(Z,r);

Read the documentation of the two functions to see what is returned.

Note: PCA probably won't work for the speech separation problem; this classic problem is often used as motivation for doing ICA.

Thanks for your valuable program, sir.

How can the ICA and PCA analysis techniques be used for speech detection from multiple mixed speech signals? Please help me, sir.

Thanks in advance, sir.

Brian Moore

@Renan: You're welcome to modify and redistribute the code- just follow the rules of the BSD license included in the download.

As for the details of switching kurtosis, I'll leave that to you. To get started, I recommend the following paper:

http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf

Hi Brian.

I want to use the Kurtosis measure, instead of gaussian neg.

First question: Can I change your code? I will not redistribute or anything else, just want to see the difference between the measures
Second question: Can you guide me? I will use the kurtosis function, available on matlab. How can I adjust that?

Thanks for sharing!

Darin McCoy

Hi Brian,

This is probably a really simple request, but Matlab's version of PCA comes in a format

[coeff, score, latent, tsquared, explained, mu] = pca(x)

[Zpca T U mu eigVecs] = myPCA(x,r)

.... how can you get the same results from your version as Matlab's pca?

Brian Moore

@Francesco: ICA is definitely a good approach to decomposing a signal into independent spectra. The non-zero mean of your Lorentzian component is okay: the first step of the ICA algorithm is to center the data (subtract the mean), so, if it is working correctly, ICA should output a zero-mean version of the Lorentzian with no problem.

As I mentioned to Hamed, my implementation of ICA will fail in the presence of Gaussians because its objective function is to minimize the Gaussianity of the output components. Clearly this won't work when one of the output components IS Gaussian.

Here are a few other (probably better) ICA packages I recommend trying:

http://research.ics.aalto.fi/ica/fastica/code/dlcode.shtml
http://www.cis.hut.fi/projects/ica/fastica/
http://cogsys.imm.dtu.dk/toolbox/ica/

Hi Brian,
thank you very much for your work and making it available.
I am trying to use your routines to analyze the data of some UV-Vis spectra of a mixture of chemical compounds. I know that the mixture consists of more than 1 compound but fewer than the number of different spectra I have at different temperatures. Before using the routines, I checked them against synthetic data that I generated by modifying demo_ICA, using linear combinations of Gaussian functions, trigonometric functions and Lorentzian functions. I have observed that whereas the trigonometric functions are very well identified, the Lorentzian and Gaussian ones are not. I have read in the answers to Hamed that this is because a Gaussian can be expressed as a sum of other Gaussians, but this is not true of a Lorentzian. Even with my real data, when I try to analyze them I get negative "spectra" as independent components, which is physically impossible. I am not an expert in signal processing and I got the impression that part of the problem might be the fact that my signals do not have null average as sine and cosine have. Because I have read that the method has also been used in medical studies of EEC and EEG, I thought it was a really good method for analysing the superposition of independent spectra as well.
Thank you in advance for any advice.
Best regards
Francesco

Brian Moore

@Hamed: The sum of independent Gaussians is again a Gaussian, so your question is ill-posed: there's not a unique decomposition.

Even if only one component was Gaussian, ICA still wouldn't work: its objective function is to *minimize* Gaussianity of the output signals.
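The non-uniqueness mentioned above can be written out in two lines. The symbols below (s_1, s_2, Q) are introduced only for this illustration: the first line is the standard result that a sum of independent Gaussians is Gaussian, and the second shows that whitened Gaussian sources look the same after any rotation Q, so no rotation can be singled out.

```latex
\begin{aligned}
s_1 \sim \mathcal{N}(\mu_1, \sigma_1^2),\;
s_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)\ \text{independent}
&\;\Longrightarrow\;
s_1 + s_2 \sim \mathcal{N}(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2), \\
s \sim \mathcal{N}(0, I),\; Q Q^{\top} = I
&\;\Longrightarrow\;
Q s \sim \mathcal{N}(0,\ Q Q^{\top}) = \mathcal{N}(0, I).
\end{aligned}
```

Many different pairs of means and variances give the same sum distribution, which is why the decomposition is not unique.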

Brian Moore

@Yeon-Mo Yang: Sorry for the delay... rng() is a built-in MATLAB function that controls its random number generator. I use it to set the generator seed so the results of myICA() are deterministic, but this isn't necessary. You can comment out the line; it's harmless.

rng() was introduced around 2010, so you must have a pre-2010 MATLAB distribution...

Hamed

Hello Again,

I forgot to mention that the noises are independent.

Hamed

Hello Brian,

I have a sum of two normal distributions and I was wondering if I can decompose it into its two normal components using ICA.
Thank you.

AJ Zadeh

Hi dear Brian, and thanks so much for your code!
I loaded two speech signals with Fs=16000 and n=16000 (1 sec), then mixed them together using a random mixing matrix. I gave the mixed signals as input data (Z) with r=2, d=2, but the separated sources were not good at all, and the estimated mixing matrix (A) was not correct either.
I need your help!

zhou zexun

Thanks for your code, it is very helpful for me!

Yeon-Mo Yang

Brian!
What is rng(1) in the first line of the demo source code?
When I tried it, it failed. I need your help. Thanks!

Brian Moore

@Thomas: The ICA weights have to be decorrelated so they don't converge to the same values. My code implements the FastICA algorithm (reference in multiple previous comments on this package); please refer to the literature for details about the ICA algorithm.

The only restriction on the code is computational complexity: if the code completes, everything should be fine.

In the example you gave, the input to myPCA/myICA needs to be a 3 x 300000 matrix: the columns should contain samples.

Thomas Sim

Sorry again, Brian, I hope I am not bothering you. May I know why you need to de-correlate your weight matrix after the 4-step ICA implementation?

Also, just to check: are there any restrictions on using your algorithm with large datasets?

I ask because I tried replacing your MVG samples in ICA_demo with a 300000x3 matrix of 3 sound signals, and replaced maxsamples in myICA with the length of my matrix.

The result of the ICA de-mixing is very poor, both with and without PCA.

By contrast, the result of your PCA is reasonable on large datasets.

Brian Moore

@Thomas: The principal components (PCs) produced by SVD are orthogonal, but they aren't unit norm, so the second whitening step is just normalizing each component to have unit norm (i.e., T is a diagonal matrix).

In many applications, the PC *magnitudes* are important; in those cases, you'll want to comment out the second whitening step.

I chose to normalize the PCs by default to allow direct comparison with ICA, which, by construction, returns unit norm components.

Hope this helps!
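The point about orthogonal but not unit-norm components can be illustrated directly with svd. This sketch follows the description in the comment above on synthetic data; the scaling matrix T here is built by hand, and the package's own T may be defined differently.

```matlab
% Centered synthetic data with very different variances per direction
n = 1000;
Z = diag([10, 3, 0.5]) * randn(3, n);
Zc = Z - repmat(mean(Z, 2), 1, n);

[U, S, ~] = svd(Zc, 'econ');
P = U' * Zc;                 % principal components, one per row

% Rows of P are mutually orthogonal, with norms equal to the singular values
gram = P * P';               % diagonal matrix with entries diag(S).^2

% Normalizing by the singular values gives unit-norm components
T = diag(1 ./ diag(S));      % diagonal scaling matrix
Pn = T * P;
gramN = Pn * Pn';            % identity, up to round-off
```

Skipping the scaling by T keeps the magnitudes, which is the variant suggested above for applications where the component magnitudes matter.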

Thomas Sim

Hi Brian, I hope this isn't a stupid question. For PCA, you center the data and apply SVD once, so why do you need to whiten it again with SVD? Isn't that doing it twice?

Brian Moore

@zongbao: not possible. A quote from Wikipedia:

"In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, nor the proper scaling (including sign) of the source signals."

The first step of ICA is centering + whitening, which removes magnitude information. Indeed, after running myICA(), you can check that cov(z_ic') == identity matrix

zongbao

How can I find z_ic that corresponds to a specific eigenvalue and eigenvector? I mean I want to find the independent component that corresponds to the maximum eigenvalue. Thank you.

TS Sharma

Thanks Brian!

Brian Moore

@TS Sharma: Yes, the columns of z_ic_new are the ICA coefficients corresponding to each input sample (i.e., columns of z_new)

TS Sharma

Referring to your reply to my previous comment, I think it is z_ic_new.

TS Sharma

Another basic question. May I know which matrix contains ICA coefficients?

TS Sharma

Thanks a lot Brian!! That was very kind of you.

Brian Moore

@TS Sharma: First, call myICA() like this:

[z_ic A T mean_z] = myICA(z,NUM);

Then, if new data is in (m x 1) vector z_new, you can project into ICA space by computing

z_ic_new = A * T * (z_new - mean_z);

Here, (A,T,mean_z) are the ICA transformation matrix, whitening matrix, and mean vector, respectively, of the original data
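The projection formula above extends to a whole batch of new samples if the mean is subtracted from every column. In the sketch below, A, T, and mean_z are hypothetical stand-ins built by hand (the shapes NUM x NUM and NUM x m are assumptions, not the package's documented output sizes); with the real outputs of the ICA call only the last line would be needed.

```matlab
% Stand-ins for the outputs of the ICA call (hypothetical values and shapes)
m = 4; NUM = 2; n = 500;
z = randn(m, m) * rand(m, n);          % training data, m x n
mean_z = mean(z, 2);                   % m x 1 mean vector
zc = z - repmat(mean_z, 1, n);
[U, S, ~] = svd(zc, 'econ');
T = diag(1 ./ diag(S(1:NUM, 1:NUM))) * U(:, 1:NUM)';  % NUM x m stand-in
[A, ~] = qr(randn(NUM));               % NUM x NUM orthogonal stand-in

% Project a batch of k new samples (columns of z_new) in one shot
k = 10;
z_new = randn(m, m) * rand(m, k);
z_ic_new = A * T * bsxfun(@minus, z_new, mean_z);   % NUM x k
```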

TS Sharma

Thank you for the code. This is a rather basic question. How do I project new (test) data onto the newly found ICA basis?

Can we use this ICA package for feature extraction? Kindly reply.

Thanks for the package! But how do I use it for speech signals? I am new to this field, which is why I need your guidance.

Bicheng Ying

Thanks for this nice package.

Keven Laboy

That makes perfect sense. Thank you for the quick response!

Brian Moore

@Keven: The mixing matrix is "A", which is indeed orthogonal. Check out the last line of myICA.m: you'll see the statement

z_ic = A * z_cw;

which computes the independent components (z_ic) from the whitened input data (z_cw).

The matrix "T \ pinv(A)" transforms the independent components back to the original (unwhitened) domain, but it's not, in general, orthogonal because the whitening matrix "T" isn't orthogonal. Hope this helps!
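The statement above can be checked with small hand-made matrices. Here A is a random orthogonal matrix and T is a whitening matrix derived from an example covariance C; both are hypothetical stand-ins rather than outputs of myICA.

```matlab
% Hypothetical stand-ins: an orthogonal A and a non-orthogonal whitening T
d = 3;
[A, ~] = qr(randn(d));               % orthogonal: A * A' = I
C = [4 1 0; 1 2 0.5; 0 0.5 1];       % example covariance (symmetric, pos. def.)
[F, D] = eig(C);
T = diag(1 ./ sqrt(diag(D))) * F';   % whitening matrix: T * C * T' = I

M = T \ pinv(A);                     % maps components back to the data domain

orthA = norm(A * A' - eye(d));       % essentially zero
orthM = norm(M' * M - eye(d));       % clearly nonzero
covBack = norm(M * M' - C);          % essentially zero: M * M' recovers C
```

M is not orthogonal because it has to restore the original covariance, so a non-orthogonal result here is expected rather than a sign of a problem.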

Keven Laboy

Thank you for the package.

One question, to my understanding if you whiten and center the data, the matrix containing the basis functions (mixing matrix) should be orthogonal. When I run your code and get the mixing matrix by doing T \ pinv(A) (as specified in myICA.m) the resulting matrix is not orthogonal. Is this an issue?

Thank you in advance

Marthed

Thanks for providing this package ^_^

M'hamed

How do I select the best dimension for this ICA algorithm?
Thanks

Peixin

Brian Moore

@LA_2012: Please read my previous comments for more detail about my implementation of ICA and references to the literature.

To apply ICA to images, you could vectorize each image and store them as columns of the input matrix
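A minimal sketch of that vectorization, assuming the images are grayscale, all the same size, and stacked in an (h x w x K) array. The names imgs, h, w, and K are hypothetical.

```matlab
% Hypothetical stack of K grayscale images, each h x w
h = 4; w = 3; K = 5;
imgs = rand(h, w, K);

% Each image becomes one column of the (h*w x K) input matrix
z = reshape(imgs, h*w, K);

% Any column can be reshaped back into image form
img1 = reshape(z(:,1), h, w);   % recovers imgs(:,:,1)
```

The matrix z can then be passed to the ICA function in place of the usual input data.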

LA_2012

Is this fixed-point ICA or some other variant?
How can I use it on images?
How can I modify the code to meet the requirements of another ICA algorithm?
Where can I find reference material to understand ICA better?
Thanks in advance :)
Thanks for providing this code.

Guilherme

Thanks for sharing your code Mr. Moore.

I'm using part of your ICA function to implement my own.
I wasn't getting good results using the weight vector decorrelation as you did, so I implemented the Gram-Schmidt decorrelation as proposed by Hyvarinen on pg. 15 (http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf).

Here is the code I inserted right after the weight normalization, still inside the for loop:

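% Gram-Schmidt: remove from W(p,:) its projections onto rows 1..p-1
summ = zeros(nOb,1);
for j = 1 : p-1
    wj = W(j,:)';
    summ = summ + W(p,:)*wj*wj;   % (W(p,:)*wj) is the scalar projection coefficient
end
W(p,:) = W(p,:) - summ';
W(p,:) = W(p,:)/norm(W(p,:));     % renormalize to unit length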

Body

I get it. Thanks a lot!

Brian Moore

@Susanne: The for-loop containing "gp" is implementing steps 2-3 of the algorithm on pg. 14. So, in particular, gp is related to g'(.) from the document

Susanne

Ok, I think I have a better grasp on it now. Thank you so much, your code helped me a lot. I have one question though: what is the 'gp' you're using in your ICA code?

Brian Moore

@Body: No, the "err" vector keeps track of how much the weight vectors are changing at each iteration (used only for the stopping criterion). If you do as you suggested, you'll just be measuring how close each weight vector is to being unit norm (*spoiler alert* - they're always unit norm)

Brian Moore

@Susanne: I suggest you read the help info and comments in the myICA.m function. If you need more detail, you'll need to refer to the literature. For example I recommend

http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf

FYI: I'm using the 4-step algorithm on pg. 14 along with the symmetric decorrelation step involving the W matrix from (45) on pg. 15
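For readers unfamiliar with symmetric decorrelation, here is a minimal sketch of the standard form W <- (W*W')^(-1/2) * W, computed through an SVD. This is an illustration under stated assumptions, not a copy of what myICA.m does: the matrix W below is made up, and it is assumed to be square with full rank and one weight vector per row.

```matlab
W = [2 1 0; 0 1 1; 1 0 3];          % hypothetical full-rank weight matrix
[U, S, V] = svd(W, 'econ');         % W = U*S*V'
W_dec = U * V';                     % equals (W*W')^(-1/2) * W for full-rank W

err_orth = norm(W_dec*W_dec' - eye(3));           % close to 0: rows are orthonormal
err_form = norm(W_dec - sqrtm(inv(W*W')) * W);    % close to 0: matches the formula
```

Unlike Gram-Schmidt, this treats all weight vectors equally instead of privileging the ones estimated first.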

Body

Thanks for your code! ICA is so difficult to understand. You help a lot! I have a question. Can I use
err(i) = 1 - w(i,:) * w(i,:)'
instead of
err(i) = 1 - w(i,:) * w_old(i,:)'?

Susanne

I have some trouble completely understanding the ICA algorithm you are using. Would you mind helping me out?

Susanne

Brian Moore

@LionelB: Good point - I'm disappointed in myself for not mentioning bsxfun() - it's MATLAB's secret sauce for vectorizing

LionelB

Or even

z0 = bsxfun(@minus,z,sum(z,2)/n);
R = (z0*z0')/(n-1);

:) (Matlab uses this idiom internally, see e.g. the 'cov' function).

Anyhow, I only raised this because I almost gave up on your fine package until I realised the fix was trivial.

Brian Moore

@LionelB: Yes, I'm aware. I wrote the MATLAB code so it could be directly translated into, e.g., C++. If I were maximizing MATLAB efficiency, I would have written

z0 = (z - repmat(mean_z,[1 n])) / sqrt(n - 1);
R = z0 * z0';

instead of

z0 = z - mean_z*ones(1,n);
R = (z0*z0') / (n-1);

LionelB

Nice package, but the 'myWhiten' routine is a terrible computational bottleneck for large datasets, because of the inefficient sample covariance calculation. This can easily be done without looping: simply de-mean z, e.g.:

z0 = z-mean_z*ones(1,n);

then calculate

R = (z0*z0')/(n-1)

Delivers an orders-of-magnitude speedup.
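A small sanity check of the loop-free covariance, as a sketch on made-up data (samples in columns). It compares the outer-product form above and a bsxfun() variant against MATLAB's built-in cov(), which expects observations in rows.

```matlab
z = [1 2 3 4; 2 4 6 9; 0 1 0 1];      % (m x n) made-up data, samples in columns
n = size(z, 2);
mean_z = mean(z, 2);

z0 = z - mean_z*ones(1, n);            % de-mean with an outer product
R1 = (z0*z0') / (n - 1);

z0b = bsxfun(@minus, z, mean_z);       % de-mean with bsxfun
R2 = (z0b*z0b') / (n - 1);

R_ref = cov(z');                       % built-in reference
```

All three matrices should agree up to floating-point rounding.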

Ilia

Yes, I was talking about z_LD. I was afraid that it could be too strong a restriction on the solution, because I'm not really an expert in this kind of analysis, and this is the first time I've found a way to compute the low-dimensional z.
Thank you for the answer.

Brian Moore

@Ilia: Are you referring to "z_LD" as described in the myICA help? It's a matrix of the same size as the input "z" that approximates z from a linear combination of the "NUM" independent components in output "z_ic"

Ilia

Thank you for sharing this great work. I have a question.
I'm using your 'myICA.m' and I would like to understand what ICA_LD really does.
Can you suggest some reading about it, or explain it in a few words, if possible?
Thank you very much.

Brian Moore

@Prarinya Check out the FastICA algorithm from

http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf

I'm using the 4-step algorithm on pg. 14 along with the symmetric decorrelation step involving the W matrix from (45) on pg. 15

Cheers!

Great contribution. I have a request

Could you please tell us the reference of the algorithm you use? I'm curious about the update rule for creating the transformation matrix 'A' in the code.

I tried to study the ICA algorithm from the ground up. However, the update rule you use is quite different from the one in the document I have, so I realized that there is more than one approach to updating the transformation matrix.

Arjuna

better than FastIca ;)

Bosi

Abdelrahman

super good

Brian Moore

Pierre,

You must be using an old version of MATLAB (<= 2009b I believe) where ~ is not supported. Please replace the line with something like

[U,S,temp] = svd(w,'econ');

The variable temp is not used; the name is arbitrary

Pierre-Pascal

Hi, thanks for providing this code! Super useful.

Line 97 of myICA returns an error:
[U,S,~] = svd(w,'econ');
It seems to hate the ~ as an output.

Did I miss something?

Shivakumar

Sir, there is an error at line 31. Can you please solve that? thank you for providing this file.

Jim

Wang

Brian Moore

Why the 2-star rating, Eugene? Please provide feedback so I can improve this package!

Eugene

Updates

2.1

Adding support for audioread() function in loadAudio()

2.0

- Adding support for kurtosis-based Fast ICA
- Adding the max-kurtosis ICA algorithm
- Shiny new ICA demos on source separation (including real audio data)

1.4

Uploading .zip (omitted in last update)

1.3

Improving documentation and code performance

1.2

Updating myPCA() documentation

1.1

Fixing bug in myMultiGaussian(). Needed to use lower triangular Cholesky factorization, not the upper triangular version.

MATLAB Release
MATLAB 7.13 (R2011b)
Acknowledgements

Inspired: EOF
