Expanding Sample Covariance Matrix

Lemar DeSalis on 21 Aug 2011
Hello!
I need to calculate the mean vector and the covariance matrix for sampled data. For example, I have a matrix with NumFeatures columns and NumSamples rows, so I can simply use "mean(MyMatrix)" and "cov(MyMatrix)".
However, what should I do if I want to extend the covariance matrix I got through the method described above?
So I have a covariance matrix calculated from the old samples, how can I add the influence of the new samples?
Is there an easy MATLAB way to do that?
Thanks in advance!
  1 Comment
Oleg Komarov on 21 Aug 2011
The terminology you're using is not clear. Could you give an example?
For reference: http://www.mathworks.com/matlabcentral/answers/6200-tutorial-how-to-ask-a-question-on-answers-and-get-a-fast-answer


Answers (2)

Lemar DeSalis on 22 Aug 2011
% MyMatrix is a Matrix containing samples, in this case random data:
MyMatrix = rand( [NumSamples NumFeatures] );
% I need the mean vector and the covariance matrix:
MyMean = mean(MyMatrix);
MyCov = cov(MyMatrix);
% Now I got some new data:
MyLargerMatrix = vertcat(MyMatrix, SomeNewData);
% Calculate new values:
MyMean_New1 = mean(MyLargerMatrix)
MyCov_New1 = cov(MyLargerMatrix);
%%%%HERE IS MY QUESTION:
% But what to do, when the old data is not available anymore?
clear MyLargerMatrix MyMatrix;
MyCov_New2 = ... ?
% How to update the covariance matrix, if you only have the old
% covariance matrix "MyCov", the old mean vector "MyMean", the number
% of old samples "NumSamples" and the new samples "SomeNewData"?
%
% MyCov_New2 should be identical to MyCov_New1, but MyCov_New2
% should be computed WITHOUT access to the old data.
% For the mean vector, this is easily possible, but how to do so for the covariance matrix?

Oleg Komarov on 22 Aug 2011
% Example inputs
A = rand(100,2);
B = randn(20,2);
C = [A;B];
% Sample covariances (normalized by N-1)
c1 = cov(A);
c2 = cov(B);
c3 = cov(C);
% Means
m1 = mean(A);
m2 = mean(B);
m3 = mean(C);
% Number of samples
nA = size(A,1);
nB = size(B,1);
nC = nA + nB;
% The question is: how to get c3 having only c1, c2, m1, m2, nA and nB?
% Keep in mind that:
  • cov(x,y) = E(xy) - E(x)E(y)
  • m3 = (m1*nA + m2*nB)/nC
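  • E(xy) pools the same way: a sample-count-weighted average of the two groups
  • cov is the sample covariance, so we have to adjust for the N-1 normalization
  • the following formula is valid only for the covariance (the off-diagonal entry), not for the variances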
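For readers who want the paper-and-pencil step spelled out, the points above can be combined roughly as follows. The notation is my own, not from the post: $\bar{x}_A, \bar{y}_A$ are the two column means of A (the entries of m1), likewise for B and C, and $c_1, c_2, c_3$ stand for the off-diagonal entries of the corresponding covariance matrices.

```latex
% Raw cross-product sum recovered from the sample covariance of one group:
\sum_{i \in A} x_i y_i = (n_A - 1)\,c_1 + n_A\,\bar{x}_A\,\bar{y}_A

% Pooled mean of the stacked data:
\bar{x}_C = \frac{n_A\,\bar{x}_A + n_B\,\bar{x}_B}{n_C},
\qquad n_C = n_A + n_B

% Sample covariance of the stacked data:
c_3 = \frac{(n_A - 1)\,c_1 + (n_B - 1)\,c_2
      + n_A\,\bar{x}_A\,\bar{y}_A + n_B\,\bar{x}_B\,\bar{y}_B
      - n_C\,\bar{x}_C\,\bar{y}_C}{n_C - 1}
```

Dividing the sum by $n_C$ and multiplying by $n_C/(n_C-1)$, as the code does with adj, gives the same result.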
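% E(x)E(y) of the combined data: product of the two pooled column means
ExEy12 = prod((m1*nA + m2*nB)/nC);
% Factor converting the 1/N normalization to 1/(N-1)
adj = nC/(nC-1);
% Updated estimate; only its off-diagonal entries match c3
(c1*(nA-1) + c2*(nB-1) + prod(m1)*nA + prod(m2)*nB)/nC*adj - ExEy12 * adj
% Direct computation, for comparison
c3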
How to derive the variance is up to you. But you really just need paper and pencil.
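One possible way to cover the variances as well is to replace the scalar products prod(m1) and prod(m2) with outer products of the mean vectors, so that every entry of the matrix is updated at once. The following is a minimal sketch, not part of the original answer; the names S and c3_update are made up for illustration. It assumes the new batch B has at least two rows, since mean and cov treat a single row as a vector.

```matlab
% Example inputs, as in the answer above
A = rand(100,2);
B = randn(20,2);

% Summary statistics; after this point A is no longer needed
nA = size(A,1);  nB = size(B,1);  nC = nA + nB;
m1 = mean(A);    m2 = mean(B);    % 1-by-NumFeatures row vectors
c1 = cov(A);     c2 = cov(B);

% Pooled mean of the stacked data
m3 = (nA*m1 + nB*m2)/nC;

% Raw second-moment sums, using X'*X = (n-1)*cov(X) + n*(m'*m)
S = (nA-1)*c1 + nA*(m1'*m1) + (nB-1)*c2 + nB*(m2'*m2);

% Sample covariance of the stacked data (normalized by nC-1)
c3_update = (S - nC*(m3'*m3))/(nC-1);

% Largest absolute deviation from the direct computation
max(max(abs(c3_update - cov([A;B]))))
```

The last line should print a value at the level of floating-point rounding. In the original question, A corresponds to the old samples (only MyMean, MyCov and NumSamples are kept) and B to SomeNewData.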
  1 Comment
Lemar DeSalis on 23 Aug 2011
Thanks, I was able to find a solution based on your code!
