
Thread Subject:
faster way to implement corr

Subject: faster way to implement corr

From: heidi jiang

Date: 16 May, 2012 00:21:07

Message: 1 of 6

Hi!

I'm currently trying to use corr to perform 230,000 x 230,000 correlations. To preserve memory I've been doing this in a for loop (first vector x all 230,000 vectors, second vector x all 230,000...). This works because I just need an average correlation for each iteration of the for loop. However, each iteration of the for loop takes about 30 seconds, so the whole job will take roughly 2000 hours to finish. Have I overlooked a way to dramatically decrease the time needed for this processing?
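
In code, the approach looks roughly like this, assuming the 230,000 vectors are stored as the columns of a matrix X (the names X and avgC are illustrative):

 p = size(X, 2);               % X: n-by-230000 data matrix (assumed)
 avgC = zeros(1, p);
 for i = 1:p
     c = corr(X(:, i), X);     % 1-by-p correlations of vector i with all vectors
     avgC(i) = mean(c);        % the "average correlation" for this iteration
 end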

Thanks!
Heidi

Subject: faster way to implement corr

From: Roger Stafford

Date: 16 May, 2012 01:37:07

Message: 2 of 6

"heidi jiang" <heydee@gmail.com> wrote in message <jourtj$quo$1@newscl01ah.mathworks.com>...
> I'm currently trying to use corr to perform 230,000 x 230,000 correlations. To preserve memory I've been doing this in a for loop (first vector x all 230,000 vectors, second vector x all 230,000...). This works because I just need an average correlation for each iteration of the for loop.
- - - - - - - - -
  What exactly do you mean by taking an "average correlation" for each of your iterations? If you mean, for each vector finding the average of its (Pearson?) correlations with all the other vectors, this average would be a very strange kind of "correlation". For example, suppose two vectors are such that one is the negative of the other. Then taking the average of their two respective correlations with any third vector would always give you a zero result. In other words, you could get a lot of cancellation on what ordinarily would be regarded as very strong correlations by such an "averaging".

  There is an obvious way of greatly simplifying your task for this kind of averaging, but I seriously doubt that you really want to do things this way.

Roger Stafford

Subject: faster way to implement corr

From: heidi jiang

Date: 16 May, 2012 01:54:17

Message: 3 of 6

"Roger Stafford" wrote in message <jov0c3$ebk$1@newscl01ah.mathworks.com>...
> "heidi jiang" <heydee@gmail.com> wrote in message <jourtj$quo$1@newscl01ah.mathworks.com>...
> > I'm currently trying to use corr to perform 230,000 x 230,000 correlations. To preserve memory I've been doing this in a for loop (first vector x all 230,000 vectors, second vector x all 230,000...). This works because I just need an average correlation for each iteration of the for loop.
> - - - - - - - - -
> What exactly do you mean by taking an "average correlation" for each of your iterations? If you mean, for each vector finding the average of its (Pearson?) correlations with all the other vectors, this average would be a very strange kind of "correlation". For example, suppose two vectors are such that one is the negative of the other. Then taking the average of their two respective correlations with any third vector would always give you a zero result. In other words, you could get a lot of cancellation on what ordinarily would be regarded as very strong correlations by such an "averaging".
>
> There is an obvious way of greatly simplifying your task for this kind of averaging but I seriously doubt if you really want to do things this way.
>
> Roger Stafford

Hi Roger,

Thanks for your response! I was being a bit succinct in my description of "average correlation." I'm actually going to threshold the correlations, setting any correlation below some threshold to 0, and then call nnz on the result, because I'm interested in the number of correlations that exceed the threshold per comparison. Therefore my final output for the 230k x 230k correlation matrix is just a 1 x 230k vector.
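
Per iteration, the thresholding step looks roughly like this, where c is one 1-by-230k vector of correlations and thr is the chosen threshold (both names are illustrative):

 c(c < thr) = 0;     % zero out correlations below the threshold
 count = nnz(c);     % number of correlations exceeding the threshold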

If you have a much simpler solution to this, please let me know!

Thanks.

Subject: faster way to implement corr

From: Roger Stafford

Date: 16 May, 2012 02:41:07

Message: 4 of 6

"heidi jiang" <heydee@gmail.com> wrote in message <jov1c9$i10$1@newscl01ah.mathworks.com>...
> Thanks for your response! I was being a bit succinct in my description of "average correlation." I'm actually going to threshold the correlations, setting any correlation below some threshold to 0, and then call nnz on the result, because I'm interested in the number of correlations that exceed the threshold per comparison. Therefore my final output for the 230k x 230k correlation matrix is just a 1 x 230k vector.
- - - - - - - - - -
  Yes, I suspected you were actually thinking along lines like that. Also you are probably taking the absolute value of any negative correlations that occur.

  Unfortunately introducing such non-linear procedures rules out the (overly) simple solution I referred to.

  However, I think it would be advantageous to first subtract the mean value of each vector from each of its elements and then divide the result by its Euclidean norm (which is just the standard deviation up to a constant factor), thereby creating what you might call "normalized" vectors. The sum of each vector's elements would be zero and the sum of their squares would be one. (I assume you are using the Pearson correlation.) It would be wasteful to have to continually recompute these means and norms with repeated calls on 'corr'.
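
A minimal sketch of that normalization, assuming the vectors are stored as the rows of a 230000-by-n matrix A (the name is illustrative):

 A = bsxfun(@minus, A, mean(A, 2));            % subtract each row's mean
 A = bsxfun(@rdivide, A, sqrt(sum(A.^2, 2)));  % scale each row to unit sum of squares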

  With such modified vectors you can directly compute the correlations without having to call on 'corr'. Between any two vectors it's a matter of simply multiplying their elements pair-wise and taking the sum of the products, and that should be faster than calls on 'corr'. This could be done with a large set of vectors as a matrix multiplication, but I suspect you would have memory problems with a 230000 by 230000 matrix, and would at least have to do such a multiplication piecemeal.
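
For example, between any two such rows i and j the correlation reduces to a dot product:

 rho = A(i, :) * A(j, :).';   % Pearson correlation of vectors i and j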

Roger Stafford

Subject: faster way to implement corr

From: Roger Stafford

Date: 16 May, 2012 04:41:08

Message: 5 of 6

"Roger Stafford" wrote in message <jov443$snv$1@newscl01ah.mathworks.com>...
> This could be done with a large set of vectors as a matrix multiplication, but I suspect you would have memory problems with a 230000 by 230000 matrix, and would at least have to do such a multiplication piecemeal.
- - - - - - - - -
  There is one point I want to correct in my previous discussion. I mentioned a 230000 by 230000 matrix, and that is surely incorrect. Each vector undoubtedly has far fewer than 230000 elements.

  If each of your vectors has n elements, then let A be a 230000 by n matrix in which the rows are the modified vectors and for which you hopefully have enough memory. Then I envision an iteration that takes m of the vectors at a time:

 for k = 1:230000/m % m a factor of 230000
     C = A*A((k-1)*m+1:k*m,:).'; % <-- Correlations
     % C will be 230000 by m in size
     % Each column has correlations of all vectors w.r. to
     % one of the vectors (k-1)*m+1 to k*m.
     % Apply nonlinear ops. on each column of C to get "average" correlation
     % Result is m "average" correlations which should be stored
 end

  Here you would make m as large as possible while still being able to store a 230000 by m matrix C. Each trip through the for-loop then accomplishes what m of your original iterations did, and 'corr' is never called.
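
Putting the pieces together, a fuller sketch of this blocked computation combined with the thresholding you described, assuming A holds the normalized rows and thr is your threshold (illustrative names):

 N = size(A, 1);                     % 230000
 counts = zeros(1, N);
 for k = 1:N/m                       % m a factor of N
     idx = (k-1)*m+1 : k*m;
     C = A * A(idx, :).';            % N-by-m block of correlations
     counts(idx) = sum(C > thr, 1);  % per-column count above the threshold
     % (each column also includes that vector's self-correlation of 1)
 end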

  I hope all this makes sense.

Roger Stafford

Subject: faster way to implement corr

From: heidi jiang

Date: 16 May, 2012 17:06:15

Message: 6 of 6

"Roger Stafford" wrote in message <jovb54$q02$1@newscl01ah.mathworks.com>...
> "Roger Stafford" wrote in message <jov443$snv$1@newscl01ah.mathworks.com>...
> > This could be done with a large set of vectors as a matrix multiplication, but I suspect you would have memory problems with a 230000 by 230000 matrix, and would at least have to do such a multiplication piecemeal.
> - - - - - - - - -
> There is one point I want to correct in my previous discussion. I mentioned a 230000 by 230000 matrix, and that is surely incorrect. Each vector undoubtedly has far fewer than 230000 elements.
>
> If each of your vectors has n elements, then let A be a 230000 by n matrix in which the rows are the modified vectors and for which you hopefully have enough memory. Then I envision an iteration that takes m of the vectors at a time:
>
> for k = 1:230000/m % m a factor of 230000
>     C = A*A((k-1)*m+1:k*m,:).'; % <-- Correlations
>     % C will be 230000 by m in size
>     % Each column has correlations of all vectors w.r. to
>     % one of the vectors (k-1)*m+1 to k*m.
>     % Apply nonlinear ops. on each column of C to get "average" correlation
>     % Result is m "average" correlations which should be stored
> end
>
> Here you would make m as large as possible and still be able to store a 230000 by m matrix C. Each trip through the for-loop would encompass what m of your iterations accomplished and 'corr' will not have been called.
>
> I hope all this makes sense.
>
> Roger Stafford

Hi,

This makes sense and I'll try it. Thank you!
