File ID: #22685 | File Size: 11.31 KB | 120 downloads (last 30 days)

Correlation Percentiles

by Francesco Pozzi

 

14 Jan 2009

No BSD License  

CORRPERC estimates, by bootstrapping, the percentiles and standard deviations of the entries of a correlation matrix.


File Information
Description

CORRPERC performs a bootstrap with n_iters replicates of the correlation matrix of the input data Y and computes, for each correlation, the percentiles requested in perc (output corrsperc). It also returns the standard deviation of each correlation (output corrstd).

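[corrsperc, corrstd] = corrperc(Y, perc, n_iters) returns corrsperc as an (n * (n - 1) / 2)-by-length(perc) matrix and corrstd as a vector of length n * (n - 1) / 2, where n is the number of columns of Y. See below for further details.

[corrsperc, corrstd] = corrperc(Y, perc, n_iters, 1) returns corrsperc as an n-by-n-by-length(perc) array and corrstd as an n-by-n matrix.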
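To make the idea concrete, here is a minimal sketch in plain MATLAB for a problem small enough that all bootstrap replicates fit in memory at once. It is not the code of corrperc: the sizes m, n, n_iters and the data Y are illustrative assumptions, the bootstrap is assumed to resample the rows (observations) of Y with replacement, and corrstd is taken as the standard deviation across replicates. The error report in the comments shows that corrperc itself obtains the percentiles with prctile (Statistics Toolbox), so that call is only indicated in a comment here.

```matlab
% Minimal, unchunked bootstrap of pairwise correlations.
% Illustrative sketch only; sizes, data and resampling scheme are assumptions.
m = 200;            % observations
n = 5;              % variables
n_iters = 500;      % bootstrap replicates
Y = randn(m, n);    % illustrative data

% For a symmetric matrix, the column-major lower triangle lists the pairs in
% the same order as the row-wise upper triangle: a12, a13, ..., a1n, a23, ...
mask = tril(ones(n), -1) > 0;
n_pairs = n * (n - 1) / 2;

corrs = zeros(n_iters, n_pairs);      % one row per replicate, one column per pair
for it = 1:n_iters
    idx = randi(m, m, 1);             % resample rows of Y with replacement
    R = corrcoef(Y(idx, :));          % n-by-n bootstrap correlation matrix
    corrs(it, :) = R(mask).';         % keep the upper triangle, row-wise order
end

corrstd = std(corrs).';               % bootstrap standard deviation, n_pairs-by-1
corrmed = median(corrs).';            % 50th percentile, no toolbox needed
% With the Statistics Toolbox: corrsperc = prctile(corrs, perc).';
```

Storing the whole n_iters-by-n_pairs matrix is exactly what becomes impossible for large n, which is the problem discussed next.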

*******************************************************************

WHY I NEEDED THIS FUNCTION

When Y has many columns and n_iters is large, the percentiles generally cannot be computed in one pass: on most machines there is not enough RAM to hold all the bootstrapped correlations at once.

For example, suppose Y is a 1000-by-500 matrix and we want a bootstrap with 5000 iterations. Then there are 500 * (500 - 1) / 2 = 124750 correlations.

To compute the percentiles in one pass we would need a 124750-by-5000 matrix, from which the desired percentiles would then be extracted. In double precision that is 623,750,000 values at 8 bytes each, about 5 GB. My machine can't do that! The RAM is not enough.

My solution is to compute the percentiles for about 10000 correlations at a time (you can change this chunk size from within the code through the parameter corrs_per_step). Each step then needs only a 10000-by-5000 matrix (about 400 MB in double precision), and the computation is repeated ceil(124750 / 10000) = 13 times.
It's slow and not elegant at all, but it works.
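The chunking strategy described above can be sketched as follows. This is a hypothetical illustration, not the author's implementation: the sizes are small assumptions, the bootstrap row indices are drawn once (an m-by-n_iters integer matrix) and reused for every chunk so that all chunks see the same replicates, and the full correlation matrix is recomputed for each chunk, trading time for memory. Because prctile needs the Statistics Toolbox, the percentiles are computed here with linear interpolation between sorted replicates, intended to follow the midpoint scheme documented for prctile.

```matlab
% Chunked bootstrap percentiles (illustrative sketch, not corrperc itself).
m = 300;  n = 40;  n_iters = 200;     % illustrative sizes
Y = randn(m, n);                      % illustrative data
perc = [1 50 99];
corrs_per_step = 100;                 % pairs handled per step

% Pair (i, j), i < j, in row-wise upper-triangle order a12, a13, ..., a23, ...
[pair_j, pair_i] = find(tril(ones(n), -1));
n_pairs = numel(pair_i);

boot_idx = randi(m, m, n_iters);      % column it = resampled rows of replicate it

corrsperc = zeros(n_pairs, numel(perc));
corrstd = zeros(n_pairs, 1);
n_steps = ceil(n_pairs / corrs_per_step);
for s = 1:n_steps
    k = (s - 1) * corrs_per_step + 1 : min(s * corrs_per_step, n_pairs);
    lin = sub2ind([n n], pair_i(k), pair_j(k));   % linear indices of this chunk
    chunk = zeros(n_iters, numel(k));             % only this chunk is stored
    for it = 1:n_iters
        R = corrcoef(Y(boot_idx(:, it), :));
        chunk(it, :) = R(lin).';
    end
    corrstd(k) = std(chunk).';
    % Sorted replicate r sits at percentile 100*(r - 0.5)/n_iters; values in
    % between are interpolated, values beyond are clamped to the min/max.
    srt = sort(chunk);
    pos = [0, 100 * ((1:n_iters) - 0.5) / n_iters, 100];
    vals = [srt(1, :); srt; srt(end, :)];
    p = interp1(pos, vals, perc(:));
    corrsperc(k, :) = reshape(p, numel(perc), numel(k)).';
end
```

Peak memory is governed by the n_iters-by-corrs_per_step chunk rather than by the full n_iters-by-n_pairs matrix; drawing fresh resampling indices for each chunk would also work, but then different chunks would come from different replicates.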

*******************************************************************

INPUTS
   Input Y is a matrix m-by-n where
       m is the number of observations and
       n is the number of variables

   Input perc is a vector of real numbers in the interval [0, 100]:
       0 corresponds to the minimum;
       100 corresponds to the maximum;
       50 corresponds to the median;
       25 and 75 are the first and third quartiles;
       10 is the tenth percentile, and so on;
       perc = [1, 99] gives the bounds of a 98% centered confidence interval.

   Input n_iters is the number of bootstrap correlation matrices
   generated to estimate the desired percentiles. The more
   iterations, the more precise the percentile estimates. A good
   (though possibly slow) choice is
                    n_iters = 1000;

   Input matrix3D (optional) is a logical flag: if it is 1, output
   corrsperc is stored in a 3D array and output corrstd in a 2D
   matrix; otherwise corrsperc is stored in a 2D matrix and corrstd
   in a vector.
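The perc convention (0 is the minimum, 50 the median, 100 the maximum) can be reproduced in base MATLAB, which is relevant because corrperc calls prctile from the Statistics Toolbox, as the comments on this page point out. The sketch below assumes the midpoint interpolation scheme documented for prctile; other percentile definitions give slightly different values, especially for small samples. The data x is illustrative.

```matlab
% Toolbox-free percentiles (sketch, assuming prctile's midpoint scheme):
% the r-th smallest of n values sits at the 100*(r - 0.5)/n percentile,
% values in between are linearly interpolated, and percentiles beyond the
% first or last midpoint are clamped to the minimum or maximum.
x = [3 1 4 1 5];                 % illustrative data
perc = [0 25 50 75 100];

s = sort(x(:));                  % sorted values as a column
n = numel(s);
pos = [0; 100 * ((1:n)' - 0.5) / n; 100];   % percentile position of each value
vals = [s(1); s; s(end)];                   % min at 0, max at 100
p = interp1(pos, vals, perc);               % same shape as perc
% p → [1 1 3 4.25 5]: minimum, first quartile, median, third quartile, maximum
```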

OUTPUTS
   Output corrsperc is an n-by-n-by-length(perc) array if matrix3D
   is 1. Otherwise, it is an (n * (n - 1) / 2)-by-length(perc) matrix.
   In the latter case, the correlations are taken row by row from the
   upper triangle of the n-by-n correlation matrix. For example, if
   the correlation matrix is 9-by-9, with upper triangle
            a12, a13, a14, a15, a16, a17, a18, a19
                 a23, a24, a25, a26, a27, a28, a29
                      a34, a35, a36, a37, a38, a39
                           a45, a46, a47, a48, a49
                                a56, a57, a58, a59
                                     a67, a68, a69
                                          a78, a79
                                               a89
   then the elements are taken in the following order:
          a12, a13, a14, a15, a16, a17, a18, a19, a23, a24, a25, a26,
          a27, a28, a29, a34, a35, a36, a37, a38, a39, a45, a46, a47,
          a48, a49, a56, a57, a58, a59, a67, a68, a69, a78, a79, a89
   and are laid out down the rows of corrsperc, one row per
   correlation. Column k of corrsperc holds the perc(k)-th percentile
   of every correlation.

   Output corrstd is the bootstrap estimate of the standard deviation
   of each correlation. The estimate becomes more accurate as n_iters
   grows and as corrs_per_step shrinks. If matrix3D is 1, corrstd is
   an n-by-n symmetric matrix; otherwise it is a vector of length
   n * (n - 1) / 2.
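To move between the two output layouts yourself, the row of pair (i, j) in the 2D form follows from the row-wise ordering: k = (i - 1) * n - i * (i - 1) / 2 + (j - i). The sketch below checks that formula and rebuilds an n-by-n-by-length(perc) array from a 2D matrix. The names c2d and c3d and the stand-in values are hypothetical, not output of corrperc; the description does not say what corrperc stores on the diagonal, so the sketch leaves it at zero.

```matlab
% Convert the 2D layout (one row per pair) into the 3D layout (sketch).
n = 4;  n_perc = 3;                                   % illustrative sizes
n_pairs = n * (n - 1) / 2;
c2d = reshape(1:n_pairs * n_perc, n_pairs, n_perc);   % stand-in for corrsperc

% Pairs (i, j), i < j, in row-wise upper-triangle order
[pair_j, pair_i] = find(tril(ones(n), -1));
% Row index of each pair in the 2D layout; equals 1:n_pairs in this order
k = (pair_i - 1) * n - pair_i .* (pair_i - 1) / 2 + (pair_j - pair_i);

c3d = zeros(n, n, n_perc);                       % diagonal left at zero
for q = 1:n_perc
    M = zeros(n);
    M(sub2ind([n n], pair_i, pair_j)) = c2d(:, q);   % fill upper triangle
    c3d(:, :, q) = M + M.';                          % mirror to lower triangle
end
```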

*******************************************************************

% Example

T = 1000;
N = 100;
Y = cumsum(randn(T, N));
perc = 0:100;
n_iters = 250;
[corrsperc, corrstd] = corrperc(Y, perc, n_iters);
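% Look at this: 96% centered confidence intervals are approximately
% four times the standard deviations, as for a normal distribution. Cool!
% Since perc = 0:100, columns 3 and 99 hold the 2nd and 98th percentiles,
% so each half-width should lie close to 2 * corrstd (points near y = x).
plot((corrsperc(:, 99) - corrsperc(:, 3)) / 2, 2 * corrstd, '.')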

MATLAB release: MATLAB 7 (R14). Requires the Statistics Toolbox (for prctile).
Comments and Ratings (3)
16 Jan 2009 OH

??? Undefined function or method 'prctile' for input arguments of type
'double'.

Error in ==> corrperc at 201
  corrsperc = prctile(corrs_temp, perc)';

16 Jan 2009 Wolfgang Schwanghart

prctile is part of the statistics toolbox. You forgot to mention this.

17 Jan 2009 Francesco Pozzi

Sorry about that. Yes, you need the Statistics Toolbox; I forgot to mention it. Thank you for the reminder. I don't know whether percentiles can be computed efficiently without it.

Tags (applied by Francesco Pozzi, 15 Jan 2009): correlation, percentiles, standard deviation, bootstrap, statistics
 
