Code covered by the BSD License  

Rating: 5.0 (1 rating) | Downloads (last 30 days): 20 | File Size: 3.14 KB | File ID: #27361

Weighted Kendall Rank Correlation Matrix

by Francesco Pozzi

24 Apr 2010 (Updated )

Fast Weighted Kendall Rank Correlation Matrix (much faster than MATLAB's CORR)


File Information
Description

TAU = KENDALLTAU(Y) returns an N-by-N matrix containing the pairwise Kendall rank correlation coefficient between each pair of columns in the T-by-N matrix Y. The coefficients are adjusted for ties (this is the so-called "tau-b"). Kendall's tau-b is identical to the standard tau (or tau-a) when there are no ties.

TAU = KENDALLTAU(Y, w) returns the Weighted Kendall Rank Correlation Matrix, where w is a [T * (T - 1) / 2]-by-1 vector of weights for all combinations of comparisons between observations i and j.

Reference: F. Pozzi, T. Di Matteo, T. Aste, "Exponential smoothing weighted correlations", The European Physical Journal B, Volume 85, Issue 6, 2012. DOI: 10.1140/epjb/e2012-20697-x

This algorithm is potentially much faster than MATLAB's CORR function (seconds versus hours), but it is intended for small datasets: it needs a contiguous block of your machine's virtual memory large enough to store a matrix of dimensions [T * (T - 1) / 2]-by-N.
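
To get a feel for the memory requirement, the size of the intermediate matrix can be estimated before calling the function. The snippet below is only a rough sketch: the storage class used internally by KENDALLTAU is not stated here, so `bytesPerElement` is an assumption (8 for double; it would be 1 if the signs were stored as int8).

```matlab
% Rough memory estimate for the [T * (T - 1) / 2]-by-N intermediate matrix.
T = 1000;                          % number of observations
N = 1000;                          % number of variables
nPairs = T * (T - 1) / 2;          % number of pairwise comparisons
bytesPerElement = 8;               % ASSUMPTION: double storage (1 for int8)
memGB = nPairs * N * bytesPerElement / 2 ^ 30;
fprintf('%d pairs x %d columns: about %.2f GB\n', nPairs, N, memGB);
```

With T = N = 1000 this comes to roughly 3.7 GB under the double-storage assumption, which is why the function suits small T.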

The basic idea is that Kendall's tau is nothing more than a linear correlation of all pairwise signs between variables.
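
The idea can be illustrated on two short vectors. This is a minimal sketch of the principle, not necessarily how KENDALLTAU is implemented; the variable names (`sx`, `sy`, `i1`, `i2`) and the toy data are chosen here for illustration only. Because only the pairs with i1 < i2 are used, the correlation is taken in its uncentered form.

```matlab
% Sketch: Kendall tau-b for two columns as a correlation of pairwise signs.
x = [1; 2; 2; 4; 5];
y = [2; 1; 3; 3; 5];
T = numel(x);

% Index pairs (i1 < i2) for all T * (T - 1) / 2 comparisons
[i2, i1] = find(tril(true(T), -1));

% Sign of each pairwise difference: +1, -1, or 0 for a tie
sx = sign(x(i2) - x(i1));
sy = sign(y(i2) - y(i1));

% Uncentered correlation of the two sign vectors. Tied pairs contribute 0
% to the denominator sums, which gives the tau-b adjustment for ties.
tau = sum(sx .* sy) / sqrt(sum(sx .^ 2) * sum(sy .^ 2));
```

Here 8 pairs are concordant, 1 is discordant, and each variable has one tied pair, so tau = (8 - 1) / sqrt(9 * 9) = 7/9.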

Note that no NaN or Inf values are allowed in Y: please clean your data before using KENDALLTAU. Also, this function does not calculate p-values (although adding them should be relatively simple).
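
One simple way to clean the data is to drop every observation (row) that contains a non-finite value. This is only one possible approach, shown as a sketch with made-up data; it shortens T, and if a weight vector w is used it must be rebuilt for the new T.

```matlab
% Sketch: remove rows of Y that contain NaN or Inf (listwise deletion).
Y = [1 2; NaN 3; 4 Inf; 5 6];      % toy data, for illustration only
keep = all(isfinite(Y), 2);        % true for rows with no NaN or Inf
Yclean = Y(keep, :);               % pass Yclean to KENDALLTAU instead of Y
```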

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE1: How to use this function (on my very limited laptop)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 300;
% T = 100;
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 0.577000 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 132.241000 seconds. <---
% % ---> <---
%
% plot(tau1(:) - tau2(:), '.')
% set(gca, 'YLim', [-1e-12, 1e-12]); % exactly same results
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE2: How to use this function (on a decent computer, fast and with a big memory available).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 1000; % 10/3 times bigger than before
% T = 1000; % 10 times bigger than before
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 48.826421 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 13398.811714 seconds. <---
% % ---> <---
%
% temp = tau1(:) - tau2(:);
% temp = hist(temp);
% temp % exactly same results
% % 0 0 0 0 0 1000000 0 0 0 0
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE3: Weighted Kendall Rank Correlation Matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 100; % Number of variables
% T = 200; % Number of observations
% Y = randint(T, N, [0, 100]); % Lots of ties
%
% % Weights with exponential smoothing
% alpha = 3 / T;
% w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));
% % Prepare indexes for all combinations without repetition
% k = 1;
% for i = 1:(T - 1)
% i1(k:(k + T - i - 1)) = repmat(i, 1, (T - i));
% i2(k:(k + T - i - 1)) = ((i + 1):T);
% k = k + T - i;
% end
% w = w0 * exp(alpha * (i1 + i2 - 2 * T));
%
% tic, tau1 = kendalltau(Y, w); toc
% tic, tau2 = kendalltau(Y); toc
%
% plot(tau2(:), tau1(:), '.') % Compare Weighted vs
% % non-Weighted Kendall Rank
% % Correlation Matrices
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% See also CORRCOEF, CORR.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
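
The index loop in EXAMPLE3 can also be written without a loop. The sketch below is an alternative suggested here, not part of the original file: `find` applied to a lower-triangular mask returns the same pairs in the same order (i1 = 1 with i2 = 2..T, then i1 = 2 with i2 = 3..T, and so on), and it yields column vectors, matching the [T * (T - 1) / 2]-by-1 shape given in the description. With this w0, the weights sum to 1.

```matlab
% Sketch: vectorized construction of the exponential smoothing weights.
T = 200;                           % number of observations
alpha = 3 / T;
w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) ...
    / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));

% All pairs i1 < i2, in the same order as the loop in EXAMPLE3
[i2, i1] = find(tril(true(T), -1));

% One weight per pair; recent observations weigh more
w = w0 * exp(alpha * (i1 + i2 - 2 * T));

fprintf('Number of weights: %d, sum of weights: %.12f\n', numel(w), sum(w));
```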

Acknowledgements

Weighted Correlation Matrix inspired this file.

MATLAB release: MATLAB 7 (R14)
Comments and Ratings (3)
20 Sep 2012 Adama  
21 Jul 2011 Arvind Iyer

Oops...please ignore the earlier comment. It was meant for another file.

03 May 2010 Francesco Pozzi

Just one note. Suppose you have only two huge vectors:

T = 100000;
x = randn(T, 1);
y = randn(T, 1);

Then no, don't use my function; use CORR instead. My function would try to create a matrix with T * (T - 1) / 2 = 4.99995 billion rows (practically impossible).
By contrast, CORR performs very well in this case:

tic, z = corr(x, y, 'type', 'kendall'); toc
Elapsed time is 418.745000 seconds.

Updates
07 Jun 2012

Reference added
