Weighted Kendall Rank Correlation Matrix

version 1.2 (3.14 KB) by

Fast Weighted Kendall Rank Correlation Matrix (much faster than MATLAB's CORR)

TAU = KENDALLTAU(Y) returns an N-by-N matrix containing the Kendall rank correlation coefficient between each pair of columns of the T-by-N matrix Y. The coefficients are adjusted for ties (the so-called "tau-b"). Kendall's tau-b is identical to the standard tau (tau-a) when there are no ties.
TAU = KENDALLTAU(Y, w) returns the Weighted Kendall Rank Correlation Matrix, where w is a [T * (T - 1) / 2]-by-1 vector of weights, one for each comparison between a pair of observations i and j.

Reference: F. Pozzi, T. Di Matteo, T. Aste, "Exponential smoothing weighted correlations", The European Physical Journal B, Volume 85, Issue 6, 2012. DOI: 10.1140/epjb/e2012-20697-x

This algorithm, potentially much faster than MATLAB's CORR function (seconds versus hours), is designed for small datasets: it needs a contiguous block of your machine's virtual memory to store a matrix of dimensions [T * (T - 1) / 2]-by-N.
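
To get a feel for the memory requirement, the size of that matrix can be estimated before calling the function. The snippet below is only a rough sketch: the 8 bytes per element (double precision) is an assumption, since the storage class used internally is not stated here.

```matlab
% Rough memory estimate for the [T*(T-1)/2]-by-N matrix.
T = 1000;                            % number of observations
N = 1000;                            % number of variables
nPairs = T * (T - 1) / 2;            % number of rows: 499500
bytesPerElement = 8;                 % ASSUMPTION: double precision storage
gigabytes = nPairs * N * bytesPerElement / 1e9;
fprintf('%d-by-%d matrix, about %.2f GB\n', nPairs, N, gigabytes);
```

Because the number of rows grows with the square of T, doubling the number of observations roughly quadruples the memory needed.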

The basic idea is that Kendall's tau is simply a linear correlation between the signs of all pairwise differences between observations, computed for each variable.
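
The idea can be illustrated with a few lines of MATLAB. This is a minimal sketch, not the submission's source code: the variable names are illustrative, and the way the weights enter (multiplying each pair's sign product) is an assumption. The signs are not mean-centered, so the "correlation" here is a normalized cross-product of the sign columns.

```matlab
% Sketch only: tau-b from the signs of all pairwise differences.
Y = [1 2 3 4 5; 2 1 4 3 5; 5 4 3 2 1]';   % T = 5 observations, N = 3 variables
T = size(Y, 1);

% Index pairs (i1, i2) with i1 < i2; there are T*(T-1)/2 of them
[i2, i1] = find(tril(ones(T), -1));

% Pairwise signs: +1, -1, or 0 for a tie; size [T*(T-1)/2]-by-N
S = sign(Y(i2, :) - Y(i1, :));

% One weight per pair (ASSUMPTION: uniform weights give the ordinary tau-b)
w = ones(numel(i1), 1);

% Weighted cross-products of the sign columns (bsxfun works on older releases)
G = S' * bsxfun(@times, w, S);
d = sqrt(diag(G));                   % zero for a constant column, giving NaN
tau = G ./ (d * d');

disp(tau)
```

With uniform weights, the off-diagonal entries are the number of concordant pairs minus the number of discordant pairs, divided by the geometric mean of the untied pair counts, which is the usual tau-b definition.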

Note that no NaN or Inf values are allowed in Y: please clean your data before using KENDALLTAU. Also, this function does not calculate p-values (although adding them should be relatively simple).
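
One possible way to clean the data is to drop every observation (row) that contains a non-finite value. This is just a sketch of one option: whether dropping whole rows is appropriate depends on your data, and the toy matrix below is made up for illustration.

```matlab
% Sketch: remove rows containing NaN or Inf before calling KENDALLTAU.
Y = [1 2; NaN 3; 4 Inf; 5 6];        % toy data with one NaN and one Inf

bad = any(~isfinite(Y), 2);          % rows containing NaN, Inf or -Inf
Yclean = Y(~bad, :);                 % ASSUMPTION: dropping whole rows is acceptable

fprintf('removed %d of %d rows\n', sum(bad), size(Y, 1));
```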

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE1: How to use this function (on my very limited laptop)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 300;
% T = 100;
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 0.577000 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 132.241000 seconds. <---
% % ---> <---
%
% plot(tau1(:) - tau2(:), '.')
% set(gca, 'YLim', [-1e-12, 1e-12]); % differences stay within 1e-12
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE2: How to use this function (on a decent, fast computer with plenty of memory available)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 1000; % 10/3 times bigger than before
% T = 1000; % 10 times bigger than before
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 48.826421 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 13398.811714 seconds. <---
% % ---> <---
%
% temp = tau1(:) - tau2(:);
% temp = hist(temp);
% temp % all differences fall in the central bin
% % 0 0 0 0 0 1000000 0 0 0 0
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE3: Weighted Kendall Rank Correlation Matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 100; % Number of variables
% T = 200; % Number of observations
% Y = randint(T, N, [0, 100]); % Lots of ties
%
% % Weights with exponential smoothing
% alpha = 3 / T;
% w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));
% % Prepare indexes for all combinations without repetition
% k = 1;
% for i = 1:(T - 1)
% i1(k:(k + T - i - 1)) = repmat(i, 1, (T - i));
% i2(k:(k + T - i - 1)) = ((i + 1):T);
% k = k + T - i;
% end
% w = w0 * exp(alpha * (i1 + i2 - 2 * T));
%
% tic, tau1 = kendalltau(Y, w); toc
% tic, tau2 = kendalltau(Y); toc
%
% plot(tau2(:), tau1(:), '.') % Compare Weighted vs
% % non-Weighted Kendall Rank
% % Correlation Matrices
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
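
The index loop in EXAMPLE3 can also be written without a loop. The following is a minimal sketch, assuming the same pair ordering as that loop (all pairs with first index 1, then first index 2, and so on). It also checks numerically that the weights sum to 1, which is what the normalization constant w0 is built to achieve.

```matlab
% Sketch: vectorized construction of the exponential smoothing weights.
T = 200;                             % number of observations
alpha = 3 / T;
w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) ...
    / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));

% Pairs (i1, i2) with i1 < i2: find walks the strictly lower
% triangle column by column, so i1 is the column and i2 the row
[i2, i1] = find(tril(ones(T), -1));

% Column vector of length T*(T-1)/2
w = w0 * exp(alpha * (i1 + i2 - 2 * T));

fprintf('number of weights: %d, sum of weights: %.12f\n', numel(w), sum(w));
```

Unlike the loop, this version produces w as a column vector, which matches the [T * (T - 1) / 2]-by-1 shape given in the description.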

JohannesBuck

Great work - the original version for Kendall in MATLAB really has huge performance problems in most settings - I'm actually wondering why you don't have more downloads for such a great program...

Arvind Iyer

Oops...please ignore the earlier comment. It was meant for another file.

Liber Eleutherios

Just one note. Suppose you have only two huge vectors:

T = 100000;
x = randn(T, 1);
y = randn(T, 1);

In that case, don't use my function; use CORR instead. My function would try to create a matrix with T * (T - 1) / 2 = 4,999,950,000 rows (about 5 billion), which is practically impossible.
By contrast, CORR performs very well in this case:

tic, z = corr(x, y, 'type', 'kendall'); toc
Elapsed time is 418.745000 seconds.

12 Jan 2015   1.2   minor edits
7 Jun 2012    1.1   Reference added
MATLAB 7 (R14)