Weighted Kendall Rank Correlation Matrix

version 1.2 (3.14 KB) by

Fast Weighted Kendall Rank Correlation Matrix (potentially much faster than MATLAB's CORR)

5 Downloads

TAU = KENDALLTAU(Y) returns an N-by-N matrix containing the Kendall rank correlation coefficient between each pair of columns in the T-by-N matrix Y. The coefficients are adjusted for ties (the so-called "tau-b"). Kendall's tau-b is identical to the standard tau (or tau-a) when there are no ties.
TAU = KENDALLTAU(Y, w) returns the Weighted Kendall Rank Correlation Matrix, where w is a [T * (T - 1) / 2]-by-1 vector of weights, one for each pair of observations (i, j) with i < j.

Reference: F. Pozzi, T. Di Matteo, T. Aste, "Exponential smoothing weighted correlations", The European Physical Journal B, Volume 85, Issue 6, 2012. DOI: 10.1140/epjb/e2012-20697-x

This algorithm, potentially much faster than MATLAB's CORR function (seconds versus hours), is designed for small datasets: it needs a contiguous block of your machine's virtual memory to store a matrix of dimensions [T * (T - 1) / 2]-by-N.
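To judge whether a dataset counts as "small", the size of that intermediate matrix can be estimated beforehand. The sketch below assumes 8 bytes (one double) per element; the description does not say which storage class KENDALLTAU actually uses, so treat the figure as an upper-end guess rather than a measured value.

```matlab
% Rough memory estimate for the intermediate [T*(T-1)/2]-by-N matrix.
% Assumption: double precision, i.e. 8 bytes per element.
T = 1000;                        % number of observations
N = 1000;                        % number of variables
nPairs = T * (T - 1) / 2;        % number of observation pairs (i < j)
bytesNeeded = nPairs * N * 8;    % total bytes under the assumption above
fprintf('%d pairs, about %.2f GiB\n', nPairs, bytesNeeded / 2^30);
```

Under that assumption, the sizes used in EXAMPLE2 (T = 1000, N = 1000) come to roughly 3.7 GiB, and the requirement grows quadratically with T.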

The basic idea is that Kendall's tau is nothing more than a linear correlation of all pairwise signs between variables.
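That idea can be illustrated for two short vectors. This is a minimal sketch, not the code of the submission: the toy data and the variable names (sx, sy, tauB) are made up for illustration, and the correlation is taken in its uncentered form, which is what reproduces the tau-b tie adjustment.

```matlab
% Tau-b for two vectors, computed from pairwise signs (illustrative sketch).
x = [1; 2; 2; 4; 5];
y = [2; 1; 3; 3; 5];
T = numel(x);

% All index pairs (i, j) with i < j, taken column by column.
[j, i] = find(tril(true(T), -1));

sx = sign(x(i) - x(j));   % +1 or -1, and 0 where the pair is tied in x
sy = sign(y(i) - y(j));   % +1 or -1, and 0 where the pair is tied in y

% Uncentered correlation of the two sign vectors:
% numerator   = concordant pairs minus discordant pairs
% denominator = sqrt(pairs untied in x * pairs untied in y)
tauB = (sx' * sy) / sqrt((sx' * sx) * (sy' * sy));
```

For this toy data there are 7 concordant pairs, 1 discordant pair and one tie in each variable, so tauB is 6 / sqrt(9 * 9) = 2/3. If the Statistics Toolbox is available, corr(x, y, 'Type', 'kendall') should give the same value.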

Note that Y must not contain NaN or Inf values: please clean your data before using KENDALLTAU. Also, this function does not calculate p-values (but implementing them should be relatively simple).
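One simple way to clean the data is to drop every observation that has a non-finite value in any column. The snippet below is only a sketch on hypothetical toy data; whether row-wise deletion is appropriate depends on your data.

```matlab
% Drop every observation (row) that contains a NaN or Inf in any column.
Y = [1 2; NaN 3; 4 Inf; 5 6];    % toy data, for illustration only
keep = all(isfinite(Y), 2);      % true for rows with only finite values
Yclean = Y(keep, :);             % here rows 1 and 4 remain
```

Removing rows changes T, so a weight vector w for the weighted version has to be built for the cleaned number of observations.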

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE1: How to use this function (on my very limited laptop)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 300;
% T = 100;
% Y = randint(T, N, [0, 100]); % Lots of ties (if RANDINT is unavailable: randi([0, 100], T, N))
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 0.577000 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 132.241000 seconds. <---
% % ---> <---
%
% plot(tau1(:) - tau2(:), '.')
% set(gca, 'YLim', [-1e-12, 1e-12]); % same results, up to rounding error
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE2: How to use this function (on a decent, fast computer with plenty of memory available).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 1000; % 10/3 times bigger than before
% T = 1000; % 10 times bigger than before
% Y = randint(T, N, [0, 100]); % Lots of ties (if RANDINT is unavailable: randi([0, 100], T, N))
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 48.826421 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 13398.811714 seconds. <---
% % ---> <---
%
% temp = tau1(:) - tau2(:);
% temp = hist(temp);
% temp % exactly same results
% % 0 0 0 0 0 1000000 0 0 0 0
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE3: Weighted Kendall Rank Correlation Matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 100; % Number of variables
% T = 200; % Number of observations
% Y = randint(T, N, [0, 100]); % Lots of ties (if RANDINT is unavailable: randi([0, 100], T, N))
%
% % Weights with exponential smoothing
% alpha = 3 / T;
% w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));
% % Prepare indexes for all combinations without repetition
% k = 1;
% for i = 1:(T - 1)
% i1(k:(k + T - i - 1)) = repmat(i, 1, (T - i));
% i2(k:(k + T - i - 1)) = ((i + 1):T);
% k = k + T - i;
% end
% w = w0 * exp(alpha * (i1 + i2 - 2 * T));
%
% tic, tau1 = kendalltau(Y, w); toc
% tic, tau2 = kendalltau(Y); toc
%
% % Compare Weighted vs non-Weighted Kendall Rank Correlation Matrices
% plot(tau2(:), tau1(:), '.')
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% See also CORRCOEF, CORR.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
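As a sanity check on the weights in EXAMPLE3, the pair indices can also be built without a loop, and the constant w0 turns out to normalize the weights so that they sum to 1. The sketch below reuses the formulas from EXAMPLE3; building the indices with find and tril is an alternative chosen here for brevity, not what the example itself does.

```matlab
T = 200;                 % number of observations, as in EXAMPLE3
alpha = 3 / T;
w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) ...
     / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));

% Pairs (i1, i2) with i1 < i2, in the same order as the loop in EXAMPLE3:
% (1,2), (1,3), ..., (1,T), (2,3), ..., (T-1,T).
[i2, i1] = find(tril(true(T), -1));

% Exponentially smoothed weights; the most recent pairs weigh the most.
w = w0 * exp(alpha * (i1 + i2 - 2 * T));

assert(numel(w) == T * (T - 1) / 2);   % one weight per pair
assert(abs(sum(w) - 1) < 1e-10);       % weights sum to 1
```

Here w comes out as a column vector, matching the [T * (T - 1) / 2]-by-1 shape given in the description, whereas the loop in EXAMPLE3 produces a row vector.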

Comments and Ratings (4)

JohannesBuck

Great work - the original version for Kendall in Matlab really has huge performance problems in most setting - I'm actually wondering why you don't have more downloads for such a great programme...

Adama

Arvind Iyer

Oops...please ignore the earlier comment. It was meant for another file.

Just one note. Suppose you have only two huge vectors:

T = 100000;
x = randn(T, 1);
y = randn(T, 1);

Then no, don't use my function; use CORR instead. My function would try to create a matrix with T * (T - 1) / 2 = 4.99995 billion rows (practically impossible).
By contrast, CORR performs very well in this case:

tic, z = corr(x, y, 'type', 'kendall'); toc
Elapsed time is 418.745000 seconds.

Updates

1.2

minor edits

1.1

Reference added

MATLAB Release
MATLAB 7 (R14)
Acknowledgements

Inspired by: Weighted Correlation Matrix
