Code covered by the BSD License  


Weighted Kendall Rank Correlation Matrix

by Francesco Pozzi

 

24 Apr 2010

Fast Weighted Kendall Rank Correlation Matrix (potentially much faster than MATLAB's CORR)


File Information
Description

TAU = KENDALLTAU(Y) returns an N-by-N matrix containing the Kendall rank correlation coefficient for each pair of columns of the T-by-N matrix Y. The coefficients are adjusted for ties (the so-called "tau-b"). Kendall's tau-b is identical to the standard tau (tau-a) when there are no ties.

TAU = KENDALLTAU(Y, w) returns the Weighted Kendall Rank Correlation Matrix, where w is a [T * (T - 1) / 2]-by-1 vector of weights, one for each pairwise comparison between observations i and j.

This algorithm, potentially much faster than MATLAB's CORR function (seconds versus hours), is designed for small datasets: it needs a contiguous block of your machine's virtual memory large enough to store a matrix of dimensions [T * (T - 1) / 2]-by-N.
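
To get a feel for that memory requirement, the size of the pairwise matrix can be estimated before calling the function. The sketch below assumes one 8-byte double per element; the description does not say which storage class KENDALLTAU actually uses, so treat the figure as an upper-range guess rather than a measurement.

```matlab
% Rough memory estimate for the [T*(T-1)/2]-by-N pairwise matrix.
% Assumption: 8 bytes per element (double); the real storage class
% inside kendalltau is not stated in the description.
T = 1000;                          % number of observations (as in EXAMPLE2)
N = 1000;                          % number of variables
nPairs = T * (T - 1) / 2;          % number of observation pairs (i < j)
bytesPerElement = 8;               % assumed: double precision
memGiB = nPairs * N * bytesPerElement / 2^30;
fprintf('%d pairs x %d columns: about %.2f GiB\n', nPairs, N, memGiB);
```

Because the pair count grows with the square of T, doubling the number of observations roughly quadruples the memory needed.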

The basic idea is that Kendall's tau is nothing more than a linear correlation between the signs of all pairwise differences between observations, computed for each variable.
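
A minimal sketch of that idea follows. It is not the author's kendalltau.m, and all variable names are illustrative. With equal weights, the uncentered correlation of the sign columns reproduces tau-b, because the sum of sign products is (concordant minus discordant) and the sum of squared signs counts the untied pairs. The weighted line is one plausible reading of the weighted variant, assumed here rather than taken from the source.

```matlab
% Sketch only: tau-b as an uncentered correlation of pairwise signs.
Y = [1 2; 2 1; 2 3; 3 3; 4 5; 5 4];      % small T-by-N example with ties
T = size(Y, 1);

% All pairs (i1, i2) with i1 < i2, in column-major order
[i2, i1] = find(tril(ones(T), -1));

% [T*(T-1)/2]-by-N matrix of signs of pairwise differences
S = sign(Y(i1, :) - Y(i2, :));

% Pair weights; all ones gives the ordinary (unweighted) tau-b.
% Assumption: the weighted variant weights each pair's sign product.
w = ones(size(S, 1), 1);
WS = S .* repmat(w, 1, size(S, 2));

G = S' * WS;                % weighted sums of sign products
d = sqrt(diag(G));          % sqrt of (weighted) untied-pair counts
tau = G ./ (d * d');        % N-by-N matrix; NaN for a constant column
disp(tau)
```

For this data there are 11 concordant pairs, 2 discordant pairs, and one tied pair in each column, so the off-diagonal entry is 9/14.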

Note that Y must not contain NaN or Inf values: please clean your data before calling KENDALLTAU. Also, this function does not calculate p-values (though adding them should be relatively simple).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE1: How to use this function (on my very limited laptop)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 300;
% T = 100;
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 0.577000 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 132.241000 seconds. <---
% % ---> <---
%
% plot(tau1(:) - tau2(:), '.')
% set(gca, 'YLim', [-1e-12, 1e-12]); % same results (differences within 1e-12)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE2: How to use this function (on a decent, fast computer with plenty of memory available).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 1000; % 10/3 times bigger than before
% T = 1000; % 10 times bigger than before
% Y = randint(T, N, [0, 100]); % Lots of ties
% tic, tau1 = kendalltau(Y); toc % Try this function
%
% % ---> <---
% % ---> Elapsed time is 48.826421 seconds. <---
% % ---> <---
%
% tic, tau2 = corr(Y, 'Type', 'kendall'); toc % Try CORR
%
% % ---> <---
% % ---> Elapsed time is 13398.811714 seconds. <---
% % ---> <---
%
% temp = tau1(:) - tau2(:);
% temp = hist(temp);
% temp % same results: every difference falls in a single bin
% % 0 0 0 0 0 1000000 0 0 0 0
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
EXAMPLE3: Weighted Kendall Rank Correlation Matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% N = 100; % Number of variables
% T = 200; % Number of observations
% Y = randint(T, N, [0, 100]); % Lots of ties
%
% % Weights with exponential smoothing
% alpha = 3 / T;
% w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));
% % Prepare indexes for all combinations without repetition
% k = 1;
% for i = 1:(T - 1)
% i1(k:(k + T - i - 1)) = repmat(i, 1, (T - i));
% i2(k:(k + T - i - 1)) = ((i + 1):T);
% k = k + T - i;
% end
% w = w0 * exp(alpha * (i1 + i2 - 2 * T));
%
% tic, tau1 = kendalltau(Y, w); toc
% tic, tau2 = kendalltau(Y); toc
%
% plot(tau2(:), tau1(:), '.') % Compare Weighted vs
% % non-Weighted Kendall Rank
% % Correlation Matrices
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% See also CORRCOEF, CORR.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
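
The constant w0 in EXAMPLE3 appears to be chosen so that the exponential weights sum to one over all pairs. The quick check below rebuilds the same weights and prints their sum; it replaces the loop with a vectorized index construction (an illustrative choice, not the author's code) that yields the pairs in the same order.

```matlab
% Check that the EXAMPLE3 weights sum to one (sketch; index construction
% is vectorized here instead of using the original loop).
T = 200;                                   % number of observations
alpha = 3 / T;                             % smoothing parameter from EXAMPLE3
w0 = ((exp(alpha) + 1) * (exp(alpha) - 1) ^ 2) / exp(2 * alpha) ...
     / (1 - exp(-alpha * T)) / (1 - exp(-alpha * (T - 1)));

% Pairs (i1, i2) with i1 < i2: i1 is the column, i2 the row of the
% strictly lower triangle, which matches the loop's ordering.
[i2, i1] = find(tril(ones(T), -1));

w = w0 * exp(alpha * (i1 + i2 - 2 * T));   % [T*(T-1)/2]-by-1 weight vector
fprintf('number of weights: %d, sum(w) = %.12f\n', numel(w), sum(w));
```

Pairs made of recent observations (large i1 and i2) receive the largest weights, and the weight decays exponentially as either observation gets older.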

MATLAB release MATLAB 7 (R14)
Comments and Ratings (2)
03 May 2010 Francesco Pozzi

Just one note. Suppose you have only two huge vectors:

T = 100000;
x = randn(T, 1);
y = randn(T, 1);

In that case, don't use my function; use CORR instead. My function would try to create a matrix with T * (T - 1) / 2 = 4.99995 billion rows, which is practically impossible.
By contrast, CORR performs very well in this case:

tic, z = corr(x, y, 'type', 'kendall'); toc
Elapsed time is 418.745000 seconds.

21 Jul 2011 Arvind Iyer

Oops...please ignore the earlier comment. It was meant for another file.

