function acape(X,Y,alpha)
% ACAPE Canonical Correlation Analysis (CCA).
%Computes the interrelationships between two sets of variables measured on the same objects.
%The canonical correlation is the maximum correlation between linear functions of the two
%vector variables. Linearity is important because the analysis is performed on the correlation
%matrix, which reflects linear relationships. After the first pair is found, one may locate
%additional pairs of functions that maximally correlate, subject to the restriction that the
%functions in each new pair must be uncorrelated with all previously located functions in
%both domains (orthogonal).
%Geometrically, the canonical model can be considered an exploration of the extent to which
%objects occupy the same relative positions in one measurement space as they do in the other.
%With p predictor and q criterion variables, there are min(p,q) canonical correlations.
%The procedure ends with a test of significance of the canonical correlations using
%Bartlett's test.
%
% Syntax: acape(X,Y,alpha)
%
% Inputs:
% X,Y - data matrices (n-by-p and n-by-q; same number of rows, more rows than columns).
% alpha - significance level (default = 0.05).
% Outputs (printed to the Command Window):
% - Canonical functions.
% - Correlations between the canonical and original variables.
% - Proportion of variance extracted.
% - Redundancy.
% - Chi-square tests with successive roots removed.
%
%
% Example: Suppose we have 8 observations of 4 variables, and it is of
% interest to split the variables into two domains and to identify and
% quantify any interrelation between the two sets by a canonical
% correlation analysis with a significance level of 0.05.
%
% ------------------------------
%    X1     X2     X3     X4
% ------------------------------
%     1      1      1      1
%     7      1      7      1
%     7      7      5      6
%     1      6      1      7
%     7      3      7      5
%     6      4      7      7
%     7      1      7      1
%     2      1      7      1
% ------------------------------
%
% The data matrices are then:
% X=[1 1;7 1;7 7;1 6;7 3;6 4;7 1;2 1];
%
% Y=[1 1;7 1;5 6;1 7;7 5;7 7;7 1;7 1];
%
% Calling the function in MATLAB:
% acape(X,Y)
%
% The output is:
%
% U-Canonical Functions (left hand).
% --------------------------------------------------------------------------------
% Uvariates =
% 0.3403 -0.9423
% -0.9613 -0.2822
% --------------------------------------------------------------------------------
% Functions = columns. On variates, Variate1 = first row and so forth to 2
%
% V-Canonical Functions (right hand).
% --------------------------------------------------------------------------------
% Vvariates =
% 0.4494 -0.9097
% -0.8205 -0.5970
% --------------------------------------------------------------------------------
% Functions = columns. On variates, Variate1 = first row and so forth to 2
%
% Correlations between the canonical and original variables, battery 1.
% --------------------------------------------------------------------------------
% Battery1 =
% 0.2816 -0.9595
% -0.9405 -0.3397
% --------------------------------------------------------------------------------
% Canonical = columns. Original, Variate1 = first row and so forth to 2
% Correlations between the canonical and original variables, battery 2.
% --------------------------------------------------------------------------------
% Battery2 =
% 0.5884 -0.8086
% -0.8966 -0.4429
% --------------------------------------------------------------------------------
% Canonical = columns. Original, Variate1 = first row and so forth to 2
%
% Proportion of variance extracted from original
% variables by the new canonical variates.
% -------------------------------------------------
% U V
% -------------------------------------------------
% 0.4820 0.5750
% 0.5180 0.4250
% -------------------------------------------------
% Canonical variate 1 = first row and so forth to 2
%
% Amount of variance in one set of variables
% extracted by the canonical variables of the
% other set of variables (redundancy).
% --------------------------------------------
% X Y
% --------------------------------------------
% 0.4100 0.4892
% 0.3010 0.2469
% --------------------------------------------
% Original set = columns.
%
% Chi-square Tests with Successive Roots Removed.
% -------------------------------------------------------------------------
% Removed   Eigenvalue     CanCor        LW     Chi-sqr.     df         P
% -------------------------------------------------------------------------
%               0.8507     0.9223    0.0626      12.4725      4    0.0142
%    1          0.5810     0.7623    0.4190       3.9150      1    0.0479
% -------------------------------------------------------------------------
% With a given significance of: 0.05
% The number of significant canonical correlations were the first: 2
% [If P-value >= alpha, it is not significative. Else, it results significative.]
%
% According to the results, do you want only the significant canonical functions? (y/n): n
%
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls and A. Castro-Perez
% Facultad de Ciencias Marinas
% Universidad Autonoma de Baja California
% Apdo. Postal 453
% Ensenada, Baja California
% Mexico.
% atrujo@uabc.mx
% And the special collaboration of the post-graduate students of the 2004:1
% Multivariate Statistics Course: Alejandra Agundez-Amador, Laura Huerta-Tamayo,
% Elizabeth Romero-Hernandez and Juan Carlos Solis-Bautista.
%
% Copyright (C) May 30, 2004.
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez, A. Agundez-Amador, J.C. Solis-Bautista,
% E. Romero-Hernandez and L. Huerta-Tamayo. (2004). ACAPE: Canonical Correlation Analysis.
% A MATLAB file. [WWW document]. URL http://www.mathworks.com/matlabcentral/fileexchange/
% loadFile.do?objectId=5230&objectType=FILE
%
% References:
%
% Cooley, W. W. and Lohnes, P. R. (1971), Multivariate Data Analysis.
% New-York:John Wiley & Sons, Inc. pp. 168-183.
% Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis.
% 3rd. ed. New-Jersey:Prentice Hall. pp. 459-492.
%
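if nargin < 2
    error('Requires at least two arguments (X and Y).');
end
if nargin < 3 || isempty(alpha)
    alpha = 0.05; %default significance level
end
if ~isscalar(alpha) || alpha <= 0 || alpha >= 1
    error('The significance level alpha must be a scalar between 0 and 1.');
end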
[rx, cx] = size(X);
[ry, cy] = size(Y);
if rx ~= ry
    error('Input matrices must have the same number of rows.');
end
if (rx <= cx) || (ry <= cy)
    error('Each input matrix must have more rows than columns.');
end
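p = max(cx,cy); %size of the larger set
q = min(cx,cy); %size of the smaller set (number of canonical correlations)
%Put the larger set first. Note that when Y has more columns than X the two
%sets swap roles, so the U functions (and the 'X' labels below) refer to Y.
if cx >= cy
    D = [X Y];
else
    D = [Y X];
end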
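R = corrcoef(D); %correlation matrix of the combined data
%Subdivision of the correlation matrix.
A = R(1:p,1:p); %within-set correlations, first set
B = R(p+1:p+q,p+1:p+q); %within-set correlations, second set
C = R(1:p,p+1:p+q); %between-set correlations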
%Generation of the nonsymmetric matrix M = inv(B)*C'*inv(A)*C
%(solved with the backslash operator rather than explicit inverses).
M = B\(C'*(A\C));
%Eigenstructure of matrix M, sorted by decreasing eigenvalue.
[b, R2] = eig(M);
[R2, k] = sort(diag(R2),'descend');
b = b(:,k);
%Canonical correlation for each pair of canonical variates.
Rc = sqrt(R2);
%Scale each eigenvector so that the V functions satisfy diag(V'*B*V) = 1.
T = b'*B*b;
V = b*diag(1./sqrt(diag(T)));
U = (A\(C*V))*diag(1./Rc);
Uvariates = U;
Vvariates = V;
%Canonical functions.
disp(' ')
disp('U-Canonical Functions (left hand).')
fprintf('--------------------------------------------------------------------------------\n');
Uvariates
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %d\n', p);
disp(' ')
disp('V-Canonical Functions (right hand).')
fprintf('--------------------------------------------------------------------------------\n');
Vvariates
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %d\n', q);
%Correlations between the new canonical variates and the original ones.
%These are commonly known as batteries.
B1 = A*U;
B2 = B*V;
Battery1 = B1;
Battery2 = B2;
disp(' ')
disp('Correlations between the canonical and original variables, battery 1.')
fprintf('--------------------------------------------------------------------------------\n');
Battery1
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %d\n', p);
disp(' ')
disp('Correlations between the canonical and original variables, battery 2.')
fprintf('--------------------------------------------------------------------------------\n');
Battery2
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %d\n', q);
PVe1 = diag((B1'*B1)/p); %proportion of variance of the first set extracted by each U canonical variate
PVe2 = diag((B2'*B2)/q); %proportion of variance of the second set extracted by each V canonical variate
PV = [PVe1 PVe2];
disp(' ')
disp('Proportion of variance extracted from original')
disp('variables by the new canonical variates.')
fprintf('-------------------------------------------------\n');
fprintf(' U V\n');
fprintf('-------------------------------------------------\n');
fprintf('%8.4f%8.4f\n',[PV(:,1),PV(:,2)].');
fprintf('-------------------------------------------------\n');
fprintf('Canonical variate 1 = first row and so forth to %.i\n', q);
R2 = Rc.^2;
R2 = R2';
Rdx = diag(R2)*PVe1;
Rdy = diag(R2)*PVe2;
Rd = [Rdx Rdy];
disp(' ')
disp('Amount of variance in one set of variables')
disp('extracted by the canonical variables of the')
disp('other set of variables (redundancy).')
fprintf('--------------------------------------------\n');
fprintf(' X Y\n');
fprintf('--------------------------------------------\n');
fprintf('%8.4f%8.4f\n',[Rd(:,1),Rd(:,2)].');
fprintf('--------------------------------------------\n');
disp('Original set = columns.')
%Bartlett's approximate chi-squared statistic for testing
%the canonical correlation coefficients
i = 0:(q-1);
LL = 1-R2;
LW = fliplr(cumprod(fliplr(LL))); %Wilks' lambda statistic
v = rx-1;
df = (p-i).*(q-i); %Chi-square statistic degrees of freedom
X2 = -(v-0.5*(p+q+1)).*log(LW); %approximation Chi-square distribution
P = 1 - chi2cdf(X2,df); %P-value associated to the Chi-square (chi2cdf requires the Statistics Toolbox)
c = find([~(P <= alpha), true],1) - 1; %number of leading significant canonical correlations
disp(' ')
disp('Chi-square Tests with Successive Roots Removed.')
fprintf('-------------------------------------------------------------------------\n');
disp('Removed Eigenvalue CanCor LW Chi-sqr. df P')
fprintf('-------------------------------------------------------------------------\n');
fprintf('%4.i%15.4f%11.4f%10.4f%13.4f%7.i%10.4f\n',[i',R2',Rc,LW',X2',df',P'].');
fprintf('-------------------------------------------------------------------------\n');
fprintf('With a given significance of: %.2f\n', alpha);
fprintf('The number of significant canonical correlations were the first: %.i\n', c);
disp('[If P-value >= alpha, it is not significative. Else, it results significative.]')
disp(' ')
if c >= 1
    dc = input('According to the results, do you want only the significant canonical functions? (y/n): ','s');
    if strcmpi(strtrim(dc),'y')
        Uvariates = U(:,1:c);
        Vvariates = V(:,1:c);
        %Significant canonical functions.
        disp(' ')
        disp('U-Canonical Functions (left hand).')
        fprintf('--------------------------------------------------------------------------------\n');
        Uvariates
        fprintf('--------------------------------------------------------------------------------\n');
        fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %d\n', p);
        disp(' ')
        disp('V-Canonical Functions (right hand).')
        fprintf('--------------------------------------------------------------------------------\n');
        Vvariates
        fprintf('--------------------------------------------------------------------------------\n');
        fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %d\n', q);
        %Correlations between the significant canonical variates and the original ones.
        %These are commonly known as batteries.
        B1 = A*Uvariates;
        B2 = B*Vvariates;
        Battery1 = B1;
        Battery2 = B2;
        disp(' ')
        disp('Correlations between the canonical and original variables, battery 1.')
        fprintf('--------------------------------------------------------------------------------\n');
        Battery1
        fprintf('--------------------------------------------------------------------------------\n');
        fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %d\n', p);
        disp(' ')
        disp('Correlations between the canonical and original variables, battery 2.')
        fprintf('--------------------------------------------------------------------------------\n');
        Battery2
        fprintf('--------------------------------------------------------------------------------\n');
        fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %d\n', q);
    end
end
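
%--------------------------------------------------------------------------
% Illustrative sketch, not part of the original ACAPE routine.
% The main function obtains the canonical correlations from the eigenvalues
% of a matrix built from the correlation matrix. As a possible cross-check,
% the same correlations can be computed with a different technique: the
% QR/SVD approach, in which the singular values of Qx'*Qy are the canonical
% correlations (Qx and Qy being orthonormal bases of the centered data).
% The name acape_qr_check is hypothetical, and the sketch assumes that the
% centered X and Y both have full column rank.
% For the example data in the help text this should agree with the
% documented CanCor column (0.9223 and 0.7623) up to rounding.
% Usage, from within this file:  r = acape_qr_check(X,Y);
function r = acape_qr_check(X,Y)
n = size(X,1);
Xc = X - repmat(mean(X,1),n,1); %center each column of X
Yc = Y - repmat(mean(Y,1),n,1); %center each column of Y
[Qx,Rx] = qr(Xc,0); %economy-size QR; only the orthonormal factor Qx is used
[Qy,Ry] = qr(Yc,0); %economy-size QR; only the orthonormal factor Qy is used
s = svd(Qx'*Qy); %singular values, in decreasing order
r = min(s,1); %guard against rounding slightly above 1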