Code covered by the BSD License  

from acape by Antonio Trujillo-Ortiz
Canonical correlation analysis between two sets of variables made on the same objects.

acape(X,Y,alpha)
function [acape] = acape(X,Y,alpha)
% ACAPE Canonical Correlation Analysis (CCA).
%Analyzes the interrelationships between two sets of variables measured on the same objects.
%The canonical correlation is the maximum correlation between linear functions of the two
%vector variables. Linearity is important because the analysis is performed on the correlation
%matrix, which reflects linear relationships. After the first pair is found, additional pairs
%of functions that maximally correlate may be located, subject to the restriction that the
%functions in each new pair must be uncorrelated with all previously located functions in
%both domains (orthogonal).
%Geometrically, the canonical model can be considered an exploration of the extent to which 
%objects occupy the same relative positions in one measurement space as they do in the other.
%With p predictor and q criterion variables, there are min(p,q) canonical correlations.
%The procedure concludes with a test of significance of the canonical correlations using
%Bartlett's test.
%
%   Syntax: acape(X,Y,alpha)
%      
%     Inputs:
%          X,Y - data matrices (n-by-p and n-by-q; both must have the same
%                number of rows n, and n must exceed the number of columns).
%        alpha - significance level (default = 0.05).
%     Outputs:
%              - Canonical functions.
%              - Correlations between the canonical and original variables.
%              - Proportion of variance extracted.
%              - Redundancy.
%              - Chi-square tests with successive roots removed.
%
%
%    Example: Suppose we have 8 observations of 4 variables, and we want
%             to split the variables into two domains and identify and
%             quantify any interrelation between the two sets by a canonical
%             correlation analysis with a significance level of 0.05.
%
%                      ------------------------------
%                         X1      X2      X3     X4
%                      ------------------------------
%                         1       1       1      1
%                         7       1       7      1
%                         7       7       5      6
%                         1       6       1      7
%                         7       3       7      5
%                         6       4       7      7
%                         7       1       7      1
%                         2       1       7      1
%                      ------------------------------
%
%     Data matrices must be:
%             X=[1 1;7 1;7 7;1 6;7 3;6 4;7 1;2 1];
%
%             Y=[1 1;7 1;5 6;1 7;7 5;7 7;7 1;7 1];
%
%     Calling the function in MATLAB:
%             acape(X,Y)
%
%     Answer is:
%
% U-Canonical Functions (left hand).
% --------------------------------------------------------------------------------
% Uvariates =
%     0.3403   -0.9423
%    -0.9613   -0.2822
% --------------------------------------------------------------------------------
% Functions = columns. On variates, Variate1 = first row and so forth to 2
% 
% V-Canonical Functions (right hand).
% --------------------------------------------------------------------------------
% Vvariates =
%     0.4494   -0.9097
%    -0.8205   -0.5970
% --------------------------------------------------------------------------------
% Functions = columns. On variates, Variate1 = first row and so forth to 2
% 
% Correlations between the canonical and original variables, battery 1.
% --------------------------------------------------------------------------------
% Battery1 =
%     0.2816   -0.9595
%    -0.9405   -0.3397
% --------------------------------------------------------------------------------
% Canonical = columns. Original, Variate1 = first row and so forth to 2
%
% Correlations between the canonical and original variables, battery 2.
% --------------------------------------------------------------------------------
% Battery2 =
%     0.5884   -0.8086
%    -0.8966   -0.4429
% --------------------------------------------------------------------------------
% Canonical = columns. Original, Variate1 = first row and so forth to 2
%
% Proportion of variance extracted from original
% variables by the new canonical variates.
% -------------------------------------------------
%        U         V
% -------------------------------------------------
%     0.4820    0.5750
%     0.5180    0.4250
% -------------------------------------------------
% Canonical variate 1 = first row and so forth to 2
% 
% Amount of variance in one set of variables
% extracted by the canonical variables of the
% other set of variables (redundancy).
% --------------------------------------------
%      X       Y
% --------------------------------------------
%   0.4100  0.4892
%   0.3010  0.2469
% --------------------------------------------
% Original set = columns.
% 
% Chi-square Tests with Successive Roots Removed.
% -------------------------------------------------------------------------
% Removed    Eigenvalue   CanCor      LW       Chi-sqr.     df       P
% -------------------------------------------------------------------------
%              0.8507     0.9223    0.0626      12.4725      4    0.0142
%    1         0.5810     0.7623    0.4190       3.9150      1    0.0479
% -------------------------------------------------------------------------
% With a given significance of: 0.05
% The number of significant canonical correlations were the first: 2
% [If P-value >= alpha, it is not significative. Else, it results significative.]
% 
% According to the results, do you want only the significant canonical functions? (y/n): n
%
%
%  Created by A. Trujillo-Ortiz, R. Hernandez-Walls and A. Castro-Perez
%             Facultad de Ciencias Marinas
%             Universidad Autonoma de Baja California
%             Apdo. Postal 453
%             Ensenada, Baja California
%             Mexico.
%             atrujo@uabc.mx
%             And the special collaboration of the post-graduate students of the 2004:1
%             Multivariate Statistics Course: Alejandra Agundez-Amador, Laura Huerta-Tamayo,
%             Elizabeth Romero-Hernandez and Juan Carlos Solis-Bautista.
%
%  Copyright (C) May 30, 2004.
%
%  To cite this file, this would be an appropriate format:
%  Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez, A. Agundez-Amador, J.C. Solis-Bautista,
%         E. Romero-Hernandez and L. Huerta-Tamayo. (2004). ACAPE: Canonical Correlation Analysis. 
%         A MATLAB file. [WWW document]. URL http://www.mathworks.com/matlabcentral/fileexchange/
%         loadFile.do?objectId=5230&objectType=FILE
%
%  References:
% 
%  Cooley, W. W. and Lohnes, P. R. (1971), Multivariate Data Analysis.
%              New-York:John Wiley & Sons, Inc. pp. 168-183. 
%  Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis.
%              3rd. ed. New-Jersey:Prentice Hall. pp. 459-492.
%

if nargin < 2
   error('Requires at least two arguments.');
end

if nargin < 3
    alpha = 0.05; %default significance level
end
 
[rx cx] = size(X);
[ry cy] = size(Y);

if rx ~= ry
   error('Input matrices must have the same number of rows.');
end

if (rx <= cx) || (ry <= cy)
   error('Each input matrix must have more rows than columns.');
end

p = max(cx,cy);
q = min(cx,cy);

%Assignment of the criterion measures: the set with more variables is placed first.
if cx >= cy
   D = [X Y];
else
   D = [Y X];
end;

R = corrcoef(D); %correlation matrix

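%Subdivision of the correlation matrix.
A = R(1:p,1:p);           %correlations within the first (larger) set
B = R(p+1:p+q,p+1:p+q);   %correlations within the second (smaller) set
C = R(1:p,p+1:p+q);       %correlations between the two sets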

%Generation of the nonsymmetric matrix M, whose eigenvalues are the squared canonical correlations.
M = inv(B)*C'*inv(A)*C;

%Eigenstructure of matrix M, sorted by decreasing eigenvalue.
[b, R2] = eig(M);
[R2, k] = sort(diag(R2),'descend');
b = b(:,k);

%Canonical correlation for each pair of canonical variates.
Rc = sqrt(R2);

T = b'*B*b;
nT = abs(inv(sqrt(T)));

V = b*nT;
U = (inv(A)*C*V)*diag((Rc').^-1);
Uvariates = U;
Vvariates = V;

%Canonical functions.
disp(' ')
disp('U-Canonical Functions (left hand).')
fprintf('--------------------------------------------------------------------------------\n');
Uvariates
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %.i\n', q);
disp(' ')
disp('V-Canonical Functions (right hand).')
fprintf('--------------------------------------------------------------------------------\n');
Vvariates
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %.i\n', q);

%Correlations between the new canonical variates and the original ones.
%These are commonly known as batteries.
B1 = A*U;
B2 = B*V;
Battery1 = B1;
Battery2 = B2;

disp(' ')
disp('Correlations between the canonical and original variables, battery 1.')
fprintf('--------------------------------------------------------------------------------\n');
Battery1
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %.i\n', q);
disp(' ')
disp('Correlations between the canonical and original variables, battery 2.')
fprintf('--------------------------------------------------------------------------------\n');
Battery2
fprintf('--------------------------------------------------------------------------------\n');
fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %.i\n', q);

PVe1 = diag((B1'*B1)/p); %proportion of variance extracted from the first set by its canonical variates (U)
PVe2 = diag((B2'*B2)/q); %proportion of variance extracted from the second set by its canonical variates (V)
PV = [PVe1 PVe2];

disp(' ')
disp('Proportion of variance extracted from original')
disp('variables by the new canonical variates.')
fprintf('-------------------------------------------------\n');
fprintf('     U       V\n');
fprintf('-------------------------------------------------\n');
fprintf('%8.4f%8.4f\n',[PV(:,1),PV(:,2)].');
fprintf('-------------------------------------------------\n');
fprintf('Canonical variate 1 = first row and so forth to %.i\n', q);

R2 = Rc.^2;
R2 = R2';
Rdx = diag(R2)*PVe1;
Rdy = diag(R2)*PVe2;
Rd = [Rdx Rdy];

disp(' ')
disp('Amount of variance in one set of variables')
disp('extracted by the canonical variables of the')
disp('other set of variables (redundancy).')
fprintf('--------------------------------------------\n');
fprintf('     X       Y\n');
fprintf('--------------------------------------------\n');
fprintf('%8.4f%8.4f\n',[Rd(:,1),Rd(:,2)].');
fprintf('--------------------------------------------\n');
disp('Original set = columns.')

%Bartlett's approximate chi-square statistic for testing
%the canonical correlation coefficients.
i = 0:(q-1);
LL = 1-R2;
LW = fliplr(cumprod(fliplr(LL))); %Wilks' lambda statistic
v = rx-1;
df = (p-i).*(q-i); %degrees of freedom of the chi-square statistic
X2 = -(v-0.5*(p+q+1)).*log(LW); %chi-square approximation
P = 1 - chi2cdf(X2,df); %P-value associated with the chi-square statistic (chi2cdf requires the Statistics Toolbox)
c = sum(P <= alpha); %number of significant canonical correlations

disp(' ')
disp('Chi-square Tests with Successive Roots Removed.')
fprintf('-------------------------------------------------------------------------\n');
disp('Removed    Eigenvalue   CanCor      LW       Chi-sqr.     df       P')
fprintf('-------------------------------------------------------------------------\n');
fprintf('%4.i%15.4f%11.4f%10.4f%13.4f%7.i%10.4f\n',[i',R2',Rc,LW',X2',df',P'].');
fprintf('-------------------------------------------------------------------------\n');
fprintf('With a given significance of: %.2f\n', alpha);
fprintf('The number of significant canonical correlations were the first: %.i\n', c);
disp('[If P-value >= alpha, it is not significative. Else, it results significative.]')
disp(' ')

if c >= 1,
   dc = input('According to the results, do you want only the significant canonical functions? (y/n): ','s');
   if strcmpi(dc,'y')
      Uvariates = U(:,1:c);
      Vvariates = V(:,1:c);
      %Significant canonical functions.
      disp(' ')
      disp('U-Canonical Functions (left hand).')
      fprintf('--------------------------------------------------------------------------------\n');
      Uvariates
      fprintf('--------------------------------------------------------------------------------\n');
      fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %.i\n', q);
      disp(' ')
      disp('V-Canonical Functions (right hand).')
      fprintf('--------------------------------------------------------------------------------\n');
      Vvariates
      fprintf('--------------------------------------------------------------------------------\n');
      fprintf('Functions = columns. On variates, Variate1 = first row and so forth to %.i\n', q);
      
      %Correlations between the new canonical variates and the original ones.
      %These are commonly known as batteries.
      B1 = A*Uvariates;
      B2 = B*Vvariates;
      Battery1 = B1;
      Battery2 = B2;
      
      disp(' ')
      disp('Correlations between the canonical and original variables, battery 1.')
      fprintf('--------------------------------------------------------------------------------\n');
      Battery1
      fprintf('--------------------------------------------------------------------------------\n');
      fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %.i\n', q);
      disp(' ')
      disp('Correlations between the canonical and original variables, battery 2.')
      fprintf('--------------------------------------------------------------------------------\n');
      Battery2
      fprintf('--------------------------------------------------------------------------------\n');
      fprintf('Canonical = columns. Original, Variate1 = first row and so forth to %.i\n', q);
   end;
end;
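
The eigenvalue step in the function can be cross-checked with a minimal standalone sketch, which is not part of the original file. It rebuilds the correlation submatrices for the example data from the help text and takes the square roots of the eigenvalues of M. The script form, the variable name lambda, the backslash operator in place of inv, and the real() guard against round-off are assumptions made here for illustration. The help text reports canonical correlations of 0.9223 and 0.7623 for these data.

```matlab
% Standalone sketch (an assumption, not the authors' code): canonical
% correlations for the example data given in the help text.
X = [1 1;7 1;7 7;1 6;7 3;6 4;7 1;2 1];
Y = [1 1;7 1;5 6;1 7;7 5;7 7;7 1;7 1];

p = size(X,2);                 % number of variables in the first set
q = size(Y,2);                 % number of variables in the second set
R = corrcoef([X Y]);           % joint correlation matrix

A = R(1:p,1:p);                % correlations within X
B = R(p+1:p+q,p+1:p+q);        % correlations within Y
C = R(1:p,p+1:p+q);            % correlations between X and Y

M = B\(C'*(A\C));              % same product as inv(B)*C'*inv(A)*C
lambda = sort(real(eig(M)),'descend');   % squared canonical correlations
Rc = sqrt(lambda);             % canonical correlations, largest first
disp(Rc')
```

Solving with the backslash operator avoids forming explicit inverses, which is usually the more stable choice when the within-set correlation matrices are close to singular.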

   
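
The chi-square step in the function calls chi2cdf, which belongs to the Statistics Toolbox. The sketch below is an illustration rather than the original code. It repeats Bartlett's test for the squared canonical correlations printed in the help text and obtains the upper-tail probability from the base function gammainc. The variable n and the substitution of gammainc for 1 - chi2cdf are assumptions introduced here. The help text reports P-values of 0.0142 and 0.0479 for this example.

```matlab
% Standalone sketch (an assumption, not the authors' code): Bartlett's test
% using only base functions.
R2 = [0.8507 0.5810];                  % squared canonical correlations from the help text
n = 8;                                 % number of objects (rows of X and Y)
p = 2;                                 % variables in the larger set
q = 2;                                 % variables in the smaller set

i = 0:(q-1);                           % number of roots removed at each step
LW = fliplr(cumprod(fliplr(1-R2)));    % Wilks' lambda with successive roots removed
df = (p-i).*(q-i);                     % degrees of freedom
X2 = -((n-1)-0.5*(p+q+1)).*log(LW);    % Bartlett's chi-square approximation

% The upper tail of a chi-square distribution with df degrees of freedom
% equals the regularized upper incomplete gamma function Q(df/2, X2/2).
P = gammainc(X2/2, df/2, 'upper');
fprintf('%10.4f%7d%10.4f\n', [X2; df; P]);
```

Computing the upper tail directly also avoids the loss of precision that 1 - chi2cdf can suffer when the P-value is very small.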
