Code covered by the BSD License  

Highlights from
corran

image thumbnail
from corran by Antonio Trujillo-Ortiz
Correspondence Analysis.

corran(X)
function corran(X)
%CORRAN Correspondence Analysis.
% Correspondence Analysis (CA) is a special case of Canonical Correlation
% Analysis (CCA), where one set of entries (categories rather than variables)
% is related to another set. Also, it can be seen as a special case of
% Principal Component Analysis (PCA), where it is used for tables consisting
% of continuous measurement, whereas CA is applied to contingence tables.
%
% CA starts with tabular data, usually two-way cross-classification, though 
% the technique is generalizable to n-way tables with more than two variables.         
% The variables must be discrete: nominal, ordinal or continuous segmented 
% into ranges. Signifiance test is no supported. For model comparision and 
% selection of a best-fit model should be done using another compatible method
% such as log-lineal or logistic regression. So, it is an exploratory analysis
% not a confirmatory one. Use chi-square distances that measure the profiles
% of a set of points between row and columns.
%
% One need the tabular data in a contingency table, that is considered a
% primitive matrix. The correspondence matrix P is defined as the original
% table divided by the grand total n. The profiles uses the marginal row and
% column frequencies. Thus, the row and column profiles are conditional
% densities, and they give the row and column mass. The distances are a variant
% of Euclidian distances, called weighted Euclidian distance. In a symmetric
% fashion the distances are chi-squares. The inertia is taken from the
% moment in mechanics. There is a cloud of profile points with masses adding
% up to 1. These points have a centroid (average profile) and a distance
% (chi-square distance) between profile points. Each profile point
% contributes to the inertia of the whole cloud.
%
% The data in a contingency table can be used to check for association of
% two categorical variables as a test of independence by an approximately
% (asymptotically) distributed chi-square random variable with (a-1)(b-1)
% degrees of freedom (a and b = categories of variable 1 and 2,
% respectively). The overall association is quantified by the chi-squared
% statistic divided by the grand total, called the total inertia. If there 
% is independence, we would expect the rows or columns of the contingency 
% table to have similar profiles. The chi-square can be expressed in vector
% and matrix terms as the nonzero eigenvalues, with rank k = min[(a-1),(b-1)], 
% clearly less than min(a,b). 
%
% The row and column coordinates, with respect to their respective principal
% axes, may be obtained from the singular value decomposition (SVD) of the
% corresponence matrix, transformed by double-centering and standardizing. The
% squares of the singular values are the principal inertias or eigenvalues.
% A singular value is a canonical correlation. 
%
% The rows will be the points projected in a map interpreted in terms of the
% columns as reference points. Row profiles, will be represented by principal
% coordinates, and will be expressed with respect to the column vertices. In
% the same way the colums are interpreted in terms of the rows.
%
% Syntax: function corran(X) 
%      
%     Input:
%          X - Data matrix=contingence table. Size a-categorical variable 1 x
%              b-categorical variable 2.
%     Outputs:
%          Complete Correspondence Analysis
%          Pair-wise Dimensions Plots. For the vertical and horizonal lines
%              we use the hline.m and vline.m files kindly published on FEX
%              by Brandon Kuczenski [http://www.mathworks.com/matlabcentral/
%              fileexchange/loadFile.do?objectId=1039]
%
%  Example: From the example 15.2.2 (Rencher, 2002) we have the number of
%  piston ring failures in each three legs in four compressors found in the
%  same building (all four compessors are oriented in the same direction)
%  and we are interested to make a Corresponence Analysis. Data (piston
%  ring failures) are:
%                              Leg
%                   ----------------------------
%       Compressor   Noth     Center      South
%       ----------------------------------------
%           1         17        17          12
%           2         11         9          13
%           3         11         8          19
%           4         14         7          28
%       ----------------------------------------
%
%  Data matrix must be:
%       X=[17 17 12;11 9 13;11 8 19;14 7 28];
%
%  Calling on Matlab the function: 
%             corran(X)
%  Answer is:
%
%  Inertia and chi-square decomposition of the Correspondence Analysis.
%  ------------------------------------------------------------------------
%                                                              Cummulative       
%   Singular Value     Inertia      Chi-square     Percent       Percent   
%  ------------------------------------------------------------------------
%       0.2653          0.0704       11.6819         99.66         99.66
%       0.0156          0.0002        0.0404          0.34        100.00
%  ------------------------------------------------------------------------
%       Total           0.0706       11.7223        100.00
%  Categorical variable 1 = 4 levels; Categorical variable 2 = 3 levels
%                           P-value = 0.068459; degrees of freedom = 6
%  ------------------------------------------------------------------------
%
%  Created by A. Trujillo-Ortiz, R. Hernandez-Walls and K. Barba-Rojo
%             Facultad de Ciencias Marinas
%             Universidad Autonoma de Baja California
%             Apdo. Postal 453
%             Ensenada, Baja California
%             Mexico.
%             atrujo@uabc.mx
%  Copyright. September 16, 2008.
%
%  To cite this file, this would be an appropriate format:
%  Trujillo-Ortiz, A., R. Hernandez-Walls and K. Barba-Rojo. (2008). corran:
%    Correspondence Analysis. A MATLAB file. [WWW document]. URL http://
%    www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=21505
%
%  Reference:
%  Rencher, A. C. (2000), Methods of Multivariate Analysis. 2nd ed. 
%       John Wiley & Sons. p. 514-525
%

[a,b] = size(X);
N = sum(sum(X)); %grand total
P = X/N; %correspondence matrix
r = sum(P,2); %row total
c = sum(P,1); %column total
Dr = diag(r); %matrix diagonal of r 
Dc = diag(c); %matrix diagonal of c 
R = inv(Dr)*P; %row profiles
C = P*inv(Dc); %column profiles
Z = sqrt(inv(Dr))*(P-r*c)*sqrt(inv(Dc));
[U,S,V] = svd(Z); %singular value decomposition
k = rank(inv(Dr)*(P-r*c)*inv(Dc)*(P-r*c)'); %maximum number of dimensions (=rank(P-r*c))
SV = diag(S); %singular values
SV = SV(1:k); %singular value equal to zero is deleted
EV = SV.^2; %eigenvalues (inertia)
PI = EV./sum(EV)*100; %percent of inertia
PA = cumsum(PI); %cumulative percent of inertia
x2 = N*EV; %chi-squared
A = sqrt(Dr)*U;
B = sqrt(Dc)*V;
X2 = N*trace(inv(Dr)*(P-r*c)*inv(Dc)*(P-r*c)'); %total chi-squared
v = (a-1)*(b-1); %degrees of freedom
P = 1-chi2cdf(X2,v); %P-value
disp(' ')
disp('Inertia and chi-square decomposition of the Correspondence Analysis.')
fprintf('------------------------------------------------------------------------\n');  
disp('                                                            Cummulative       ');
disp(' Singular Value     Inertia      Chi-square     Percent       Percent   ');  
fprintf('------------------------------------------------------------------------\n');
fprintf(' %10.4f      %10.4f    %10.4f    %10.2f    %10.2f\n',[SV,EV,x2,PI,PA].');
fprintf('------------------------------------------------------------------------\n');
fprintf('     Total   %14.4f %13.4f %13.2f\n', sum(EV),N*sum(EV),sum(PI));
disp(['Categorical variable 1 = ',num2str(a) ' levels; ' 'Categorical variable 2 = ',num2str(b) ' levels']);
disp(['                         P-value = ',num2str(P) '; ' 'degrees of freedom = ',num2str(v) '']);
fprintf('------------------------------------------------------------------------\n');

disp(' ');
Dm = [];
for i = 1:k
    for s = i+1:k
        if s ~= i
            d = [i s];
            Dm = [Dm;d];
        end
    end
end

co = (k*(k - 1))/2;
gf = input('Are you interested to get the dimensions score plots? (y/n): ','s');
if gf == 'y'
    disp('--------------');
    fprintf('The pair-wise plots you can get are: %.i\n', co);
    disp('--------------');
    Dm
    fprintf('--------------\n');

    d = 'y';
    while d == 'y';
        g = input('Give me the interested dimensions to plot. Please, use [a b]:');
        figure;
        X=inv(Dr)*A*S; %row coordinates
        X=X(:,1:k);
        Y=inv(Dc)*B*S'; %column coordinates
        Y=Y(:,1:k);
        l1=[1:size(X,1)];
        l2=[1:size(Y,1)];
        labels1 = strread(sprintf('%d ',l1),'%s').';
        labels2 = strread(sprintf('%d ',l2),'%s').';
        plot(X(:,g(1)),X(:,g(2)),'*b',Y(:,g(1)),Y(:,g(2)),'*r',0,0,'-k');
        h = legend('Levels of categorical variable 1','Levels of categorical variable 2',2);
        text(X(:,g(1))+.002,X(:,g(2))+.002,labels1,'Color',[0 0 1]);
        text(Y(:,g(1))+.002,Y(:,g(2))+.002,labels2,'Color',[1 0 0]);
        title('Plot of points of the performed Correspondence Analysis.'); 
        disp(' ')
        disp('Note: At the end of the program execution. If you are interested to fit the point labels')
        disp('on the generated figures you can turn-on active button ''Edit Plot'', do click on the')
        disp('selected label and drag to fix it on the desired position. Then turn-off active ''Edit Plot''.')
        disp(' ')
        xlabel(['Dimension ' num2str(g(1))';]);
        ylabel(['Dimension ' num2str(g(2))';]);
        hline(0,'k'); %this m-file was taken from Brandon Kuczenski''s hline and vline zip
        vline(0,'k'); %this m-file was taken from Brandon Kuczenski''s hline and vline zip
        d=input('Do you need another plot? (y/n):','s');
    end
else
end

return,

Contact us at files@mathworks.com