function corran(X)
%CORRAN Correspondence Analysis.
% Correspondence Analysis (CA) is a special case of Canonical Correlation
% Analysis (CCA), where one set of entries (categories rather than variables)
% is related to another set. Also, it can be seen as a special case of
% Principal Component Analysis (PCA), where it is used for tables consisting
% of continuous measurement, whereas CA is applied to contingence tables.
%
% CA starts with tabular data, usually two-way cross-classification, though
% the technique is generalizable to n-way tables with more than two variables.
% The variables must be discrete: nominal, ordinal or continuous segmented
% into ranges. Signifiance test is no supported. For model comparision and
% selection of a best-fit model should be done using another compatible method
% such as log-lineal or logistic regression. So, it is an exploratory analysis
% not a confirmatory one. Use chi-square distances that measure the profiles
% of a set of points between row and columns.
%
% One need the tabular data in a contingency table, that is considered a
% primitive matrix. The correspondence matrix P is defined as the original
% table divided by the grand total n. The profiles uses the marginal row and
% column frequencies. Thus, the row and column profiles are conditional
% densities, and they give the row and column mass. The distances are a variant
% of Euclidian distances, called weighted Euclidian distance. In a symmetric
% fashion the distances are chi-squares. The inertia is taken from the
% moment in mechanics. There is a cloud of profile points with masses adding
% up to 1. These points have a centroid (average profile) and a distance
% (chi-square distance) between profile points. Each profile point
% contributes to the inertia of the whole cloud.
%
% The data in a contingency table can be used to check for association of
% two categorical variables as a test of independence by an approximately
% (asymptotically) distributed chi-square random variable with (a-1)(b-1)
% degrees of freedom (a and b = categories of variable 1 and 2,
% respectively). The overall association is quantified by the chi-squared
% statistic divided by the grand total, called the total inertia. If there
% is independence, we would expect the rows or columns of the contingency
% table to have similar profiles. The chi-square can be expressed in vector
% and matrix terms as the nonzero eigenvalues, with rank k = min[(a-1),(b-1)],
% clearly less than min(a,b).
%
% The row and column coordinates, with respect to their respective principal
% axes, may be obtained from the singular value decomposition (SVD) of the
% corresponence matrix, transformed by double-centering and standardizing. The
% squares of the singular values are the principal inertias or eigenvalues.
% A singular value is a canonical correlation.
%
% The rows will be the points projected in a map interpreted in terms of the
% columns as reference points. Row profiles, will be represented by principal
% coordinates, and will be expressed with respect to the column vertices. In
% the same way the colums are interpreted in terms of the rows.
%
% Syntax: function corran(X)
%
% Input:
% X - Data matrix=contingence table. Size a-categorical variable 1 x
% b-categorical variable 2.
% Outputs:
% Complete Correspondence Analysis
% Pair-wise Dimensions Plots. For the vertical and horizonal lines
% we use the hline.m and vline.m files kindly published on FEX
% by Brandon Kuczenski [http://www.mathworks.com/matlabcentral/
% fileexchange/loadFile.do?objectId=1039]
%
% Example: From the example 15.2.2 (Rencher, 2002) we have the number of
% piston ring failures in each three legs in four compressors found in the
% same building (all four compessors are oriented in the same direction)
% and we are interested to make a Corresponence Analysis. Data (piston
% ring failures) are:
% Leg
% ----------------------------
% Compressor Noth Center South
% ----------------------------------------
% 1 17 17 12
% 2 11 9 13
% 3 11 8 19
% 4 14 7 28
% ----------------------------------------
%
% Data matrix must be:
% X=[17 17 12;11 9 13;11 8 19;14 7 28];
%
% Calling on Matlab the function:
% corran(X)
% Answer is:
%
% Inertia and chi-square decomposition of the Correspondence Analysis.
% ------------------------------------------------------------------------
% Cummulative
% Singular Value Inertia Chi-square Percent Percent
% ------------------------------------------------------------------------
% 0.2653 0.0704 11.6819 99.66 99.66
% 0.0156 0.0002 0.0404 0.34 100.00
% ------------------------------------------------------------------------
% Total 0.0706 11.7223 100.00
% Categorical variable 1 = 4 levels; Categorical variable 2 = 3 levels
% P-value = 0.068459; degrees of freedom = 6
% ------------------------------------------------------------------------
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls and K. Barba-Rojo
% Facultad de Ciencias Marinas
% Universidad Autonoma de Baja California
% Apdo. Postal 453
% Ensenada, Baja California
% Mexico.
% atrujo@uabc.mx
% Copyright. September 16, 2008.
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls and K. Barba-Rojo. (2008). corran:
% Correspondence Analysis. A MATLAB file. [WWW document]. URL http://
% www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=21505
%
% Reference:
% Rencher, A. C. (2000), Methods of Multivariate Analysis. 2nd ed.
% John Wiley & Sons. p. 514-525
%
[a,b] = size(X);
N = sum(sum(X)); %grand total
P = X/N; %correspondence matrix
r = sum(P,2); %row total
c = sum(P,1); %column total
Dr = diag(r); %matrix diagonal of r
Dc = diag(c); %matrix diagonal of c
R = inv(Dr)*P; %row profiles
C = P*inv(Dc); %column profiles
Z = sqrt(inv(Dr))*(P-r*c)*sqrt(inv(Dc));
[U,S,V] = svd(Z); %singular value decomposition
k = rank(inv(Dr)*(P-r*c)*inv(Dc)*(P-r*c)'); %maximum number of dimensions (=rank(P-r*c))
SV = diag(S); %singular values
SV = SV(1:k); %singular value equal to zero is deleted
EV = SV.^2; %eigenvalues (inertia)
PI = EV./sum(EV)*100; %percent of inertia
PA = cumsum(PI); %cumulative percent of inertia
x2 = N*EV; %chi-squared
A = sqrt(Dr)*U;
B = sqrt(Dc)*V;
X2 = N*trace(inv(Dr)*(P-r*c)*inv(Dc)*(P-r*c)'); %total chi-squared
v = (a-1)*(b-1); %degrees of freedom
P = 1-chi2cdf(X2,v); %P-value
disp(' ')
disp('Inertia and chi-square decomposition of the Correspondence Analysis.')
fprintf('------------------------------------------------------------------------\n');
disp(' Cummulative ');
disp(' Singular Value Inertia Chi-square Percent Percent ');
fprintf('------------------------------------------------------------------------\n');
fprintf(' %10.4f %10.4f %10.4f %10.2f %10.2f\n',[SV,EV,x2,PI,PA].');
fprintf('------------------------------------------------------------------------\n');
fprintf(' Total %14.4f %13.4f %13.2f\n', sum(EV),N*sum(EV),sum(PI));
disp(['Categorical variable 1 = ',num2str(a) ' levels; ' 'Categorical variable 2 = ',num2str(b) ' levels']);
disp([' P-value = ',num2str(P) '; ' 'degrees of freedom = ',num2str(v) '']);
fprintf('------------------------------------------------------------------------\n');
disp(' ');
Dm = [];
for i = 1:k
for s = i+1:k
if s ~= i
d = [i s];
Dm = [Dm;d];
end
end
end
co = (k*(k - 1))/2;
gf = input('Are you interested to get the dimensions score plots? (y/n): ','s');
if gf == 'y'
disp('--------------');
fprintf('The pair-wise plots you can get are: %.i\n', co);
disp('--------------');
Dm
fprintf('--------------\n');
d = 'y';
while d == 'y';
g = input('Give me the interested dimensions to plot. Please, use [a b]:');
figure;
X=inv(Dr)*A*S; %row coordinates
X=X(:,1:k);
Y=inv(Dc)*B*S'; %column coordinates
Y=Y(:,1:k);
l1=[1:size(X,1)];
l2=[1:size(Y,1)];
labels1 = strread(sprintf('%d ',l1),'%s').';
labels2 = strread(sprintf('%d ',l2),'%s').';
plot(X(:,g(1)),X(:,g(2)),'*b',Y(:,g(1)),Y(:,g(2)),'*r',0,0,'-k');
h = legend('Levels of categorical variable 1','Levels of categorical variable 2',2);
text(X(:,g(1))+.002,X(:,g(2))+.002,labels1,'Color',[0 0 1]);
text(Y(:,g(1))+.002,Y(:,g(2))+.002,labels2,'Color',[1 0 0]);
title('Plot of points of the performed Correspondence Analysis.');
disp(' ')
disp('Note: At the end of the program execution. If you are interested to fit the point labels')
disp('on the generated figures you can turn-on active button ''Edit Plot'', do click on the')
disp('selected label and drag to fix it on the desired position. Then turn-off active ''Edit Plot''.')
disp(' ')
xlabel(['Dimension ' num2str(g(1))';]);
ylabel(['Dimension ' num2str(g(2))';]);
hline(0,'k'); %this m-file was taken from Brandon Kuczenski''s hline and vline zip
vline(0,'k'); %this m-file was taken from Brandon Kuczenski''s hline and vline zip
d=input('Do you need another plot? (y/n):','s');
end
else
end
return,