canoncorr
Canonical correlation
Syntax
Description
Examples
Compute Sample Canonical Correlation
Perform canonical correlation analysis for a sample data set.
The data set carbig contains measurements for 406 cars from the years 1970 to 1982.
Load the sample data.
load carbig;
data = [Displacement Horsepower Weight Acceleration MPG];
Define X as the matrix of displacement, horsepower, and weight observations, and Y
as the matrix of acceleration and MPG observations. Omit rows with insufficient data.
nans = sum(isnan(data),2) > 0;
X = data(~nans,1:3);
Y = data(~nans,4:5);
Compute the sample canonical correlation.
[A,B,r,U,V] = canoncorr(X,Y);
View the output of A to determine the linear combinations of displacement, horsepower, and weight that make up the canonical variables of X.
A
A = 3×2
    0.0025    0.0048
    0.0202    0.0409
   -0.0000   -0.0027
A(3,1) is displayed as -0.0000 because it is very small. Display A(3,1) separately.
A(3,1)
ans = -2.4737e-05
The first canonical variable of X is u1 = 0.0025*Disp + 0.0202*HP - 0.000025*Wgt.
The second canonical variable of X is u2 = 0.0048*Disp + 0.0409*HP - 0.0027*Wgt.
View the output of B to determine the linear combinations of acceleration and MPG that make up the canonical variables of Y.
B
B = 2×2
   -0.1666   -0.3637
   -0.0916    0.1078
The first canonical variable of Y is v1 = -0.1666*Accel - 0.0916*MPG.
The second canonical variable of Y is v2 = -0.3637*Accel + 0.1078*MPG.
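The formulas for u1 and v1 give the weights only. According to the relation stated in the Algorithms section, U = (X - mean(X))*A, the scores returned in U and V come from applying those weights to mean-centered data. The following sketch checks this for the first pair of canonical variables. The names u1, v1, errU, and errV are illustrative, and the column-wise subtraction assumes a MATLAB release that supports implicit expansion (R2016b or later).

```matlab
% Rebuild X and Y exactly as in the example above.
load carbig
data = [Displacement Horsepower Weight Acceleration MPG];
nans = sum(isnan(data),2) > 0;
X = data(~nans,1:3);
Y = data(~nans,4:5);

[A,B,r,U,V] = canoncorr(X,Y);

% Apply the first column of A and B to the mean-centered data.
u1 = (X - mean(X,1))*A(:,1);
v1 = (Y - mean(Y,1))*B(:,1);

% Both differences should be at the level of floating-point round-off.
errU = max(abs(u1 - U(:,1)))
errV = max(abs(v1 - V(:,1)))
```

Applying the weights to the raw, uncentered columns would shift every score by a constant, which leaves the correlations unchanged but does not reproduce U and V.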
Plot the scores of the canonical variables of X and Y against each other.
t = tiledlayout(2,2);
title(t,'Canonical Scores of X vs Canonical Scores of Y')
xlabel(t,'Canonical Variables of X')
ylabel(t,'Canonical Variables of Y')
t.TileSpacing = 'compact';

nexttile
plot(U(:,1),V(:,1),'.')
xlabel('u1')
ylabel('v1')

nexttile
plot(U(:,2),V(:,1),'.')
xlabel('u2')
ylabel('v1')

nexttile
plot(U(:,1),V(:,2),'.')
xlabel('u1')
ylabel('v2')

nexttile
plot(U(:,2),V(:,2),'.')
xlabel('u2')
ylabel('v2')
The pairs of canonical variables $$\{u_i, v_i\}$$ are ordered from the strongest to the weakest correlation, and all other pairs are uncorrelated.
Return the correlation coefficient of the variables u1 and v1.
r(1)
ans = 0.8782
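As a cross-check, the sketch below computes the full correlation matrix of the scores with the base MATLAB function corrcoef and extracts the block that pairs columns of U with columns of V. The diagonal of that block should match r, and the off-diagonal entries should be zero up to round-off. The variable names R and crossUV are illustrative.

```matlab
% Rebuild X and Y exactly as in the example above.
load carbig
data = [Displacement Horsepower Weight Acceleration MPG];
nans = sum(isnan(data),2) > 0;
X = data(~nans,1:3);
Y = data(~nans,4:5);

[~,~,r,U,V] = canoncorr(X,Y);

% Correlation matrix of all four score columns [u1 u2 v1 v2].
R = corrcoef([U V]);

% Rows index u1,u2 and columns index v1,v2.
crossUV = R(1:2,3:4)

% Canonical correlations returned by canoncorr, for comparison.
r
```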
Input Arguments
X
— Input matrix
matrix
Input matrix, specified as an n-by-d_{1} matrix. The rows of X correspond to observations, and the columns correspond to variables.
Data Types: single | double
Y
— Input matrix
matrix
Input matrix, specified as an n-by-d_{2} matrix, where X is an n-by-d_{1} matrix. The rows of Y correspond to observations, and the columns correspond to variables.
Data Types: single | double
Output Arguments
A
— Sample canonical coefficients for X variables
matrix
Sample canonical coefficients for the variables in X, returned as a d_{1}-by-d matrix, where d = min(rank(X),rank(Y)).
The jth column of A contains the linear combination of variables that makes up the jth canonical variable for X.
If X is less than full rank, canoncorr gives a warning and returns zeros in the rows of A corresponding to dependent columns of X.
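The rank-deficient case can be illustrated with a small synthetic example. This is a sketch under assumptions: the data are random, the third column of X is deliberately built as the sum of the first two, and which row of A is zeroed depends on which column the function treats as dependent, so the code only checks that some row is all zeros.

```matlab
rng(1);                          % fixed seed so the run is repeatable
X = randn(50,2);
X = [X, X(:,1) + X(:,2)];        % third column depends on the first two
Y = randn(50,2);

% canoncorr is expected to warn that X is not full rank.
[A,B] = canoncorr(X,Y);

A                                 % 3-by-2, with zeros in one row
zeroRows = find(all(A == 0, 2))   % rows of A tied to dependent columns
```

Here rank(X) and rank(Y) are both 2, so d = 2 and A still has one row per column of X.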
B
— Sample canonical coefficients for Y variables
matrix
Sample canonical coefficients for the variables in Y, returned as a d_{2}-by-d matrix, where d = min(rank(X),rank(Y)).
The jth column of B contains the linear combination of variables that makes up the jth canonical variable for Y.
If Y is less than full rank, canoncorr gives a warning and returns zeros in the rows of B corresponding to dependent columns of Y.
U
— Canonical scores for the X variables
matrix
Canonical scores for the variables in X, returned as an n-by-d matrix, where X is an n-by-d_{1} matrix and d = min(rank(X),rank(Y)).
V
— Canonical scores for the Y variables
matrix
Canonical scores for the variables in Y, returned as an n-by-d matrix, where Y is an n-by-d_{2} matrix and d = min(rank(X),rank(Y)).
stats
— Hypothesis test information
structure
Hypothesis test information, returned as a structure. This information relates to the sequence of d null hypotheses $$H_0^{(k)}$$ that the (k+1)st through dth correlations are all zero, for k = 0, …, d-1, where d = min(rank(X),rank(Y)).
The fields of stats are 1-by-d vectors with elements corresponding to the values of k.
Field | Description
Wilks | Wilks' lambda (likelihood ratio) statistic
df1 | Degrees of freedom for the chi-squared statistic, and the numerator degrees of freedom for the F statistic
df2 | Denominator degrees of freedom for the F statistic
F | Rao's approximate F statistic for $$H_0^{(k)}$$
pF | Right-tail significance level for F
chisq | Bartlett's approximate chi-squared statistic for $$H_0^{(k)}$$ with Lawley's modification
pChisq | Right-tail significance level for chisq
stats has two other fields (dfe and p), which are equal to df1 and pChisq, respectively, and exist for historical reasons.
Data Types: struct
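One convenient way to read the stats output is to line its fields up next to the canonical correlations. The sketch below does this for the carbig data used in the example. The table layout, the column name Element, and the variable name summary are illustrative choices, not part of the function's output.

```matlab
% Rebuild X and Y as in the carbig example.
load carbig
data = [Displacement Horsepower Weight Acceleration MPG];
nans = sum(isnan(data),2) > 0;
X = data(~nans,1:3);
Y = data(~nans,4:5);

% Request the sixth output to get the hypothesis test information.
[~,~,r,~,~,stats] = canoncorr(X,Y);
d = numel(r);

% One row per element of the 1-by-d fields of stats.
summary = table((1:d)', r(:), stats.Wilks(:), stats.F(:), stats.pF(:), ...
    stats.chisq(:), stats.pChisq(:), ...
    'VariableNames', {'Element','r','Wilks','F','pF','chisq','pChisq'})

% The legacy fields duplicate df1 and pChisq.
sameDf = isequaln(stats.dfe, stats.df1)
sameP  = isequaln(stats.p, stats.pChisq)
```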
More About
Canonical Correlation Analysis
The canonical scores of the data matrices X and Y are defined as
$$\begin{array}{c}{U}_{i}=X{a}_{i}\\ {V}_{i}=Y{b}_{i}\end{array}$$
where a_{i} and b_{i} maximize the Pearson correlation coefficient ρ(U_{i},V_{i}) subject to being uncorrelated to all previous canonical scores and scaled so that U_{i} and V_{i} have zero mean and unit variance.
The canonical coefficients of X and Y are the matrices A and B with columns a_{i} and b_{i}, respectively.
The canonical variables of X and Y are the linear combinations of the columns of X and Y given by the canonical coefficients in A and B respectively.
The canonical correlations are the values ρ(U_{i},V_{i}) measuring the correlation of each pair of canonical variables of X and Y.
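The scaling and orthogonality conditions in these definitions can be checked numerically. The sketch below uses synthetic data (an assumption made only for illustration) and confirms that the scores have zero mean and that their sample covariance matrices are close to the identity, meaning unit variance and mutually uncorrelated columns.

```matlab
rng(0);                        % fixed seed; synthetic data for illustration
X = randn(100,3);
Y = X(:,1:2) + randn(100,2);   % make Y partly dependent on X

[A,B,r,U,V] = canoncorr(X,Y);

meanU = mean(U,1)              % expected to be near zero
meanV = mean(V,1)
covU  = cov(U)                 % expected to be near eye(2)
covV  = cov(V)
```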
Algorithms
canoncorr computes A, B, and r using qr and svd. canoncorr computes U and V as U = (X - mean(X))*A and V = (Y - mean(Y))*B.
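The following is a simplified sketch of how a QR-plus-SVD computation of this kind can work. It is not the function's source code. It assumes both centered matrices have full column rank, so it omits column pivoting and the rank-deficiency handling. The data are synthetic, and all variable names are illustrative. Only base MATLAB functions are used.

```matlab
rng(0);                                   % synthetic data for illustration
n = 200;
X = randn(n,3);
Y = X(:,1:2)*[1 0.5; -0.3 1] + randn(n,2);

% Center each column.
Xc = X - mean(X,1);
Yc = Y - mean(Y,1);

% Economy-size QR gives orthonormal bases Q1, Q2 for the column spaces.
[Q1,R1] = qr(Xc,0);
[Q2,R2] = qr(Yc,0);

% Singular values of Q1'*Q2 are the canonical correlations.
d = min(size(X,2), size(Y,2));
[L,D,M] = svd(Q1'*Q2,0);
r = diag(D(1:d,1:d))';

% Map back to coefficients; sqrt(n-1) scales the scores to unit variance.
A = (R1 \ L(:,1:d)) * sqrt(n-1);
B = (R2 \ M(:,1:d)) * sqrt(n-1);

% Scores, following U = (X - mean(X))*A and V = (Y - mean(Y))*B.
U = Xc*A;
V = Yc*B;
```

With this construction U equals sqrt(n-1)*Q1*L, so its columns have unit sample variance, and the cross-covariance of U and V reduces to D. If canoncorr is available, its r should agree with this r for full-rank inputs, while individual columns of A and B may differ in sign.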
Version History
Introduced before R2006a