Process columns of matrix with principal component analysis
[Y,PS] = processpca(X,maxfrac)
[Y,PS] = processpca(X,FP)
Y = processpca('apply',X,PS)
X = processpca('reverse',Y,PS)
name = processpca('name')
fp = processpca('pdefaults')
names = processpca('pdesc')
processpca('pcheck',fp);
processpca processes matrices using principal component analysis so that the rows are uncorrelated, the rows are ordered by the amount they contribute to total variation, and rows whose contribution to total variation is less than maxfrac are removed.
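[Y,PS] = processpca(X,maxfrac) takes X and an optional parameter,
X        Matrix to process (one variable per row)
maxfrac  Maximum fraction of variance for removed rows (default is 0)
and returns
Y        Processed matrix (uncorrelated rows, with low-variation rows removed)
PS       Process settings that allow consistent processing of values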
[Y,PS] = processpca(X,FP) takes parameters as a struct: FP.maxfrac.
Y = processpca('apply',X,PS) returns Y, given X and settings PS.
X = processpca('reverse',Y,PS) returns X, given Y and settings PS.
name = processpca('name') returns the name of this process method.
fp = processpca('pdefaults') returns the default process parameter structure.
names = processpca('pdesc') returns the process parameter descriptions.
processpca('pcheck',fp); throws an error if any parameter is illegal.
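The parameter-related calls above can be combined as in the following sketch. The data matrix x and the value 0.02 are made up for illustration; the only field assumed on fp is maxfrac, as described above.

```matlab
% Illustrative data (hypothetical): 3 rows, 20 samples, third row nearly redundant
rng(0);
x = rand(3,20);
x(3,:) = x(1,:) + x(2,:) + 1e-3*rand(1,20);

fp = processpca('pdefaults');   % default process parameter structure
fp.maxfrac = 0.02;              % remove rows contributing less than 2% of total variation
processpca('pcheck',fp);        % throws an error if any parameter is illegal

[y,ps] = processpca(x,fp);      % struct form of processpca(x,0.02)
```

Passing the struct is convenient when the same parameters are reused for several data sets or stored alongside other configuration.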
Here is how to format a matrix with an independent row, a correlated row, and a completely redundant row so that its rows are uncorrelated and the redundant row is dropped.
x1_independent = rand(1,5)
x1_correlated = rand(1,5) + x1_independent;
x1_redundant = x1_independent + x1_correlated
x1 = [x1_independent; x1_correlated; x1_redundant]
[y1,ps] = processpca(x1)
Next, apply the same processing settings to new values.
x2_independent = rand(1,5)
x2_correlated = rand(1,5) + x2_independent;
x2_redundant = x2_independent + x2_correlated
x2 = [x2_independent; x2_correlated; x2_redundant];
y2 = processpca('apply',x2,ps)
Reverse the processing of y1 to get x1 again.
x1_again = processpca('reverse',y1,ps)
In some situations, the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). It is useful in this situation to reduce the dimension of the input vectors. An effective procedure for performing this operation is principal component analysis. This technique has three effects: it orthogonalizes the components of the input vectors (so that they are uncorrelated with each other), it orders the resulting orthogonal components (principal components) so that those with the largest variation come first, and it eliminates those components that contribute the least to the variation in the data set. The following code illustrates the use of processpca, which performs a principal component analysis using the processing setting maxfrac of 0.02.
[pn,ps1] = mapstd(p);
[ptrans,ps2] = processpca(pn,0.02);
The input vectors are first normalized, using mapstd, so that they have zero mean and unit variance. This is a standard procedure when using principal components. In this example, the second argument passed to processpca is 0.02. This means that processpca eliminates those principal components that contribute less than 2% to the total variation in the data set. The matrix ptrans contains the transformed input vectors. The settings structure ps2 contains the principal component transformation matrix. After the network has been trained, these settings should be used to transform any future inputs that are applied to the network. They effectively become a part of the network, just like the network weights and biases. If you multiply the normalized input vectors pn by the transformation matrix stored in ps2, you obtain the transformed input vectors ptrans.
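To make the three effects concrete, here is a minimal hand-written sketch of principal component analysis using only base MATLAB (svd and cov). It is an illustration of the technique under stated assumptions, not the internal implementation of processpca. The data and the names transform, frac, and keep are invented for this example, and the row-wise normalization relies on implicit expansion (R2016b or later).

```matlab
% Hand-written PCA sketch using base MATLAB only (not processpca itself).
rng(0);
Q = 200;                                  % number of sample vectors (columns)
p = randn(3,Q);
p(3,:) = p(1,:) + 0.01*randn(1,Q);        % row 3 is almost a copy of row 1

% Normalize each row to zero mean and unit variance
pn = (p - mean(p,2)) ./ std(p,0,2);

% 1. Orthogonalize: the left singular vectors define uncorrelated directions
[U,S,~] = svd(pn,'econ');

% 2. Order: svd returns singular values in descending order, so the
%    components are already sorted by the variation they explain
variances = diag(S).^2 / (Q-1);
frac = variances / sum(variances);        % fraction of total variation per component

% 3. Eliminate: keep only components contributing at least 2%
keep = frac >= 0.02;
transform = U(:,keep)';                   % transformation matrix (kept rows only)
ptrans = transform * pn;                  % transformed input vectors

C = cov(ptrans');                         % off-diagonal entries should be near zero
disp(frac');
disp(C);
```

Multiplying new normalized vectors by the same transform matrix maps them into the same reduced coordinates, which is the role the settings structure plays for processpca.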
If processpca is used to preprocess the training set data, then whenever the trained network is used with new inputs, you should preprocess them with the transformation matrix that was computed for the training set, using ps2. The following code applies a new set of inputs to an already trained network.
pnewn = mapstd('apply',pnew,ps1);
pnewtrans = processpca('apply',pnewn,ps2);
a = sim(net,pnewtrans);
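The snippets above assume that p, pnew, and net already exist. The following sketch fills them in so the whole sequence can be run end to end. The data, the targets t, and the hidden layer size of 5 are arbitrary choices made for illustration only.

```matlab
% End-to-end sketch with made-up data (all values are illustrative).
rng(1);
p = rand(4,50);                      % training inputs: 4 rows, 50 samples
p(4,:) = p(1,:) + p(2,:);            % row 4 is redundant
t = sum(p(1:3,:),1);                 % hypothetical targets

% Preprocess the training set and keep both settings structures
[pn,ps1] = mapstd(p);
[ptrans,ps2] = processpca(pn,0.02);

% Train a small network on the transformed inputs
net = feedforwardnet(5);
net.trainParam.showWindow = false;   % suppress the training GUI
net = train(net,ptrans,t);

% New inputs must go through the same two steps, with the stored settings
pnew = rand(4,10);
pnew(4,:) = pnew(1,:) + pnew(2,:);
pnewn = mapstd('apply',pnew,ps1);
pnewtrans = processpca('apply',pnewn,ps2);
a = sim(net,pnewtrans);
```

Because ps1 and ps2 come from the training set, pnewtrans has the same number of rows as ptrans, which is what the trained network expects.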
Principal component analysis is not reliably reversible. Therefore, it is recommended only for input processing. Outputs require reversible processing functions.
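One way to see why the reversal is unreliable is to measure the round-trip error after rows have been removed. The sketch below uses invented data and a maxfrac of 0.1, chosen so that the weakest component is expected to be dropped; the variable names are illustrative.

```matlab
% Round-trip sketch: the information in removed rows cannot be recovered.
rng(0);
x = randn(3,100);
x(3,:) = x(1,:) + 0.3*randn(1,100);      % row 3 is strongly correlated with row 1

[y,ps] = processpca(x,0.1);              % remove components contributing less than 10%
x_again = processpca('reverse',y,ps);    % approximate reconstruction

err = max(abs(x(:) - x_again(:)));       % largest elementwise reconstruction error
fprintf('rows kept: %d of %d, max round-trip error: %g\n', ...
    size(y,1), size(x,1), err);
```

If no rows are removed, the error is typically at the level of floating-point round-off; once a component is dropped, the reconstruction is only an approximation of the original values.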
Principal component analysis is not part of the default processing for feedforwardnet. You can add this with the following command:
net.inputs{1}.processFcns{end+1} = 'processpca';
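As a variation, the sketch below sets the whole list of input processing functions so that mapstd runs before processpca, matching the normalization step described earlier, and then sets maxfrac through the matching entry of processParams. The hidden layer size of 10 and the value 0.02 are arbitrary, and the assumption that the third entry of processParams corresponds to processpca follows from the order of the list; check it against your own network before relying on it.

```matlab
% Sketch: configure input processing on a new network (values are illustrative).
net = feedforwardnet(10);

% Replace the default list so that normalization precedes PCA
net.inputs{1}.processFcns = {'removeconstantrows','mapstd','processpca'};

% processParams is parallel to processFcns; entry 3 belongs to processpca here
net.inputs{1}.processParams{3}.maxfrac = 0.02;

disp(net.inputs{1}.processFcns);
```

With this configuration the network applies the processing itself during training and simulation, so the raw inputs can be passed to train and sim directly.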