How to Apply PCA the Correct Way

I have a 322*91 data matrix.
The first 88 columns (322*88) are my input features; the last 3 columns (322*3) are my targets.
My neural network results are not good, because 88 features is too many relative to only 322 samples.
More data would give better accuracy, but I have no way to collect more.
So I want to apply PCA to reduce the 88 features, but I couldn't manage to apply it the correct way.
How can I do that?
When I write newinput=pca(input)
the result is 88*88, which loses the row dimension.
I need to keep the 322 rows and reduce only the 88 columns.
The neural network code is below, and the data is attached.
veri=xlsread('data322.xlsx');
input=veri(:,1:88);
target=veri(:,89:91);
x=input';
t=target';
% Solve a Pattern Recognition Problem with a Neural Network
% Script generated by Neural Pattern Recognition app
% Created 07-Feb-2021 15:50:44
%
% This script assumes these variables are defined:
%
% x - input data.
% t - target data.
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainscg'; % Scaled conjugate gradient backpropagation.
% Create a Pattern Recognition Network
hiddenLayerSize = [5 4 3];
net = patternnet(hiddenLayerSize, trainFcn);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t);
% Test the Network
y = net(x);
e = gsubtract(t,y);
performance = perform(net,t,y)
tind = vec2ind(t);
yind = vec2ind(y);
percentErrors = sum(tind ~= yind)/numel(tind);
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotconfusion(t,y)
%figure, plotroc(t,y)

 Accepted Answer

the cyclist on 12 Apr 2021
Your question, especially when you ask about the 88x88 matrix that is output from pca(), indicates that you don't really understand the output. (The output is not the new variables.)
I have written a very extensive explanation of PCA, in response to this question. If you thoroughly understand that answer, you should be able to solve your problem.
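To make the point concrete, here is a minimal sketch (using random stand-in data, and assuming the Statistics and Machine Learning Toolbox `pca` function) of what `pca()` actually returns for a 322-by-88 input:

```matlab
% Illustrative sketch: what pca() returns for an n-by-p input.
% X is a random stand-in for the 322x88 feature matrix.
X = randn(322, 88);
[coeff, score, latent] = pca(X);
size(coeff)   % 88x88  -> loadings: one column of weights per principal component
size(score)   % 322x88 -> the data expressed in PC space; these ARE the new variables
size(latent)  % 88x1   -> variance explained by each component, in decreasing order
```

The 88x88 matrix you saw is `coeff`, the component loadings; the transformed observations, with the row count preserved, are in `score`.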

6 Comments

I read it and tried to apply it, but I couldn't manage the reduction. I used 'score' as my new data, but my neural network accuracy decreased. Sorry, but I couldn't do this.
The code is below. I tried it this way and something goes wrong. How do I do the reduction, and how do I apply it in the code?
clear all;
clc;
X=xlsread('C:\Users\Ali\Desktop\data322.xlsx');
X = X - mean(X);
[coeff,score,latent,~,explained] = pca(X);
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);
dataInPrincipalComponentSpace = X*coeff;
score;
input=score(:,1:88);
target=score(:,89:91);
x=input';
t=target';
% (The rest of the script is identical to the neural-network training code in the question above.)
The main problem I see is that you should not be including your output variables in the PCA. The resulting transformation jumbles all the inputs and outputs together, and is meaningless. No wonder your performance went down. Instead, segregate the input variables first and run the PCA on the inputs only. I think your corrected code would be something like
% Get the data, and de-mean
X=xlsread('C:\Users\Ali\Desktop\data322.xlsx');
X = X - mean(X);
% Segregate the inputs and targets
input = X(:,1:88);
target = X(:,89:91);
% PCA on inputs
[coeff,score,latent,~,explained] = pca(input);
inputInPCSpace = input * coeff;
% Transpose and rename as required by the NN
x = inputInPCSpace';
t = target';
Note that the inputs are now transformed, but the outputs are not. It would be good if you tried to understand why.
Also, be aware that, as written, your code has not done any dimensionality reduction. You are still using the entire 88-dimensional input feature space; you have only performed a linear transformation on it. I would expect your NN to perform about the same.
To reduce the dimensionality, you would need an additional step such as
inputInPCSpaceReduced = inputInPCSpace(:,1:10); % Select only the top 10 features
and then define your x from that. I am not very knowledgeable about neural nets, and I am not sure if it is best practice to perform PCA before using a NN, so it is unclear to me if you should expect your NN to perform better by using PCA.
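Rather than hard-coding the number of components, one common approach (sketched here; the 95% threshold is an illustrative choice, not something from the thread) is to use the `explained` output to keep just enough components to cover a target fraction of the variance:

```matlab
% Sketch: keep the smallest number of components explaining >= 95% of variance.
% 'input' is assumed to be the 322x88 feature matrix, already segregated from targets.
[coeff, score, ~, ~, explained] = pca(input);
k = find(cumsum(explained) >= 95, 1, 'first');   % explained is in percent, sums to 100
inputInPCSpaceReduced = score(:, 1:k);           % 322-by-k reduced feature matrix
x = inputInPCSpaceReduced';                      % transpose for the NN, as before
```

Because `latent`/`explained` are sorted in decreasing order, taking the first k columns of `score` always keeps the highest-variance components.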
Thank you @the cyclist, it worked, but NN performance only increased by 2%. That is less improvement than I hoped for from PCA, but at least I got 2%, which is still something. Thank you again. When I use
X = X - mean(X);
my accuracy decreases, so I skipped that step. Is that step necessary for using PCA?
@Greg Heath Do you know how to increase accuracy with this data? I tried PCA and it gained another 2%.
You do not need the de-meaning step. MATLAB internally de-means inside of PCA.
I'm surprised it has any impact on the accuracy of your NN, though.
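This internal centering can be checked directly; a small sketch with stand-in random data (since `pca` centers by default, the scores come out the same either way):

```matlab
% Sketch showing pca() centers the data itself: the scores are identical
% whether or not you subtract the column means beforehand.
X = randn(322, 88);                    % random stand-in for the real data
[~, score1] = pca(X);                  % pca de-means internally
[~, score2] = pca(X - mean(X));        % explicit de-meaning first
max(abs(score1(:) - score2(:)))        % essentially zero (floating-point noise)
```

So any accuracy change from adding the `X - mean(X)` line before `pca` should not come from the PCA step itself.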
@Ali Zulfikaroglu, responding to your email question here.
Unfortunately, I am not very knowledgeable about artificial neural networks, and can provide no advice on improving your accuracy.
I will say that just wanting your accuracy to be higher is very different from having a system that can actually be predicted accurately. If there is variation in the output that is not explained by your inputs (i.e. noise), then even the optimal model is limited in its accuracy.
