How to Apply PCA the Correct Way

I have a 322*91 data matrix.
The first 88 columns (322*88) are my input features; the last 3 columns (322*3) are my targets.
My neural network results are not good, because 88 features is too many relative to only 322 samples.
More data would give better accuracy, but I have no way to collect more.
So I want to apply PCA to reduce the 88 features, but I couldn't manage to apply it the correct way.
How can I do that?
When I write newinput=pca(input)
the result is 88*88, which loses the row dimension.
I need to keep the 322 rows and reduce only the 88 columns.
The neural network code is below, and the data is attached.
veri=xlsread('data322.xlsx');
input=veri(:,1:88);
target=veri(:,89:91);
x=input';
t=target';
% Solve a Pattern Recognition Problem with a Neural Network
% Script generated by Neural Pattern Recognition app
% Created 07-Feb-2021 15:50:44
%
% This script assumes these variables are defined:
%
% x - input data.
% t - target data.
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainscg'; % Scaled conjugate gradient backpropagation.
% Create a Pattern Recognition Network
hiddenLayerSize = [5 4 3];
net = patternnet(hiddenLayerSize, trainFcn);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t);
% Test the Network
y = net(x);
e = gsubtract(t,y);
performance = perform(net,t,y)
tind = vec2ind(t);
yind = vec2ind(y);
percentErrors = sum(tind ~= yind)/numel(tind);
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotconfusion(t,y)
%figure, plotroc(t,y)

 Accepted Answer

the cyclist on 12 Apr 2021
Your question, especially when you ask about the 88x88 matrix that is output from pca(), indicates that you don't really understand the output. (The output is not the new variables.)
I have written a very extensive explanation of PCA, in response to this question. If you thoroughly understand that answer, you should be able to solve your problem.
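To make the point concrete, here is a minimal sketch (using random stand-in data, and assuming the Statistics and Machine Learning Toolbox `pca` function) of what `pca()` actually returns for a 322-by-88 input:

```matlab
% Illustrative sketch: what pca() returns for an n-by-p input.
% X is a random stand-in for the 322x88 feature matrix.
X = randn(322, 88);
[coeff, score, latent] = pca(X);
size(coeff)   % 88x88  -> loadings: one column of weights per principal component
size(score)   % 322x88 -> the data expressed in PC space; these ARE the new variables
size(latent)  % 88x1   -> variance explained by each component, in decreasing order
```

The 88x88 matrix you saw is `coeff`, the component loadings; the transformed observations, with the row count preserved, are in `score`.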

6 Comments

I read it and tried to apply it, but I couldn't manage the reduction. I used 'score' as my new data, but my neural network accuracy decreased. Sorry, but I couldn't do this.
The code is below. I tried it this way and something goes wrong. How do I do the reduction, and how do I apply it in the code?
clear all;
clc;
X=xlsread('C:\Users\Ali\Desktop\data322.xlsx');
X = X - mean(X);
[coeff,score,latent,~,explained] = pca(X);
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);
dataInPrincipalComponentSpace = X*coeff;
score;
input=score(:,1:88);
target=score(:,89:91);
x=input';
t=target';
% (The rest of the script is identical to the neural-network training code in the question above.)
The main problem I see is that you should not be including your output variables in the PCA. The resulting transformation jumbles all the inputs and outputs together, and is meaningless. No wonder your performance went down. Instead, segregate the input variables first and run the PCA on the inputs only. I think your corrected code would be something like
% Get the data, and de-mean
X=xlsread('C:\Users\Ali\Desktop\data322.xlsx');
X = X - mean(X);
% Segregate the inputs and targets
input = X(:,1:88);
target = X(:,89:91);
% PCA on inputs
[coeff,score,latent,~,explained] = pca(input);
inputInPCSpace = input * coeff;
% Transpose and rename as required by the NN
x = inputInPCSpace';
t = target';
Note that the inputs are now transformed, but the outputs are not. It would be good if you tried to understand why.
Also, be aware that, as written, your code has not done any dimensionality reduction. You are still using the entire 88-dimensional input feature space; you have only performed a linear transformation on it. I would expect your NN to perform about the same.
To reduce the dimensionality, you would need an additional step such as
inputInPCSpaceReduced = inputInPCSpace(:,1:10); % Select only the top 10 features
and then define your x from that. I am not very knowledgeable about neural nets, and I am not sure if it is best practice to perform PCA before using a NN, so it is unclear to me if you should expect your NN to perform better by using PCA.
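Rather than hard-coding the number of components, one common approach (sketched here; the 95% threshold is an illustrative choice, not something from the thread) is to use the `explained` output to keep just enough components to cover a target fraction of the variance:

```matlab
% Sketch: keep the smallest number of components explaining >= 95% of variance.
% 'input' is assumed to be the 322x88 feature matrix, already segregated from targets.
[coeff, score, ~, ~, explained] = pca(input);
k = find(cumsum(explained) >= 95, 1, 'first');   % explained is in percent, sums to 100
inputInPCSpaceReduced = score(:, 1:k);           % 322-by-k reduced feature matrix
x = inputInPCSpaceReduced';                      % transpose for the NN, as before
```

Because `latent`/`explained` are sorted in decreasing order, taking the first k columns of `score` always keeps the highest-variance components.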
Thank you @the cyclist, it worked, but NN performance only increased by 2%. That is less improvement than I hoped for from PCA, but at least I got 2%, which is still something. Thank you again. When I use
X = X - mean(X);
my accuracy decreases, so I skipped that step. Is that step necessary for using PCA?
@Greg Heath Do you know how to increase accuracy with this data? I tried PCA and it gained another 2%.
You do not need the de-meaning step. MATLAB internally de-means inside of PCA.
I'm surprised it has any impact on the accuracy of your NN, though.
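This internal centering can be checked directly; a small sketch with stand-in random data (since `pca` centers by default, the scores come out the same either way):

```matlab
% Sketch showing pca() centers the data itself: the scores are identical
% whether or not you subtract the column means beforehand.
X = randn(322, 88);                    % random stand-in for the real data
[~, score1] = pca(X);                  % pca de-means internally
[~, score2] = pca(X - mean(X));        % explicit de-meaning first
max(abs(score1(:) - score2(:)))        % essentially zero (floating-point noise)
```

So any accuracy change from adding the `X - mean(X)` line before `pca` should not come from the PCA step itself.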
@Ali Zulfikaroglu, responding to your email question here.
Unfortunately, I am not very knowledgeable about artificial neural networks, and can provide no advice on improving your accuracy.
I will say that just wanting your accuracy to be higher is very different from having a system that can actually be predicted accurately. If there is variation in the output that is not explained by your inputs (i.e. noise), then even the optimal model is limited in its accuracy.
