How to apply PCA to training/val/test data
Hello. I am working on a neural network that predicts anxiety using heart rate and heart rate variation (supervised machine learning). I have 232 samples and a total of 17 features. In the future, I want to be able to add samples and use this NN to predict anxiety for those. That is too many features for so few samples, so I want to do feature reduction using PCA, but even after a lot of reading I am still a little confused.
I applied PCA to the dataset and 94% of the variance is in the first 8 components, but that was for the whole dataset. I know that I need to compute the PCA on the training data only and then apply some step to the validation and test data, but I am confused about what that step is and how to do it. I would then repeat that step for new samples in the future. If someone could spoonfeed me an answer I would be so happy.
3 Comments
Simon Schmid
on 5 Feb 2018
Edited: Simon Schmid
on 5 Feb 2018
I think you need to apply the same PCA model, which was determined from your training data, to the test and cross-validation data.
Let's say you used the following code:
[pcs,scrs,~,~,explained,mu] = pca(datatrain);  % fit PCA on the training data only
% Keep the first 8 components of scrs (chosen from the explained
% variance in explained)
k = 8;
trainscores = scrs(:,1:k);
% Then apply the same model (mu and pcs) to the validation and test data
% (here called dataval) and again take the first 8 components
valscores = (dataval - mu)*pcs(:,1:k);
% Can anybody confirm this code??
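For what it's worth, here is a minimal sketch of the whole workflow, not a definitive implementation. It assumes the Statistics and Machine Learning Toolbox (for pca) and R2016b or later (for subtracting the row vector mu from a matrix). The data is synthetic and the 70/15/15 split and all variable names are made up for illustration; only the 232 x 17 size and the 94% threshold come from the question. It also checks numerically that, as long as pca returns a square coefficient matrix, right division by the transposed coefficients gives the same scores as multiplying by the coefficients, because the coefficient matrix is orthonormal.

```matlab
% Synthetic stand-in for the real data: 232 samples, 17 features (assumption).
rng(0);                                  % reproducible random numbers
X = randn(232,17) * randn(17,17);        % correlated features

% Hypothetical 70/15/15 split into training, validation and test rows.
idx      = randperm(size(X,1));
trainIdx = idx(1:162);
valIdx   = idx(163:197);
testIdx  = idx(198:end);

% Fit PCA on the training rows only. mu is the training mean (1 x 17).
[coeff,scoreTrain,~,~,explained,mu] = pca(X(trainIdx,:));

% Smallest number of components whose cumulative explained variance
% reaches 94 percent (threshold taken from the question).
k = find(cumsum(explained) >= 94, 1);

% Reduced training features.
trainFeat = scoreTrain(:,1:k);

% Apply the SAME mu and coeff to the validation and test rows.
% Future samples would be transformed in exactly the same way.
valFeat  = (X(valIdx,:)  - mu) * coeff(:,1:k);
testFeat = (X(testIdx,:) - mu) * coeff(:,1:k);

% Check: with a square, orthonormal coeff, right division by coeff'
% matches multiplication by coeff.
fullDiv = (X(valIdx,:) - mu) / coeff';
d = fullDiv(:,1:k) - valFeat;
fprintf('k = %d, max difference = %g\n', k, max(abs(d(:))));
```

The network would then be trained on trainFeat and evaluated on valFeat and testFeat. For new samples later on, mu, coeff and k need to be saved and reused rather than recomputed, otherwise the new features would not live in the same coordinate system the network was trained on.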
Answers (0)