Monte Carlo repetitions with customized partitions

k = 5; % number of partitions
c = cvpartition(Labels{2},"KFold",k,"Stratify",true);
test_idx = test(c,"all");
for ii = 1:k
    % Divide into train and test sets via logical indexing
    % (k columns = k partitions; the samples of Labels 1 & 3 are always used for testing)
    testIndices(:,ii) = logical([ones(numel(Labels{1}),1); test_idx(:,ii); ones(numel(Labels{3}),1)]);
end
c = cvpartition("CustomPartition",testIndices);
I want to customize the partitions for cross-validation so that some of the samples are included in the test set of every partition. Is there a way to do this?
I tried cvpartition: if I customize the partitions, I get the error "Each observation must be present in one test set." If I use Monte Carlo repetitions instead, samples may appear in more than one test set, but then I can't customize the sets anymore.
I'm thankful for any hint.
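For reference, a minimal sketch of the constraint behind that error (with hypothetical sizes, not taken from the data above): cvpartition("CustomPartition",testSets) appears to require a logical matrix in which every observation is true in exactly one test column, so rows that are true in every column, like the always-test samples built above, are rejected.
% Six observations, three custom folds; each row is true in exactly one column
validTestSets = logical([1 0 0; 1 0 0; 0 1 0; 0 1 0; 0 0 1; 0 0 1]);
cValid = cvpartition("CustomPartition",validTestSets); % should be accepted
% Marking an observation as a test sample in every fold breaks the partition:
invalidTestSets = validTestSets;
invalidTestSets(1,:) = true;
% cvpartition("CustomPartition",invalidTestSets) % errors with
% "Each observation must be present in one test set."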
  9 Comments
Tobias Rieker on 3 Apr 2024 at 13:48
Indeed, SFS_xAlwaysIn has more columns than XTest. The reason is that sequentialfs selects 1, 2, 3, ..., n columns (features) of the data to be evaluated against the test data. This selection does not happen for the concatenated test data SFS_xAlwaysIn, though, since it is added externally. Is there a way to automatically choose the same features for SFS_xAlwaysIn?
Thank you for your great help so far. It is highly appreciated.
Harald on 3 Apr 2024 at 15:21
Duh... that makes sense. I suppose it will take some fiddling to address this.
I would try this strategy:
  • To be able to determine which columns were chosen, add a fake row 1:numColumns on top of the x-values, and a nonsense value that does not appear in your y-values on top of the y-values, before supplying them to sequentialfs.
  • Identify which of the y-values passed to the function (either yTrain or yTest) contains the nonsense value. Extract the corresponding row of x-values from xTrain or xTest. This tells you which columns were sent into the function.
  • Extract the corresponding columns from SFS_xAlwaysIn and add them to the test data. Be sure to remove the fake data from the first step.
I expect this to be somewhat tricky and would be happy to try to help, but would really need some sample data for SFS_xtrain and SFS_ytrain to play with. Perhaps I should be able to infer this, but I am not even sure of the data type of SFS_ytrain.
Best wishes,
Harald


Accepted Answer

Harald on 4 Apr 2024 at 9:13
I have now tried the approach discussed in the comments with sample data based on fisheriris.mat.
%% Sample data
load fisheriris.mat
species = categorical(species);
% Shuffle data
order = randperm(length(species));
meas = meas(order,:);
species = species(order,:);
SFS_xtrain = meas(1:130,:);
SFS_ytrain = species(1:130);
SFS_xAlwaysIn = meas(131:end,:);
SFS_yAlwaysIn = species(131:end);
%% Add fake data
SFS_xtrain = [1:size(SFS_xtrain, 2); SFS_xtrain];
SFS_ytrain = ["nonsense"; SFS_ytrain];
%% Your code (for now without setting "nfeatures" and "options")
k = 5;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
% opts = statset("UseParallel",true);
fun = @(XTrain,yTrain,XTest,yTest) callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c);
%% A helper function
function err = callErrorFun(XTrain,yTrain,XTest,yTest,SFS_xAlwaysIn,SFS_yAlwaysIn)
    if sum(yTrain == "nonsense") == 1
        % The fake row ended up in the training fold
        idx = yTrain == "nonsense";
        columns = XTrain(idx,:);   % selected column indices
        XTrain(idx,:) = [];
        yTrain(idx) = [];
    elseif sum(yTest == "nonsense") == 1
        % The fake row ended up in the test fold
        idx = yTest == "nonsense";
        columns = XTest(idx,:);    % selected column indices
        XTest(idx,:) = [];
        yTest(idx) = [];
    else
        error("Something unexpected happened. Revisit the approach...")
    end
    % Append the always-included data, restricted to the selected columns
    XTrain = [XTrain; SFS_xAlwaysIn(:,columns)];
    yTrain = [yTrain; SFS_yAlwaysIn];
    err = errorFun(XTrain,yTrain,XTest,yTest);
end
%% Your function
function err = errorFun(XTrain,yTrain,XTest,yTest)
    % Create the model with the learning method of your choice
    classifier = fitcdiscr(XTrain,yTrain);
    % Calculate the number of test observations misclassified
    ypred = predict(classifier,XTest);
    err = nnz(ypred ~= yTest);
end
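As a quick check once the script above has run: the first output of sequentialfs is a logical vector over the columns of SFS_xtrain (the second output, named ranking here, is the history structure that sequentialfs reports), so the selected feature indices can be listed directly.
selectedFeatures = find(toKeep) % column indices of the features kept by sequentialfs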
I hope you'll find this to be helpful.
Best wishes,
Harald
  2 Comments
Tobias Rieker on 4 Apr 2024 at 10:05
Edited: Tobias Rieker on 4 Apr 2024 at 10:13
Thanks to your above-mentioned idea, I have now figured it out. Thank you!
This is my approach:
% Add fake data on top of the columns
num_col = 1:numel(EMG_chanels_remaining); % EMG_chanels_remaining = number of features
SFS_xtrain = [num_col; SFS_xtrain];
SFS_ytrain = [categorical(1); SFS_ytrain];
fun = @(XTrain,yTrain,XTest,yTest) errorFun(XTrain,yTrain,XTest,SFS_xAlwaysIn,yTest,SFS_yAlwaysIn); %%%% FOR CV
% cvpartition object
k = 10;
c = cvpartition(SFS_ytrain,"KFold",k,"Stratify",true);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c,"nfeatures",nfeatures,"options",opts); % nfeatures and opts are set elsewhere in the full script
function err = errorFun(XTrain,yTrain,XTest,SFS_xAlwaysIn,yTest,SFS_yAlwaysIn)
    % Find where the fake data is and extract its row: these are the feature
    % columns currently included by sequentialfs
    if ismember(yTrain(1,:),categorical(1:256))
        Ch_count = XTrain(1,:);
        XTrain(1,:) = [];
        yTrain(1,:) = [];
    else
        Ch_count = XTest(1,:);
        XTest(1,:) = [];
        yTest(1,:) = [];
    end
    % Append the always-included data, restricted to the same columns (features)
    XTrain = [XTrain; SFS_xAlwaysIn(:,Ch_count)];
    yTrain = [yTrain; SFS_yAlwaysIn];
    classifier = fitcdiscr(XTrain,yTrain);
    % Calculate the number of test observations misclassified
    ypred = predict(classifier,XTest);
    err = nnz(ypred ~= yTest);
end
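One caveat with this variant: the branch test ismember(yTrain(1,:),categorical(1:256)) assumes the real class labels never coincide with the sentinel categories "1" through "256". A hypothetical guard, using the same variable names and run before the fake row is prepended, could make that assumption explicit:
% Hypothetical guard: run before prepending categorical(1) to SFS_ytrain
assert(~any(ismember(string(categories(SFS_ytrain)),string(1:256))), ...
    "Sentinel categories 1:256 collide with real class labels.")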
Harald on 4 Apr 2024 at 11:07
Glad it's working for you! If you found the answer to be helpful, please consider "accept"-ing it.
Best wishes,
Harald

