Monte Carlo repetitions with customized partitions

k = 5; % number of partitions
c = cvpartition(Labels{2},"KFold",k,"Stratify",true);
test_idx = test(c,"all");
for ii = 1:k
    % Divide into train and test sets via logical indexing
    % (k columns = k partitions; the samples of Labels 1 & 3 are always used for testing)
    testIndices(:,ii) = logical([ones(numel(Labels{1}),1); test_idx(:,ii); ones(numel(Labels{3}),1)]);
end
c = cvpartition("CustomPartition",testIndices);
I want to customize the partitions for cross-validation so that some of the samples are included in the test set of every partition. Is there a way to do this?
I tried cvpartition: if I customize the partitions, I get the error "Each observation must be present in one test set." If I use Monte Carlo repetitions instead, samples may appear in more than one test set, but then I can't customize the sets anymore.
I'm thankful for any hint.
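For reference, a minimal sketch of the constraint behind that error (with hypothetical sizes, not taken from the data above): cvpartition("CustomPartition",testSets) appears to require a logical matrix in which every observation is true in exactly one test column, so rows that are true in every column, like the always-test samples built above, are rejected.
% Six observations, three custom folds; each row is true in exactly one column
validTestSets = logical([1 0 0; 1 0 0; 0 1 0; 0 1 0; 0 0 1; 0 0 1]);
cValid = cvpartition("CustomPartition",validTestSets); % should be accepted
% Marking an observation as a test sample in every fold breaks the partition:
invalidTestSets = validTestSets;
invalidTestSets(1,:) = true;
% cvpartition("CustomPartition",invalidTestSets) % errors with
% "Each observation must be present in one test set."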
  9 Comments
Tobias Rieker on 3 Apr 2024 at 13:48
Indeed, SFS_xAlwaysIn has more columns than XTest. The reason is that sequentialfs selects 1, 2, 3, ..., n columns (features) of the data to be evaluated against the test data. This selection does not happen for the concatenated test data SFS_xAlwaysIn, though, since it is added externally. Is there a way to automatically choose the same features for SFS_xAlwaysIn?
Thank you for your great help so far. It is highly appreciated.
Harald on 3 Apr 2024 at 15:21
Duh... that makes sense. I suppose it will take some fiddling to address this.
I would try this strategy:
  • To be able to determine which columns were chosen, add a fake row 1:numColumns on top of the x-values, and a nonsense value that does not appear in your y-values on top of the y-values, before supplying them to sequentialfs.
  • Identify which of the y-values passed to the function (either yTrain or yTest) contains the nonsense value. Extract the corresponding row of x-values from xTrain or xTest. This tells you which columns were sent into the function.
  • Extract the corresponding columns from SFS_xAlwaysIn and add them to the test data. Be sure to remove the fake data from the first step.
I expect this to be somewhat tricky and would be happy to try to help, but would really need some sample data for SFS_xtrain and SFS_ytrain to play with. Perhaps I should be able to infer this, but I am not even sure of the data type of SFS_ytrain.
Best wishes,
Harald


Accepted Answer

Harald on 4 Apr 2024 at 9:13
I have now tried the approach discussed in the comments with sample data based on fisheriris.mat.
%% Sample data
load fisheriris.mat
species = categorical(species);
% Shuffle data
order = randperm(length(species));
meas = meas(order,:);
species = species(order,:);
SFS_xtrain = meas(1:130,:);
SFS_ytrain = species(1:130);
SFS_xAlwaysIn = meas(131:end,:);
SFS_yAlwaysIn = species(131:end);
%% Add fake data
SFS_xtrain = [1:size(SFS_xtrain, 2); SFS_xtrain];
SFS_ytrain = ["nonsense"; SFS_ytrain];
%% Your code (for now without setting "nfeatures" and "options")
k = 5;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
% opts = statset("UseParallel",true);
fun = @(XTrain,yTrain,XTest,yTest) callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c);
%% A helper function
function err = callErrorFun(XTrain,yTrain,XTest,yTest,SFS_xAlwaysIn,SFS_yAlwaysIn)
    if sum(yTrain == "nonsense") == 1
        % The fake row ended up in the training fold
        idx = yTrain == "nonsense";
        columns = XTrain(idx,:);   % selected column indices
        XTrain(idx,:) = [];
        yTrain(idx) = [];
    elseif sum(yTest == "nonsense") == 1
        % The fake row ended up in the test fold
        idx = yTest == "nonsense";
        columns = XTest(idx,:);    % selected column indices
        XTest(idx,:) = [];
        yTest(idx) = [];
    else
        error("Something unexpected happened. Revisit the approach...")
    end
    % Append the always-included data, restricted to the selected columns
    XTrain = [XTrain; SFS_xAlwaysIn(:,columns)];
    yTrain = [yTrain; SFS_yAlwaysIn];
    err = errorFun(XTrain,yTrain,XTest,yTest);
end
%% Your function
function err = errorFun(XTrain,yTrain,XTest,yTest)
    % Create the model with the learning method of your choice
    classifier = fitcdiscr(XTrain,yTrain);
    % Calculate the number of test observations misclassified
    ypred = predict(classifier,XTest);
    err = nnz(ypred ~= yTest);
end
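As a quick check once the script above has run: the first output of sequentialfs is a logical vector over the columns of SFS_xtrain (the second output, named ranking here, is the history structure that sequentialfs reports), so the selected feature indices can be listed directly.
selectedFeatures = find(toKeep) % column indices of the features kept by sequentialfs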
I hope you'll find this to be helpful.
Best wishes,
Harald
  2 Comments
Tobias Rieker on 4 Apr 2024 at 10:05
Edited: Tobias Rieker on 4 Apr 2024 at 10:13
Thanks to your above-mentioned idea, I have now figured it out. Thank you!
This is my approach:
% Add fake data on top of the columns
num_col = 1:numel(EMG_chanels_remaining); % EMG_chanels_remaining = number of features
SFS_xtrain = [num_col; SFS_xtrain];
SFS_ytrain = [categorical(1); SFS_ytrain];
fun = @(XTrain,yTrain,XTest,yTest) errorFun(XTrain,yTrain,XTest,SFS_xAlwaysIn,yTest,SFS_yAlwaysIn); %%%% FOR CV
% cvpartition object
k = 10;
c = cvpartition(SFS_ytrain,"KFold",k,"Stratify",true);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c,"nfeatures",nfeatures,"options",opts); % nfeatures and opts are set elsewhere in the full script
function err = errorFun(XTrain,yTrain,XTest,SFS_xAlwaysIn,yTest,SFS_yAlwaysIn)
    % Find where the fake data is and extract its row: these are the feature
    % columns currently included by sequentialfs
    if ismember(yTrain(1,:),categorical(1:256))
        Ch_count = XTrain(1,:);
        XTrain(1,:) = [];
        yTrain(1,:) = [];
    else
        Ch_count = XTest(1,:);
        XTest(1,:) = [];
        yTest(1,:) = [];
    end
    % Append the always-included data, restricted to the same columns (features)
    XTrain = [XTrain; SFS_xAlwaysIn(:,Ch_count)];
    yTrain = [yTrain; SFS_yAlwaysIn];
    classifier = fitcdiscr(XTrain,yTrain);
    % Calculate the number of test observations misclassified
    ypred = predict(classifier,XTest);
    err = nnz(ypred ~= yTest);
end
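One caveat with this variant: the branch test ismember(yTrain(1,:),categorical(1:256)) assumes the real class labels never coincide with the sentinel categories "1" through "256". A hypothetical guard, using the same variable names and run before the fake row is prepended, could make that assumption explicit:
% Hypothetical guard: run before prepending categorical(1) to SFS_ytrain
assert(~any(ismember(string(categories(SFS_ytrain)),string(1:256))), ...
    "Sentinel categories 1:256 collide with real class labels.")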
Harald on 4 Apr 2024 at 11:07
Glad it's working for you! If you found the answer to be helpful, please consider "accept"-ing it.
Best wishes,
Harald

