Why are fitcsvm hyperparameters trained on the whole dataset and then used for cross-validation?

Hi everyone.
I'm currently working with SVMs for data separation and I noticed something conspicuous in a MATLAB example. The example code goes as follows:
%data generation and plotting
% rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
plot(grnpop(:,1),grnpop(:,2),'go')
hold on
plot(redpop(:,1),redpop(:,2),'ro')
hold off
redpts = zeros(100,2);
grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
figure
plot(grnpts(:,1),grnpts(:,2),'go')
hold on
plot(redpts(:,1),redpts(:,2),'ro')
hold off
cdata = [grnpts;redpts];
grp = ones(200,1);
% Green label 1, red label -1
grp(101:200) = -1;
%%%
%here starts the interesting part
%%%
%Set up a partition for cross-validation.
c = cvpartition(200,'KFold',10);
%optimize svm parameters using cdata and group, i.e. all data we have
opts = struct('Optimizer','bayesopt','ShowPlots',true,'CVPartition',c,...
'AcquisitionFunctionName','expected-improvement-plus');
svmmod = fitcsvm(cdata,grp,'KernelFunction','rbf',...
'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
%calculate the loss using the same partition c, but with the hyperparameters from svmmod.HyperparameterOptimizationResults that were optimized using the whole dataset!
lossnew = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,'KernelFunction','rbf',...
'BoxConstraint',svmmod.HyperparameterOptimizationResults.XAtMinObjective.BoxConstraint,...
'KernelScale',svmmod.HyperparameterOptimizationResults.XAtMinObjective.KernelScale))
The whole example can also be found here:
As my comment lines in the code already indicate, the problem I have is the following:
The hyperparameters were optimized using all the data we have, which means they have already "seen" the test partitions of the cross-validation and adapted to them during the optimization process. So the cross-validation does not validate on test data that is entirely new to the trained SVM model, and the cross-validation error should therefore be artificially low in some cases. I have also run some experiments that seem to confirm this.
My question is: did I misunderstand something, and if not, why are the hyperparameters trained this way?
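For comparison, here is a minimal sketch of a nested (outer/inner) cross-validation, in which the hyperparameters are tuned only on each outer training fold, so the outer test fold is never seen during the optimization. This snippet is not part of the original example; variable names such as outerCV and outerLoss are my own.
% Nested cross-validation sketch: tune hyperparameters per outer fold
outerCV = cvpartition(grp,'KFold',10);
outerLoss = zeros(outerCV.NumTestSets,1);
for k = 1:outerCV.NumTestSets
    trIdx = training(outerCV,k);   % outer training fold
    teIdx = test(outerCV,k);       % outer test fold, held out completely
    % Inner optimization sees only the outer training data
    innerOpts = struct('Optimizer','bayesopt','ShowPlots',false,...
        'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);
    mdl = fitcsvm(cdata(trIdx,:),grp(trIdx),'KernelFunction','rbf',...
        'OptimizeHyperparameters','auto',...
        'HyperparameterOptimizationOptions',innerOpts);
    % Evaluate on the held-out outer test fold
    outerLoss(k) = loss(mdl,cdata(teIdx,:),grp(teIdx));
end
meanOuterLoss = mean(outerLoss)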

Accepted Answer

Alan Weiss on 23 Apr 2019
Perhaps I didn't explain well what the example is supposed to be showing. The second "fitting" step that you object to is not fitting anything new at all, as you noticed. It is just the way I thought of to calculate the cross-validation loss using the hyperparameters that were already found. In the example I point out that the objective function value returned in the first fitting step is exactly the same as lossnew, and that is the point I was trying to make; you would never run the second "fit" in your own work because it is entirely redundant.
Sorry that I confused you.
Alan Weiss
MATLAB mathematical toolbox documentation
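The equivalence described above can be checked directly from the optimization results object, assuming the svmmod and lossnew variables from the code in the question (this snippet is not part of the original thread):
% The best observed objective from the Bayesian optimization is already a
% k-fold cross-validation loss under partition c, so it should match lossnew.
results = svmmod.HyperparameterOptimizationResults;   % BayesianOptimization object
results.MinObjective   % best observed cross-validation loss
% Compare with lossnew from the second fitcsvm/kfoldLoss call above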
  3 Comments
Alan Weiss on 23 Apr 2019
Please carefully read the description of the Mdl output that fitcsvm returns. The returned svmmod is a ClassificationSVM object, not a ClassificationPartitionedModel, even though it was optimized using a cross-validation procedure, because the arguments to fitcsvm do not include an explicit cross-validation name-value pair. If you want to get the partition back, well, you have to jump through some hoops, like I did in the example.
Alan Weiss
MATLAB mathematical toolbox documentation
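A quick way to verify the object type, and one way to obtain a cross-validated model afterwards from the returned classifier, assuming svmmod and the partition c from the code above (this snippet is not part of the original thread):
class(svmmod)   % returns 'ClassificationSVM', not 'ClassificationPartitionedModel'
% Cross-validate the returned model again with the same partition object;
% this refits on the folds using the hyperparameters stored in svmmod
cvmod = crossval(svmmod,'CVPartition',c);   % ClassificationPartitionedModel
kfoldLoss(cvmod)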
Patrick Schlegel on 25 Apr 2019
So the exact type of Mdl changes depending on the input, but the lines of code above already produce a fully trained, cross-validated model. This answers my question.
Thank you for your help.


