MATLAB Answers


Why are fitcsvm Hyperparameters trained on the whole dataset and used for crossvalidation?

Asked by Patrick Schlegel on 23 Apr 2019
Latest activity: commented on by Patrick Schlegel on 25 Apr 2019
Hi everyone,
I'm currently working with SVMs for data separation, and I noticed something conspicuous in a MATLAB example. The example code goes as follows:
%data generation and plotting
% rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
plot(grnpop(:,1),grnpop(:,2),'go')
hold on
plot(redpop(:,1),redpop(:,2),'ro')
hold off
redpts = zeros(100,2);grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
figure
plot(grnpts(:,1),grnpts(:,2),'go')
hold on
plot(redpts(:,1),redpts(:,2),'ro')
hold off
cdata = [grnpts;redpts];
grp = ones(200,1);
% Green label 1, red label -1
grp(101:200) = -1;
%here starts the interesting part
%Set up a partition for cross-validation.
c = cvpartition(200,'KFold',10);
%optimize svm parameters using cdata and group, i.e. all data we have
opts = struct('Optimizer','bayesopt','ShowPlots',true,'CVPartition',c,...
    'AcquisitionFunctionName','expected-improvement-plus');
svmmod = fitcsvm(cdata,grp,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
%calculate the loss using the partitions but also the svm.HyperparameterOptimizationResults that were optimized using the whole dataset!
lossnew = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,'KernelFunction','rbf',...
    'BoxConstraint',svmmod.HyperparameterOptimizationResults.XAtMinObjective.BoxConstraint,...
    'KernelScale',svmmod.HyperparameterOptimizationResults.XAtMinObjective.KernelScale))
The whole example can also be found in the MATLAB documentation.
As my comment lines in the code already indicate, my problem is the following:
The hyperparameters were optimized using all of the data we have, which means they have already "seen" the test partitions of the cross-validation model and adapted to them during the optimization process. The cross-validation therefore does not validate on test data that is entirely new to the trained SVM model, so the cross-validation error should be artificially low in some cases. I have run some experiments that seem to confirm this.
My question is: have I misunderstood something, and if not, why are the hyperparameters trained this way?
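One common way to avoid the bias I am worried about would be to hold out an outer test set before running the hyperparameter optimization, so the final error estimate comes from data the optimizer never saw. A minimal sketch of that idea, assuming the same cdata and grp as above (the variable names cv, cInner, and optsInner are mine, not from the example):

```matlab
% Sketch only: outer hold-out split to keep evaluation data away from the optimizer
cv = cvpartition(grp,'HoldOut',0.3);            % outer split
Xtrain = cdata(training(cv),:); ytrain = grp(training(cv));
Xtest  = cdata(test(cv),:);     ytest  = grp(test(cv));

cInner = cvpartition(numel(ytrain),'KFold',10); % inner CV used for tuning only
optsInner = struct('Optimizer','bayesopt','ShowPlots',false,'CVPartition',cInner);
mdl = fitcsvm(Xtrain,ytrain,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',optsInner);

outerErr = loss(mdl,Xtest,ytest);  % error on data never seen during tuning
```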



1 Answer

Answer by Alan Weiss
on 23 Apr 2019
 Accepted Answer

Perhaps I didn't explain well what the example is supposed to show. The second "fitting" step that you object to is not fitting anything at all, as you noticed. It is just the way I thought of to calculate the cross-validation loss using the hyperparameters that were already found. In the example I point out that the objective function value returned in the first fitting step is exactly the same as lossnew, and that is the point I was trying to make; you would never run the second "fit" in your own work, because it is entirely redundant.
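If it helps, the redundancy can be checked directly: the best objective value stored on the optimization results should match lossnew, since both are the k-fold loss over the same partition c. A sketch, assuming svmmod was fit as in the example (bestLoss is my variable name):

```matlab
% Sketch: the optimizer's best objective is already the k-fold loss on partition c
bestLoss = svmmod.HyperparameterOptimizationResults.MinObjective;
% bestLoss should agree with lossnew, so the second fitcsvm call adds nothing new
```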
Sorry that I confused you.
Alan Weiss
MATLAB mathematical toolbox documentation


OK, but if I build a model using
opts = struct('Optimizer','bayesopt','ShowPlots',true,'CVPartition',c,...
    'AcquisitionFunctionName','expected-improvement-plus');
svmmod = fitcsvm(cdata,grp,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
methods such as kfoldLoss cannot be applied to this model. Could it be that the CVPartition specified in opts is somehow not transferred correctly to the model-building fitcsvm function?
Please carefully read the description of the Mdl argument that fitcsvm returns. The returned svmmod is a ClassificationSVM object, not a ClassificationPartitionedModel, even though it was optimized using a cross-validation procedure, because the arguments to fitcsvm do not include an explicit cross-validation name. If you want to get the partition back, well, you have to jump through some hoops, like I did in the example.
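One way through those hoops, assuming svmmod and c exist as in the example: since svmmod is a ClassificationSVM, it can be cross-validated explicitly with crossval, which retrains the folds using the same training parameters and returns a ClassificationPartitionedModel that does support kfoldLoss. A hedged sketch (cvmod and err are my names):

```matlab
% Sketch: turn the trained ClassificationSVM into a partitioned model
cvmod = crossval(svmmod,'CVPartition',c);  % ClassificationPartitionedModel
err = kfoldLoss(cvmod);                    % now kfoldLoss applies
```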
Alan Weiss
MATLAB mathematical toolbox documentation
So the exact type of Mdl changes depending on the input, but the lines of code above already produce a fully trained, cross-validated model. That answers my question.
Thank you for your help.
