Understanding MATLAB's built-in SVM cross-validation on fitcsvm

I have a dataset of 53 trials and I want to do leave-one-out cross-validation of a binary classifier. I tried to explicitly do the cross-validation of an SVM, with this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'BoxConstraint', 0.046125, 'ClassNames', class_names};
SVMModel = cell(53,1);
for i_trial = 1:53
    %% Train
    train_set_indices = [1:i_trial-1 i_trial+1:n_trials];
    SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
        true_labels(train_set_indices), SVM_params{:});
    %% Predict
    [estimated_labels(i_trial), score] = predict(SVMModel{i_trial}, ...
        input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me class_error equal to 0.4151.
However, if I try MATLAB's built-in SVM cross-validation:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
CSVM.kfoldLoss is equal to 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
I did the same exercise with 'Standardize' turned off and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error=0.4528, while CSVM.kfoldLoss=0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with the left-out sample (trial) and compare the result with CSVM.kfoldPredict.
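A possible way to see which observations each model in CSVM.Trained was fit on is to query the partition stored with the cross-validated model. The sketch below is a minimal example on synthetic data, not the attached dataset: X, y, and CVMdl are hypothetical stand-ins for input_data, true_labels, and CSVM, and it assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Synthetic stand-in data (hypothetical): 20 observations, 2 features, 2 classes.
rng(0);                                    % make the example reproducible
X = [randn(10,2) + 1; randn(10,2) - 1];
y = [repmat({'A'}, 10, 1); repmat({'B'}, 10, 1)];

% Cross-validated linear SVM with leave-one-out folds.
CVMdl = fitcsvm(X, y, 'KernelFunction', 'linear', 'Leaveout', 'on');

% The partition object records which observations each fold used.
c = CVMdl.Partition;
for k = 1:3                                % print the first three folds only
    test_idx  = find(test(c, k));          % observation left out of fold k
    train_idx = find(training(c, k));      % observations used to fit CVMdl.Trained{k}
    fprintf('Fold %d: left out observation %d, trained on %d observations\n', ...
        k, test_idx, numel(train_idx));
end
```

The left-out index printed for fold k is not guaranteed to be k itself, so it is safer to look it up than to assume the folds follow the row order of the data.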
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c, ...
    'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1,53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii = 1:53
    estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii),:,1));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error=0.5849, which is different from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
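One thing that may explain this mismatch: cvpartition builds a random leave-one-out partition, so fold ii does not necessarily hold out observation ii. In the loop above, the prediction for the held-out observation is stored at position ii, so it can end up compared against the wrong true label. The following is a minimal sketch on synthetic data that stores each prediction at the index of the observation it belongs to. X, y, and CVMdl are hypothetical stand-ins for input_data, true_labels, and CSVM, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Synthetic stand-in data (hypothetical): 30 observations, 2 features, 2 classes.
rng(1);                                    % make the example reproducible
n = 30;
X = [randn(n/2,2) + 0.5; randn(n/2,2) - 0.5];
y = [repmat({'A'}, n/2, 1); repmat({'B'}, n/2, 1)];

% Explicit leave-one-out partition, passed to fitcsvm.
c = cvpartition(n, 'LeaveOut');
CVMdl = fitcsvm(X, y, 'KernelFunction', 'linear', ...
    'Standardize', true, 'CVPartition', c);

% Predict each held-out observation with the model that did not see it,
% and store the label at that observation's own index (not at the fold number).
manual_labels = cell(n, 1);
for k = 1:c.NumTestSets
    idx = find(test(c, k));                % observation held out in fold k
    manual_labels(idx) = predict(CVMdl.Trained{k}, X(idx, :));
end

% Compare with the built-in cross-validation outputs.
builtin_labels = kfoldPredict(CVMdl);
manual_error   = mean(~strcmp(y, manual_labels));
fprintf('labels identical: %d\n', isequal(manual_labels, builtin_labels));
fprintf('manual error: %.4f, kfoldLoss: %.4f\n', manual_error, kfoldLoss(CVMdl));
```

If the two label vectors agree, the manual error rate should match kfoldLoss with its default classification-error loss, at least when no observation weights or custom priors are set.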
Update 2: I attached the data.
Thanks!
  2 Comments
Carlos Mendoza on 31 Aug 2020
I didn't forget. I thought that the code would be enough. Probably an error.


Answers (1)

Xingwang Yong on 29 Sep 2020
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
  2 Comments
Xingwang Yong on 3 Oct 2020
class_error = error_count / n_trials;
            = (n_trials - correct_count) / n_trials
            = 1 - correct_count / n_trials
            = 1 - accuracy
That is your definition of loss.
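A quick arithmetic check on the numbers quoted in the question can help judge whether the two losses are on the same scale. The sketch below only uses the values reported in the question (53 trials and the four error rates); the variable names are illustrative.

```matlab
% Error rates reported in the question, in order:
% manual loop, kfoldLoss, manual loop with KernelScale, Update 1 loop.
n_trials = 53;
reported = [0.4151 0.3208 0.4528 0.5849];

% Number of misclassified trials each rate would correspond to.
implied_errors = round(reported * n_trials);

% Rates recomputed from those whole-number counts.
recomputed = implied_errors / n_trials;

disp(implied_errors);
disp(recomputed);
```

Every reported value, including the kfoldLoss one, matches a whole number of misclassified trials out of 53 to four decimal places. That is at least consistent with kfoldLoss also reporting a plain misclassification rate here, although it does not prove it.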


Release

R2019b
