Feature selection for SVM classifier

Jos Huigen on 25 Jun 2019
I am trying to have MATLAB perform feature selection so that I can train an SVM classifier on my data and check the best achievable performance for each number of features. In my script, I first test how well each feature separates the two groups ("healthy" and "sick") with two-sample t-tests. The t-tests already suggest which features should be best, since I would expect the feature with the lowest p-value to discriminate best, but I want the selection to be done by sequentialfs.

The problem is that sequentialfs selects different features (genes) than I would have chosen from the p-values: my first choice would be feature A, but the feature selection picks feature B. Could anyone check whether something is wrong with either the t-tests or the feature selection? I have attached the dataset matrix to this message. Any help is greatly appreciated!
load samples1
% Binarize the class label from column 12: 0 = healthy (<3), 1 = sick (>=3)
ID = samples1(:,12);
ID(ID<3) = 0;
ID(ID>=3) = 1;
samples1(:,13) = ID;
%% Significance of the difference between the sick and healthy groups, per feature
sick = find(samples1(1:60,12)>=3);
healthy = find(samples1(1:60,12)<3);
% Test only the feature columns 2:7, so that p(k) refers to the same feature as column k of x_train
sick2 = samples1(sick,2:7);
healthy2 = samples1(healthy,2:7);
[h,p,ci,stats] = ttest2(healthy2,sick2);
%% Train/Test Division
x_train = samples1(1:60,2:7);
y_train = samples1(1:60,13);
x_test = samples1(61:end,2:7);
y_test = samples1(61:end,13);
%% CV partition
% Leave-one-out partition over the training observations
c = cvpartition(numel(y_train),'LeaveOut');
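%% Feature selection
opts = statset('display','iter');
% Criterion: number of misclassified held-out observations for an RBF SVM
classf = @(XT, yT, Xt, yt) ...
    sum(predict(fitcsvm(XT, yT,'KernelFunction','rbf','KernelScale','auto'), Xt) ~= yt);
% With 'nfeatures',6 all six features end up selected (fs is all true);
% history.In and history.Crit record the order of inclusion and the criterion at each step
[fs, history] = sequentialfs(classf, x_train, y_train, 'cv', c, 'options', opts, 'nfeatures', 6);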
%% Best hyperparameters
X_train_w_best_feature = x_train(:,fs);
Mdl = fitcsvm(X_train_w_best_feature, y_train, 'KernelFunction','rbf', ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', struct('AcquisitionFunctionName', ...
    'expected-improvement-plus','ShowPlots',true)); % Bayesian optimization
%% Final test with test set
X_test_w_best_feature = x_test(:,fs);
label = predict(Mdl, X_test_w_best_feature);   % predict once and reuse the labels
test_accuracy_for_iter = sum(label == y_test)/length(y_test)*100
%% Extract error rate
L = loss(Mdl, X_test_w_best_feature, y_test)
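
The two rankings need not agree: ttest2 scores each feature on its own, whereas sequentialfs scores the cross-validated misclassification count of the RBF SVM, starting from the single best feature and then adding one at a time. A minimal sketch for putting the two orderings side by side is given below. It runs on synthetic data rather than samples1, and the names critfun, pOrder and fsOrder, the simulated mean shifts and the 5-fold partition are all assumptions made for illustration.

```matlab
rng(1);                                   % reproducible synthetic data (assumption)
n = 60;
y = [zeros(n/2,1); ones(n/2,1)];          % 0 = healthy, 1 = sick
X = randn(n,6);                           % six hypothetical features
X(:,2) = X(:,2) + 1.5*y;                  % feature 2: strong mean shift
X(:,4) = X(:,4) + 0.8*y;                  % feature 4: weaker mean shift

% Univariate ranking: one two-sample t-test per feature column
[~,p] = ttest2(X(y==0,:), X(y==1,:));
[~,pOrder] = sort(p);                     % smallest p-value first

% Wrapper ranking: order in which sequentialfs adds the features
critfun = @(XT,yT,Xt,yt) sum(predict(fitcsvm(XT,yT, ...
    'KernelFunction','rbf','KernelScale','auto'), Xt) ~= yt);
cvp = cvpartition(y,'KFold',5);
[~,history] = sequentialfs(critfun, X, y, 'cv', cvp, 'nfeatures', size(X,2));

fsOrder = zeros(1,size(X,2));
prev = false(1,size(X,2));
for k = 1:size(history.In,1)
    fsOrder(k) = find(history.In(k,:) & ~prev, 1);   % feature added at step k
    prev = history.In(k,:);
end

disp([pOrder; fsOrder])                   % row 1: t-test order, row 2: sequentialfs order
```

On the real data the same comparison would use x_train and y_train in place of X and y. Differences between the two rows then show where the univariate and the SVM-based criteria disagree, which does not by itself mean that either step is wrong.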

Release

R2019a