# Optimize an SVM Classifier Fit Using Bayesian Optimization

This example shows how to optimize an SVM classifier using the fitcsvm function and the OptimizeHyperparameters name-value pair argument. The classification works on locations of points from a Gaussian mixture model, which is described in The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman, 2009), page 17. The model begins by generating 10 base points for a "green" class, distributed as 2-D independent normals with mean (1,0) and unit variance. It also generates 10 base points for a "red" class, distributed as 2-D independent normals with mean (0,1) and unit variance. For each class (green and red), generate 100 random points as follows:

1. Choose a base point m of the appropriate color uniformly at random.
2. Generate an independent random point with a 2-D normal distribution with mean m and variance I/5, where I is the 2-by-2 identity matrix. This example uses a variance of I/50 instead (eye(2)*0.02 in the code) to show the advantage of optimization more clearly.

## Generate the Points and Classifier

Generate the 10 base points for each class.

```
rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
```

View the base points.

```
plot(grnpop(:,1),grnpop(:,2),'go')
hold on
plot(redpop(:,1),redpop(:,2),'ro')
hold off
```

Since some red base points are close to green base points, it can be difficult to classify the data points based on location alone.

Generate the 100 data points of each class.

```
redpts = zeros(100,2);grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
```
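The loop above follows the two-step recipe literally. As a rough sketch of an alternative, the same sampling scheme can be written without a loop, because mvnrnd accepts a matrix of means with one row per point. The names n, grnptsVec, and redptsVec are introduced here for illustration only. The random numbers are consumed in a different order than in the loop, so this sketch does not reproduce the exact points used in the rest of the example.

```matlab
rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);   % Base points, generated as in the example
redpop = mvnrnd([0,1],eye(2),10);

n = 100;                            % Points per class (illustrative name)

% Step 1: pick a base point (row index) uniformly at random for each data point.
% Step 2: draw one point around each selected mean with covariance I/50.
grnptsVec = mvnrnd(grnpop(randi(10,n,1),:),eye(2)*0.02);
redptsVec = mvnrnd(redpop(randi(10,n,1),:),eye(2)*0.02);
```

Each result is a 100-by-2 matrix, with one generated point per row.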

View the data points.

```
figure
plot(grnpts(:,1),grnpts(:,2),'go')
hold on
plot(redpts(:,1),redpts(:,2),'ro')
hold off
```

## Prepare Data For Classification

Put the data into one matrix, and make a vector grp that labels the class of each point.

```
cdata = [grnpts;redpts];
grp = ones(200,1);
% Green label 1, red label -1
grp(101:200) = -1;
```

## Prepare Cross-Validation

Set up a partition for cross-validation. This step fixes the training and test sets that the optimization uses at each step.

```
c = cvpartition(200,'KFold',10);
```
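Fixing the partition means every hyperparameter setting is scored on the same folds, so the losses are directly comparable. The following sketch shows one way to inspect such a partition. The variable names trainIdx and testIdx are illustrative, and the rng call is added only so the sketch is repeatable on its own.

```matlab
rng default                          % For reproducibility of the random split
c = cvpartition(200,'KFold',10);     % Same call as in the example

disp(c.NumTestSets)                  % Number of folds
disp(c.TestSize)                     % Number of held-out observations per fold

% Logical index vectors for the first fold
trainIdx = training(c,1);
testIdx  = test(c,1);

% Within a fold, each observation belongs to exactly one of the two sets
assert(all(trainIdx ~= testIdx))
```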

## Optimize the Fit

To find a good fit, meaning one with a low cross-validation loss, set options to use Bayesian optimization. Use the same cross-validation partition c in all optimizations.

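For reproducibility, use the 'expected-improvement-plus' acquisition function. The default acquisition function, 'expected-improvement-per-second-plus', depends on the runtime of each objective evaluation, so its results can vary from run to run.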

```
opts = struct('Optimizer','bayesopt','ShowPlots',true,'CVPartition',c,...
    'AcquisitionFunctionName','expected-improvement-plus');
svmmod = fitcsvm(cdata,grp,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
```
```
|=====================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | BoxConstrain-|  KernelScale |
|      | result |             | runtime     | (observed)  | (estim.)    | t            |              |
|=====================================================================================================|
|    1 | Best   |       0.345 |      18.712 |       0.345 |       0.345 |      0.00474 |       306.44 |
|    2 | Best   |       0.115 |      1.7726 |       0.115 |     0.12678 |       430.31 |       1.4864 |
|    3 | Accept |        0.52 |     0.63852 |       0.115 |      0.1152 |     0.028415 |     0.014369 |
|    4 | Accept |        0.61 |     0.62074 |       0.115 |     0.11504 |       133.94 |    0.0031427 |
|    5 | Accept |        0.34 |     0.92943 |       0.115 |     0.11504 |     0.010993 |       5.7742 |
|    6 | Best   |       0.085 |     0.62699 |       0.085 |    0.085039 |       885.63 |      0.68403 |
|    7 | Accept |       0.105 |     0.33127 |       0.085 |    0.085428 |       0.3057 |      0.58118 |
|    8 | Accept |        0.21 |     0.26263 |       0.085 |     0.09566 |      0.16044 |      0.91824 |
|    9 | Accept |       0.085 |     0.45576 |       0.085 |     0.08725 |       972.19 |      0.46259 |
|   10 | Accept |         0.1 |     0.46419 |       0.085 |    0.090952 |       990.29 |        0.491 |
|   11 | Best   |        0.08 |     0.20476 |        0.08 |    0.079362 |       2.5195 |        0.291 |
|   12 | Accept |        0.09 |     0.20536 |        0.08 |     0.08402 |       14.338 |      0.44386 |
|   13 | Accept |         0.1 |     0.23069 |        0.08 |     0.08508 |    0.0022577 |      0.23803 |
|   14 | Accept |        0.11 |     0.25631 |        0.08 |    0.087378 |       0.2115 |      0.32109 |
|   15 | Best   |        0.07 |     0.27555 |        0.07 |    0.081507 |        910.2 |      0.25218 |
|   16 | Best   |       0.065 |      0.3712 |       0.065 |    0.072457 |       953.22 |      0.26253 |
|   17 | Accept |       0.075 |     0.46829 |       0.065 |    0.072554 |       998.74 |      0.23087 |
|   18 | Accept |       0.295 |     0.36266 |       0.065 |    0.072647 |       996.18 |       44.626 |
|   19 | Accept |        0.07 |      0.6039 |       0.065 |     0.06946 |       985.37 |      0.27389 |
|   20 | Accept |       0.165 |     0.40697 |       0.065 |    0.071622 |     0.065103 |      0.13679 |
|=====================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | BoxConstrain-|  KernelScale |
|      | result |             | runtime     | (observed)  | (estim.)    | t            |              |
|=====================================================================================================|
|   21 | Accept |       0.345 |     0.39401 |       0.065 |    0.071764 |        971.7 |       999.01 |
|   22 | Accept |        0.61 |     0.33825 |       0.065 |    0.071967 |    0.0010168 |    0.0010005 |
|   23 | Accept |       0.345 |     0.20604 |       0.065 |    0.071959 |    0.0010674 |       999.18 |
|   24 | Accept |        0.35 |     0.24186 |       0.065 |    0.071863 |    0.0010003 |       40.628 |
|   25 | Accept |        0.24 |     0.53136 |       0.065 |    0.072124 |       996.55 |       10.423 |
|   26 | Accept |        0.61 |     0.33235 |       0.065 |    0.072068 |       958.64 |    0.0010026 |
|   27 | Accept |        0.47 |     0.28429 |       0.065 |     0.07218 |       993.69 |     0.029723 |
|   28 | Accept |         0.3 |     0.25296 |       0.065 |    0.072291 |       993.15 |       170.01 |
|   29 | Accept |        0.16 |     0.56922 |       0.065 |    0.072104 |       992.81 |       3.8594 |
|   30 | Accept |       0.365 |     0.18696 |       0.065 |    0.072112 |    0.0010017 |     0.044287 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 139.9518 seconds.
Total objective function evaluation time: 31.537

Best observed feasible point:
    BoxConstraint    KernelScale
    _____________    ___________

       953.22          0.26253

Observed objective function value = 0.065
Estimated objective function value = 0.072112
Function evaluation time = 0.3712

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale
    _____________    ___________

       985.37          0.27389

Estimated objective function value = 0.072112
Estimated function evaluation time = 0.40876

svmmod =
  ClassificationSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: [-1 1]
                       ScoreTransform: 'none'
                      NumObservations: 200
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                                Alpha: [77x1 double]
                                 Bias: -0.2352
                     KernelParameters: [1x1 struct]
                       BoxConstraints: [200x1 double]
                      ConvergenceInfo: [1x1 struct]
                      IsSupportVector: [200x1 logical]
                               Solver: 'SMO'
```
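The output above reports two candidate solutions: the best observed feasible point and the best estimated feasible point according to the model of the objective. Both can be read programmatically from the BayesianOptimization object stored in the fitted classifier. The sketch below is self-contained and runs on a small hypothetical data set (X and y are stand-ins, not the example's data), with only 10 evaluations and no plots so that it finishes quickly. The variable names mdl and results are illustrative.

```matlab
rng default % For reproducibility

% Hypothetical toy data: two well-separated Gaussian clouds
X = [randn(50,2)+1; randn(50,2)-1];
y = [ones(50,1); -ones(50,1)];

% Short, quiet optimization run (10 evaluations instead of the default 30)
opts = struct('ShowPlots',false,'Verbose',0,'MaxObjectiveEvaluations',10,...
    'AcquisitionFunctionName','expected-improvement-plus');
mdl = fitcsvm(X,y,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts);

results = mdl.HyperparameterOptimizationResults;

% Best point among those actually evaluated, and its observed loss
disp(results.XAtMinObjective)
disp(results.MinObjective)

% Best point according to the fitted model of the objective, and its estimated loss
disp(results.XAtMinEstimatedObjective)
disp(results.MinEstimatedObjective)
```

The two points need not coincide, because the observed losses are noisy and the model smooths over them.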

Find the loss of the optimized model.

```
lossnew = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,'KernelFunction','rbf',...
    'BoxConstraint',svmmod.HyperparameterOptimizationResults.XAtMinObjective.BoxConstraint,...
    'KernelScale',svmmod.HyperparameterOptimizationResults.XAtMinObjective.KernelScale))
```
```
lossnew = 0.0650
```

This loss is the same as the loss reported in the optimization output under "Observed objective function value".
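To gauge how much the optimization helps, one option is to compare against an RBF SVM that uses the fitcsvm defaults (BoxConstraint of 1 and KernelScale of 1) on the same kind of partition. The sketch below rebuilds the data with the same calls as the earlier steps so that it can run on its own. The tuned values are copied by hand from the "Best observed feasible point" in the output above, and lossDefault and lossTuned are illustrative names. No output is shown because the exact numbers depend on the MATLAB release and random number stream.

```matlab
% Rebuild the data and partition with the same calls as in the example
rng default
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
redpts = zeros(100,2); grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
cdata = [grnpts;redpts];
grp = [ones(100,1); -ones(100,1)];   % Green label 1, red label -1
c = cvpartition(200,'KFold',10);

% Baseline: RBF kernel with default BoxConstraint and KernelScale
lossDefault = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,...
    'KernelFunction','rbf'));

% Tuned: hyperparameters copied from the optimization output above
lossTuned = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,...
    'KernelFunction','rbf','BoxConstraint',953.22,'KernelScale',0.26253));

fprintf('Default loss: %.4f, tuned loss: %.4f\n',lossDefault,lossTuned)
```

Because both models are scored on the same partition c, any difference between the two losses reflects the hyperparameters rather than the choice of folds.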

Visualize the optimized classifier.

```
d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(cdata(:,1)):d:max(cdata(:,1)),...
    min(cdata(:,2)):d:max(cdata(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(svmmod,xGrid);
figure;
h = nan(3,1); % Preallocation
h(1:2) = gscatter(cdata(:,1),cdata(:,2),grp,'rg','+*');
hold on
h(3) = plot(cdata(svmmod.IsSupportVector,1),...
    cdata(svmmod.IsSupportVector,2),'ko');
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
legend(h,{'-1','+1','Support Vectors'},'Location','Southeast');
axis equal
hold off
```