# templateSVM

Support vector machine template

## Syntax

``t = templateSVM()``
``t = templateSVM(Name,Value)``

## Description


`t = templateSVM()` returns a support vector machine (SVM) learner template suitable for training error-correcting output code (ECOC) multiclass models.

If you specify a default template, then the software uses default values for all input arguments during training.

Specify `t` as a binary learner, or one in a set of binary learners, in `fitcecoc` to train an ECOC multiclass classifier.


`t = templateSVM(Name,Value)` returns a template with additional options specified by one or more name-value pair arguments.

For example, you can specify the box constraint, the kernel function, or whether to standardize the predictors.

If you display `t` in the Command Window, then all options appear empty (`[]`), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.

## Examples


Use `templateSVM` to specify a default SVM template.

`t = templateSVM()`
```
t = 
Fit template for classification SVM.

                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: ''
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: []
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'
```

All properties of the template object are empty except for `Method` and `Type`. When you pass `t` to the training function, the software fills in the empty properties with their respective default values. For example, the software fills the `KernelFunction` property with `'linear'`. For details on other default values, see `fitcsvm`.

`t` is a plan for an SVM learner, and no computation occurs when you specify it. You can pass `t` to `fitcecoc` to specify SVM binary learners for ECOC multiclass learning. However, by default, `fitcecoc` uses default SVM binary learners.
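The following is a minimal sketch of how the default fill-in described above could be checked after training. It assumes the `fisheriris` data set that the later examples use, and it reads the kernel name from the `KernelParameters` property of the first trained binary learner; the variable names `learner` and `kernelName` are illustrative only.

```matlab
load fisheriris
t = templateSVM();                          % default template; nothing is trained yet
Mdl = fitcecoc(meas,species,'Learners',t);  % training fills in the empty options
learner = Mdl.BinaryLearners{1};            % first trained binary SVM learner
kernelName = learner.KernelParameters.Function  % kernel chosen at training time
```

Because the template left `KernelFunction` empty, the trained learner should report the two-class default, `'linear'`.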

Create a nondefault SVM template for use in `fitcecoc`.

`load fisheriris`

Create a template for SVM binary classifiers, and specify to use a Gaussian kernel function.

`t = templateSVM('KernelFunction','gaussian')`
```
t = 
Fit template for classification SVM.

                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: 'gaussian'
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: []
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'
```

All properties of the template object are empty except for `KernelFunction`, `Method`, and `Type`. During training, the software fills in the empty properties with their respective default values.

Specify `t` as a binary learner for an ECOC multiclass model.

`Mdl = fitcecoc(meas,species,'Learners',t);`

`Mdl` is a `ClassificationECOC` multiclass classifier. By default, the software trains `Mdl` using the one-versus-one coding design.
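As a rough check of the one-versus-one design, the sketch below counts the trained binary learners and compares the count with K(K-1)/2, the number of class pairs for K classes. It retrains the model so that it is self-contained; the variable names `K`, `numLearners`, `numPairs`, and `codingMatrix` are illustrative.

```matlab
load fisheriris
t = templateSVM('KernelFunction','gaussian');
Mdl = fitcecoc(meas,species,'Learners',t);

K = numel(Mdl.ClassNames);                 % number of classes (3 for fisheriris)
numLearners = numel(Mdl.BinaryLearners);   % one binary learner per pair of classes
numPairs = K*(K-1)/2;                      % number of class pairs in one-versus-one
codingMatrix = Mdl.CodingMatrix            % rows are classes, columns are learners
```

In the coding matrix, each column assigns one class to the positive group, one to the negative group, and marks the remaining class as unused by that learner.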

Display the in-sample (resubstitution) misclassification error.

`L = resubLoss(Mdl,'LossFun','classiferror')`
```
L = 0.0200
```

When you train an ECOC model with linear SVM binary learners, `fitcecoc` empties the `Alpha`, `SupportVectorLabels`, and `SupportVectors` properties of the binary learners by default. You can choose instead to retain the support vectors and related values, and then discard them from the model later.

```
load fisheriris
rng(1); % For reproducibility
```

Train an ECOC model using the entire data set. Specify retaining the support vectors by passing in the appropriate SVM template.

```
t = templateSVM('SaveSupportVectors',true);
MdlSV = fitcecoc(meas,species,'Learners',t);
```

`MdlSV` is a trained `ClassificationECOC` model with linear SVM binary learners. By default, `fitcecoc` implements a one-versus-one coding design, which requires three binary learners for three-class learning.

Access the estimated $\alpha$ (alpha) values using dot notation.

```
alpha = cell(3,1);
alpha{1} = MdlSV.BinaryLearners{1}.Alpha;
alpha{2} = MdlSV.BinaryLearners{2}.Alpha;
alpha{3} = MdlSV.BinaryLearners{3}.Alpha;
alpha
```
```
alpha=3×1 cell array
    { 3x1 double}
    { 3x1 double}
    {23x1 double}
```

`alpha` is a 3-by-1 cell array that stores the estimated values of $\alpha$.

Discard the support vectors and related values from the ECOC model.

`Mdl = discardSupportVectors(MdlSV);`

`Mdl` is similar to `MdlSV`, except that the `Alpha`, `SupportVectorLabels`, and `SupportVectors` properties of all the linear SVM binary learners are empty (`[]`).

```
areAllEmpty = @(x)isempty([x.Alpha x.SupportVectors x.SupportVectorLabels]);
cellfun(areAllEmpty,Mdl.BinaryLearners)
```
```
ans = 3x1 logical array
   1
   1
   1
```

Compare the sizes of the two ECOC models.

```
vars = whos('Mdl','MdlSV');
100*(1 - vars(1).bytes/vars(2).bytes)
```
```
ans = 4.7075
```

`Mdl` is about 5% smaller than `MdlSV`.

Reduce your memory usage by compacting `Mdl` and then clearing `Mdl` and `MdlSV` from the workspace.

```
CompactMdl = compact(Mdl);
clear Mdl MdlSV;
```

Predict the label for a random row of the training data using the more efficient SVM model.

`idx = randsample(size(meas,1),1)`
```
idx = 63
```
`predictedLabel = predict(CompactMdl,meas(idx,:))`
```
predictedLabel = 1x1 cell array
    {'versicolor'}
```
`trueLabel = species(idx)`
```
trueLabel = 1x1 cell array
    {'versicolor'}
```

## Input Arguments


### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'BoxConstraint',0.1,'KernelFunction','gaussian','Standardize',1` specifies a box constraint of `0.1`, to use the Gaussian (RBF) kernel, and to standardize the predictors.
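To make the two calling conventions concrete, here is a small sketch that builds the same template both ways. The second call assumes R2021a or later; the variable names `t1` and `t2` are illustrative.

```matlab
% Quoted, comma-separated name-value pairs
t1 = templateSVM('BoxConstraint',0.1,'KernelFunction','gaussian','Standardize',1);

% Name=Value syntax (assumes R2021a or later)
t2 = templateSVM(BoxConstraint=0.1,KernelFunction="gaussian",Standardize=true);
```

Both calls only record the requested options; no training happens until the template is passed to `fitcecoc`.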

Box constraint, specified as the comma-separated pair consisting of `'BoxConstraint'` and a positive scalar.

For one-class learning, the software always sets the box constraint to `1`.

For more details on the relationships and algorithmic behavior of `BoxConstraint`, `Cost`, `Prior`, `Standardize`, and `Weights`, see Algorithms.

Example: `'BoxConstraint',100`

Data Types: `double` | `single`

Cache size, specified as the comma-separated pair consisting of `'CacheSize'` and `'maximal'` or a positive scalar.

If `CacheSize` is `'maximal'`, then the software reserves enough memory to hold the entire n-by-n Gram matrix.

If `CacheSize` is a positive scalar, then the software reserves `CacheSize` megabytes of memory for training the model.

Example: `'CacheSize','maximal'`

Data Types: `double` | `single` | `char` | `string`

Flag to clip alpha coefficients, specified as the comma-separated pair consisting of `'ClipAlphas'` and either `true` or `false`.

Suppose that the alpha coefficient for observation j is αj and the box constraint of observation j is Cj, j = 1,...,n, where n is the training sample size.

| Value | Description |
| --- | --- |
| `true` | At each iteration, if αj is near 0 or near Cj, then MATLAB® sets αj to 0 or to Cj, respectively. |
| `false` | MATLAB does not change the alpha coefficients during optimization. |

MATLAB stores the final values of α in the `Alpha` property of the trained SVM model object.

`ClipAlphas` can affect SMO and ISDA convergence.

Example: `'ClipAlphas',false`

Data Types: `logical`

Tolerance for the gradient difference between upper and lower violators obtained by Sequential Minimal Optimization (SMO) or Iterative Single Data Algorithm (ISDA), specified as the comma-separated pair consisting of `'DeltaGradientTolerance'` and a nonnegative scalar.

If `DeltaGradientTolerance` is `0`, then the software does not use the tolerance for the gradient difference to check for optimization convergence.

The default values are:

• `1e-3` if the solver is SMO (for example, you set `'Solver','SMO'`)

• `0` if the solver is ISDA (for example, you set `'Solver','ISDA'`)

Example: `'DeltaGradientTolerance',1e-2`

Data Types: `double` | `single`

Feasibility gap tolerance obtained by SMO or ISDA, specified as the comma-separated pair consisting of `'GapTolerance'` and a nonnegative scalar.

If `GapTolerance` is `0`, then the software does not use the feasibility gap tolerance to check for optimization convergence.

Example: `'GapTolerance',1e-2`

Data Types: `double` | `single`

Maximal number of numerical optimization iterations, specified as the comma-separated pair consisting of `'IterationLimit'` and a positive integer.

The software returns a trained model regardless of whether the optimization routine successfully converges. `Mdl.ConvergenceInfo` contains convergence information.
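A minimal sketch of inspecting convergence after a limited run, assuming a two-class model trained directly with `fitcsvm`, which accepts the same `IterationLimit` argument. The two-class subset of `fisheriris` and the deliberately small limit of 10 are illustrative choices, not recommendations.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');          % keep two classes for a binary SVM
X = meas(inds,:);
Y = species(inds);

Mdl = fitcsvm(X,Y,'IterationLimit',10);    % a model is returned even if the limit is hit
converged = Mdl.ConvergenceInfo.Converged  % flag indicating whether the solver converged
numIter = Mdl.NumIterations                % iterations the solver actually used
```

If `converged` is false, raising `IterationLimit` or loosening a tolerance is a reasonable next step.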

Example: `'IterationLimit',1e8`

Data Types: `double` | `single`

Kernel function used to compute the elements of the Gram matrix, specified as the comma-separated pair consisting of `'KernelFunction'` and a kernel function name. Suppose G(xj,xk) is element (j,k) of the Gram matrix, where xj and xk are p-dimensional vectors representing observations j and k in `X`. This table describes supported kernel function names and their functional forms.

| Kernel Function Name | Description | Formula |
| --- | --- | --- |
| `'gaussian'` or `'rbf'` | Gaussian or Radial Basis Function (RBF) kernel, default for one-class learning | $G(x_j,x_k)=\exp\left(-\lVert x_j-x_k\rVert^2\right)$ |
| `'linear'` | Linear kernel, default for two-class learning | $G(x_j,x_k)=x_j' x_k$ |
| `'polynomial'` | Polynomial kernel. Use `'PolynomialOrder',q` to specify a polynomial kernel of order `q`. | $G(x_j,x_k)=\left(1+x_j' x_k\right)^q$ |

You can set your own kernel function, for example, `kernel`, by setting `'KernelFunction','kernel'`. The value `kernel` must have this form.

`function G = kernel(U,V)`
where:

• `U` is an m-by-p matrix. Columns correspond to predictor variables, and rows correspond to observations.

• `V` is an n-by-p matrix. Columns correspond to predictor variables, and rows correspond to observations.

• `G` is an m-by-n Gram matrix of the rows of `U` and `V`.

`kernel.m` must be on the MATLAB path.

It is a good practice to avoid using generic names for kernel functions. For example, call a sigmoid kernel function `'mysigmoid'` rather than `'sigmoid'`.
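As an illustration of the required form, the sketch below defines a sigmoid kernel named `mysigmoid`. The slope `gamma = 1` and intercept `c = -1` are assumed values chosen only for the example. Save the function as `mysigmoid.m` on the MATLAB path.

```matlab
function G = mysigmoid(U,V)
% Sigmoid kernel: U is m-by-p, V is n-by-p, and G is the m-by-n Gram matrix.
gamma = 1;                  % slope (assumed value for illustration)
c = -1;                     % intercept (assumed value for illustration)
G = tanh(gamma*U*V' + c);   % U*V' holds the inner products of all row pairs
end
```

With the file on the path, `t = templateSVM('KernelFunction','mysigmoid')` would request this kernel for the binary learners.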

Example: `'KernelFunction','gaussian'`

Data Types: `char` | `string`

Kernel offset parameter, specified as the comma-separated pair consisting of `'KernelOffset'` and a nonnegative scalar.

The software adds `KernelOffset` to each element of the Gram matrix.

The defaults are:

• `0` if the solver is SMO (that is, you set `'Solver','SMO'`)

• `0.1` if the solver is ISDA (that is, you set `'Solver','ISDA'`)

Example: `'KernelOffset',0`

Data Types: `double` | `single`

Kernel scale parameter, specified as the comma-separated pair consisting of `'KernelScale'` and `'auto'` or a positive scalar. The software divides all elements of the predictor matrix `X` by the value of `KernelScale`. Then, the software applies the appropriate kernel norm to compute the Gram matrix.

• If you specify `'auto'`, then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed using `rng` before training.

• If you specify `KernelScale` and your own kernel function, for example, `'KernelFunction','kernel'`, then the software throws an error. You must apply scaling within `kernel`.
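As a worked instance of the scaling rule described above, write $s$ for the value of `KernelScale` (the symbol $s$ is introduced here only for illustration). Dividing every predictor by $s$ before applying the Gaussian kernel from the kernel function table gives:

```latex
G(x_j,x_k)
  = \exp\left(-\left\lVert \frac{x_j}{s}-\frac{x_k}{s} \right\rVert^2\right)
  = \exp\left(-\frac{\lVert x_j-x_k\rVert^2}{s^2}\right)
```

Under this reading, a larger $s$ makes the kernel decay more slowly with the distance between observations.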

Example: `'KernelScale','auto'`

Data Types: `double` | `single` | `char` | `string`

Karush-Kuhn-Tucker (KKT) complementarity conditions violation tolerance, specified as the comma-separated pair consisting of `'KKTTolerance'` and a nonnegative scalar.

If `KKTTolerance` is `0`, then the software does not use the KKT complementarity conditions violation tolerance to check for optimization convergence.

The default values are:

• `0` if the solver is SMO (for example, you set `'Solver','SMO'`)

• `1e-3` if the solver is ISDA (for example, you set `'Solver','ISDA'`)

Example: `'KKTTolerance',1e-2`

Data Types: `double` | `single`

Number of iterations between optimization diagnostic message output, specified as the comma-separated pair consisting of `'NumPrint'` and a nonnegative integer.

If you specify `'Verbose',1` and `'NumPrint',numprint`, then the software displays all optimization diagnostic messages from SMO and ISDA every `numprint` iterations in the Command Window.

Example: `'NumPrint',500`

Data Types: `double` | `single`

Expected proportion of outliers in the training data, specified as the comma-separated pair consisting of `'OutlierFraction'` and a numeric scalar in the interval [0,1).

Suppose that you set `'OutlierFraction',outlierfraction`, where `outlierfraction` is a value greater than 0.

• For two-class learning, the software implements robust learning. In other words, the software attempts to remove 100*`outlierfraction`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

• For one-class learning, the software finds an appropriate bias term such that `outlierfraction` of the observations in the training set have negative scores.

Example: `'OutlierFraction',0.01`

Data Types: `double` | `single`

Polynomial kernel function order, specified as the comma-separated pair consisting of `'PolynomialOrder'` and a positive integer.

If you set `'PolynomialOrder'` and `KernelFunction` is not `'polynomial'`, then the software throws an error.

Example: `'PolynomialOrder',2`

Data Types: `double` | `single`

Store support vectors, their labels, and the estimated α coefficients as properties of the resulting model, specified as the comma-separated pair consisting of `'SaveSupportVectors'` and `true` or `false`.

If `SaveSupportVectors` is `true`, the resulting model stores the support vectors in the `SupportVectors` property, their labels in the `SupportVectorLabels` property, and the estimated α coefficients in the `Alpha` property of the compact SVM learners.

If `SaveSupportVectors` is `false` and `KernelFunction` is `'linear'`, the resulting model does not store the support vectors and the related estimates.

To reduce memory consumption by compact SVM models, set `SaveSupportVectors` to `false`.

For linear SVM binary learners in an ECOC model, the default value is `false`. Otherwise, the default value is `true`.

Example: `'SaveSupportVectors',true`

Data Types: `logical`

Number of iterations between reductions of the active set, specified as the comma-separated pair consisting of `'ShrinkagePeriod'` and a nonnegative integer.

If you set `'ShrinkagePeriod',0`, then the software does not shrink the active set.

Example: `'ShrinkagePeriod',1000`

Data Types: `double` | `single`

Optimization routine, specified as the comma-separated pair consisting of `'Solver'` and a value in this table.

| Value | Description |
| --- | --- |
| `'ISDA'` | Iterative Single Data Algorithm (see [4]) |
| `'L1QP'` | Uses `quadprog` (Optimization Toolbox) to implement L1 soft-margin minimization by quadratic programming. This option requires an Optimization Toolbox™ license. For more details, see Quadratic Programming Definition (Optimization Toolbox). |
| `'SMO'` | Sequential Minimal Optimization (see [2]) |

The default value is `'ISDA'` if you set `'OutlierFraction'` to a positive value for two-class learning, and `'SMO'` otherwise.

Example: `'Solver','ISDA'`

Flag to standardize the predictor data, specified as the comma-separated pair consisting of `'Standardize'` and `true` (`1`) or `false` (`0`).

If you set `'Standardize',true`:

• The software centers and scales each column of the predictor data (`X`) by the weighted column mean and standard deviation, respectively (for details on weighted standardizing, see Algorithms). MATLAB does not standardize the data contained in the dummy variable columns generated for categorical predictors.

• The software trains the classifier using the standardized predictor matrix, but stores the unstandardized data in the classifier property `X`.
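The sketch below shows one way to observe both effects, assuming the `fisheriris` data set. It reads the centering and scaling values from the `Mu` and `Sigma` properties of the first trained binary learner and confirms that the ECOC model keeps the original predictor values in `X`. The variable names are illustrative.

```matlab
load fisheriris
t = templateSVM('Standardize',true);
Mdl = fitcecoc(meas,species,'Learners',t);

learner = Mdl.BinaryLearners{1};
mu = learner.Mu                    % values used to center each predictor
sigma = learner.Sigma              % values used to scale each predictor
storedRaw = isequal(Mdl.X,meas)    % true if X holds the unstandardized data
```

Each binary learner sees only the observations of its two classes, so `mu` and `sigma` can differ from one learner to the next.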

Example: `'Standardize',true`

Data Types: `logical`

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and `0`, `1`, or `2`. The value of `Verbose` controls the amount of optimization information that the software displays in the Command Window and saves as a structure in `Mdl.ConvergenceInfo.History`.

This table summarizes the available verbosity level options.

| Value | Description |
| --- | --- |
| `0` | The software does not display or save convergence information. |
| `1` | The software displays diagnostic messages and saves convergence criteria every `numprint` iterations, where `numprint` is the value of the name-value pair argument `'NumPrint'`. |
| `2` | The software displays diagnostic messages and saves convergence criteria at every iteration. |

Example: `'Verbose',1`

Data Types: `double` | `single`

## Output Arguments


SVM classification template suitable for training error-correcting output code (ECOC) multiclass models, returned as a template object. Pass `t` to `fitcecoc` to specify how to create the SVM classifier for the ECOC model.

If you display `t` in the Command Window, then all unspecified options appear empty (`[]`). However, the software replaces empty options with their corresponding default values during training.

## Tips

By default and for efficiency, `fitcecoc` empties the `Alpha`, `SupportVectorLabels`, and `SupportVectors` properties for all linear SVM binary learners. `fitcecoc` lists `Beta`, rather than `Alpha`, in the model display.

To store `Alpha`, `SupportVectorLabels`, and `SupportVectors`, pass a linear SVM template that specifies storing support vectors to `fitcecoc`. For example, enter:

```
t = templateSVM('SaveSupportVectors',true)
Mdl = fitcecoc(X,Y,'Learners',t);
```

You can remove the support vectors and related values by passing the resulting `ClassificationECOC` model to `discardSupportVectors`.

## References

[1] Cristianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[2] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working set selection using second order information for training support vector machines.” Journal of Machine Learning Research, Vol 6, 2005, pp. 1889–1918.

[3] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[4] Kecman V., T. -M. Huang, and M. Vogt. “Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance.” In Support Vector Machines: Theory and Applications. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.

[5] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. “Estimating the Support of a High-Dimensional Distribution.” Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.

[6] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.

## Version History

Introduced in R2014b