# svmtrain

Train support vector machine classifier

`svmtrain` will be removed in a future release. See `fitcsvm`, `ClassificationSVM`, and `CompactClassificationSVM` instead.

## Syntax

`SVMStruct = svmtrain(Training,Group)`

`SVMStruct = svmtrain(Training,Group,Name,Value)`

## Description

`SVMStruct = svmtrain(Training,Group)` returns a structure, `SVMStruct`, containing information about the trained support vector machine (SVM) classifier.

`SVMStruct = svmtrain(Training,Group,Name,Value)` returns a structure with additional options specified by one or more `Name,Value` pair arguments.

## Input Arguments

`Training`

Matrix of training data, where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable. `svmtrain` treats `NaN`s or empty strings in `Training` as missing values and ignores the corresponding rows of `Group`.

`Group`

Grouping variable, which can be a categorical, numeric, or logical vector, a cell vector of strings, or a character matrix with each row representing a class label. Each element of `Group` specifies the group of the corresponding row of `Training`. `Group` should divide `Training` into two groups. `Group` has the same number of elements as there are rows in `Training`. `svmtrain` treats each `NaN`, empty string, or `'undefined'` in `Group` as a missing value, and ignores the corresponding row of `Training`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`'autoscale'`

Boolean specifying whether `svmtrain` automatically centers the data points at their mean, and scales them to have unit standard deviation, before training.

Default: `true`

`'boxconstraint'`

Value of the box constraint `C` for the soft margin. `C` can be a scalar, or a vector of the same length as the training data.

If `C` is a scalar, it is automatically rescaled by `N/(2*N1)` for the data points of group one and by `N/(2*N2)` for the data points of group two, where `N1` is the number of elements in group one, `N2` is the number of elements in group two, and `N = N1 + N2`. This rescaling compensates for unbalanced groups, that is, cases where `N1` and `N2` have very different values.

If `C` is an array, then each array element is taken as a box constraint for the data point with the same index.

Default: `1`
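The scalar rescaling can be checked by hand. The sketch below uses hypothetical group sizes (`N1 = 30`, `N2 = 70`) and reproduces only the arithmetic described above; it is not the internal `svmtrain` code.

```matlab
% Hypothetical group sizes, chosen only for illustration
N1 = 30;               % number of observations in group one
N2 = 70;               % number of observations in group two
N  = N1 + N2;
C  = 1;                % scalar box constraint (the default)

% Per-group box constraints after the rescaling described above
C1 = C * N / (2*N1);   % applied to points in group one (the smaller group)
C2 = C * N / (2*N2);   % applied to points in group two (the larger group)
fprintf('C1 = %.4f, C2 = %.4f\n', C1, C2);
```

The smaller group receives the larger box constraint. When `N1` equals `N2`, both factors are `1` and `C` is left unchanged.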

`'kernelcachelimit'`

Value that specifies the size of the kernel matrix cache for the SMO training method. The algorithm keeps a matrix with up to `kernelcachelimit` × `kernelcachelimit` double-precision, floating-point numbers in memory.

Default: `5000`

`'kernel_function'`

Kernel function `svmtrain` uses to map the training data into kernel space. The default kernel function is the dot product. The kernel function can be one of the following strings or a function handle:

• `'linear'` — Linear kernel, meaning dot product.

• `'quadratic'` — Quadratic kernel.

• `'polynomial'` — Polynomial kernel (default order 3). Specify another order with the `polyorder` name-value pair.

• `'rbf'` — Gaussian Radial Basis Function kernel with a default scaling factor, sigma, of 1. Specify another value for sigma with the `rbf_sigma` name-value pair.

• `'mlp'` — Multilayer Perceptron kernel with default scale `[1 -1]`. Specify another scale with the `mlp_params` name-value pair.

• `@kfun` — Function handle to a kernel function. A kernel function must be of the form

`function K = kfun(U, V)`

The returned value, `K`, is a matrix of size `M`-by-`N`, where `U` and `V` have `M` and `N` rows respectively.

If `kfun` has extra parameters, include the extra parameters via an anonymous function. For example, suppose that your kernel function is:

```
function k = kfun(u,v,p1,p2)
k = tanh(p1*(u*v')+p2);
```

Set values for `p1` and `p2`, and then use an anonymous function:

`@(u,v) kfun(u,v,p1,p2)`

Default: `'linear'`
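As a rough sketch of the custom-kernel pattern, the kernel can be written as an anonymous function and checked against the size contract before it is passed to `svmtrain`. The parameter values and the matrices `U` and `V` below are made up for illustration.

```matlab
% Hypothetical kernel parameters
p1 = 0.5;
p2 = -1;

% Parameterized kernel, same formula as the kfun example
kfun = @(u,v,p1,p2) tanh(p1*(u*v') + p2);

% Bind p1 and p2 so that the handle takes only (u,v)
kernel = @(u,v) kfun(u,v,p1,p2);

% Size contract: U has M = 4 rows, V has N = 2 rows, so K must be 4-by-2
U = rand(4,3);
V = rand(2,3);
K = kernel(U,V);
disp(size(K))

% The handle would then be passed as, for example:
% SVMStruct = svmtrain(Training,Group,'kernel_function',kernel);
```

Checking `size(K)` on small inputs first is a cheap way to catch a transposed product before training starts.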

`'kktviolationlevel'`

Value that specifies the fraction of variables allowed to violate the Karush-Kuhn-Tucker (KKT) conditions for the SMO training method. Set any value in [0,1). For example, if you set `kktviolationlevel` to `0.05`, then 5% of the variables are allowed to violate the KKT conditions.

Tip: Set this option to a positive value to help the algorithm converge if it is fluctuating near a good solution.

Default: `0`

`'method'`

Method used to find the separating hyperplane. Options are:

• `'QP'` — Quadratic programming (requires an Optimization Toolbox™ license). The classifier is a 2-norm soft-margin support vector machine. Give quadratic programming options with the `options` name-value pair, and create `options` with `optimset`.

• `'SMO'` — Sequential Minimal Optimization. Give SMO options with the `options` name-value pair, and create `options` with `statset`.

• `'LS'` — Least squares.

Default: `'SMO'`

`'mlp_params'`

Parameters of the Multilayer Perceptron (`mlp`) kernel. The `mlp` kernel requires two parameters, `[P1 P2]`. The kernel is `K = tanh(P1*U*V' + P2)`, where `P1 > 0` and `P2 < 0`.

Default: `[1 -1]`

`'options'`

Options structure for training.

• When you set `'method'` to `'SMO'` (default), create the `options` structure using `statset`. Options are:

`Display`

String that specifies the level of information about the optimization iterations that is displayed as the algorithm runs. Choices are:

• `off` (default) — Reports nothing.

• `iter` — Reports every 500 iterations.

• `final` — Reports only when the algorithm finishes.

`MaxIter`

Integer that specifies the maximum number of iterations of the main loop. If this limit is exceeded before the algorithm converges, then the algorithm stops and returns an error. Default is `15000`.

The other name-value pairs that relate specifically to the `'SMO'` method are `kernelcachelimit`, `kktviolationlevel`, and `tolkkt`.

• When you set `method` to `'QP'`, create the options structure using `optimset`. For details of applicable option choices, see `quadprog` options. SVM uses a convex quadratic program, so you can choose the `'interior-point-convex'` `quadprog` algorithm. In limited testing, the `'interior-point-convex'` algorithm was the best `quadprog` option for `svmtrain`, in both speed and memory utilization.
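The two ways of building `options` might look as follows. This is a minimal sketch that reuses the Fisher iris subset from the Examples section; the `MaxIter` value is arbitrary, and the `'QP'` call assumes that an Optimization Toolbox license is available.

```matlab
load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);

% 'SMO' (default method): build the options structure with statset
smoOpts = statset('Display','final','MaxIter',30000);
svmStructSMO = svmtrain(xdata,group,'method','SMO','options',smoOpts);

% 'QP': build the options structure with optimset
qpOpts = optimset('Algorithm','interior-point-convex');
svmStructQP = svmtrain(xdata,group,'method','QP','options',qpOpts);
```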

`'polyorder'`

Order of the polynomial kernel.

Default: `3`

`'rbf_sigma'`

Scaling factor (sigma) in the radial basis function kernel.

Default: `1`

`'showplot'`

Boolean indicating whether to plot the grouped data and separating line. Creates a plot only when the data has two columns (features).

Default: `false`

`'tolkkt'`

Value that specifies the tolerance with which the Karush-Kuhn-Tucker (KKT) conditions are checked for the SMO training method. For a definition of KKT conditions, see Karush-Kuhn-Tucker (KKT) Conditions.

Default: `1e-3`

## Output Arguments

`SVMStruct`

Structure containing information about the trained SVM classifier in the following fields:

• `SupportVectors` — Matrix of data points with each row corresponding to a support vector in the normalized data space. This matrix is a subset of the `Training` input data matrix, after normalization has been applied according to the `'AutoScale'` argument.

• `Alpha` — Vector of weights for the support vectors. The sign of the weight is positive for support vectors belonging to the first group, and negative for the second group.

• `Bias` — Intercept of the hyperplane that separates the two groups in the normalized data space (according to the `'AutoScale'` argument).

• `KernelFunction` — Handle to the function that maps the training data into kernel space.

• `KernelFunctionArgs` — Cell array of any additional arguments required by the kernel function.

• `GroupNames` — Categorical, numeric, or logical vector, a cell vector of strings, or a character matrix with each row representing a class label. Specifies the group identifiers for the support vectors. It has the same number of elements as there are rows in `SupportVectors`. Each element specifies the group to which the corresponding row in `SupportVectors` belongs.

• `SupportVectorIndices` — Vector of indices that specify the rows in `Training`, the training data, that were selected as support vectors after the data was normalized, according to the `'AutoScale'` argument.

• `ScaleData` — Field containing normalization factors. When `'AutoScale'` is set to `false`, it is empty. When `'AutoScale'` is set to `true`, it is a structure containing two fields:

  • `shift` — Row vector of values. Each value is the negative of the mean of one column (feature) of `Training`, the training data, taken across observations.

  • `scaleFactor` — Row vector of values. Each value is `1` divided by the standard deviation of one column (feature) of `Training`, the training data.

  Both `svmtrain` and `svmclassify` apply the scaling in `ScaleData`.

• `FigureHandles` — Vector of figure handles created by `svmtrain` when using the `'Showplot'` argument.
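The normalization recorded in `ScaleData` can be reproduced by hand. The sketch below assumes that `shift` is added first and `scaleFactor` is applied second, with one value per column of `Training`; the small data matrix is made up.

```matlab
% Made-up training matrix: 5 observations (rows), 2 features (columns)
X = [1 10; 2 20; 3 15; 4 30; 5 25];

% Values corresponding to the ScaleData fields, computed per column
shift       = -mean(X);      % negative of each column mean
scaleFactor = 1 ./ std(X);   % reciprocal of each column standard deviation

% Assumed order of operations: shift first, then scale
Xn = bsxfun(@times, bsxfun(@plus, X, shift), scaleFactor);

% Each column of Xn should now have zero mean and unit standard deviation
disp(mean(Xn))
disp(std(Xn))
```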

## Examples


### Train an SVM Classifier

Find a line separating the Fisher iris data on versicolor and virginica species, according to the petal length and petal width measurements. These two species are in rows 51 and higher of the data set, and the petal length and width are the third and fourth columns.

```
load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);
svmStruct = svmtrain(xdata,group,'ShowPlot',true);
```
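Because `svmtrain` is slated for removal, the same fit can be sketched with `fitcsvm`, which the note at the top of this page recommends. This is a minimal illustration rather than a drop-in replacement: `fitcsvm` does not standardize predictors unless asked, and its other defaults also differ from those of `svmtrain`, so the results need not match exactly. The query point `[5 2]` is made up.

```matlab
load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);

% Train with fitcsvm; 'Standardize',true plays a role similar to 'autoscale'
Mdl = fitcsvm(xdata,group,'Standardize',true);

% Classify a hypothetical new flower (petal length 5, petal width 2)
label = predict(Mdl,[5 2]);
disp(label)
```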


### Karush-Kuhn-Tucker (KKT) Conditions

The Karush-Kuhn-Tucker (KKT) conditions are analogous to the condition that the gradient must be zero at a minimum, modified to take constraints into account. The difference is that the KKT conditions hold for constrained problems. The KKT conditions use the auxiliary Lagrangian function:

$L\left(x,\lambda \right)=f\left(x\right)+\sum {\lambda }_{g,i}{g}_{i}\left(x\right)+\sum {\lambda }_{h,i}{h}_{i}\left(x\right).$

Here $f(x)$ is the objective function, $g(x)$ is a vector of constraint functions $g(x) \le 0$, and $h(x)$ is a vector of constraint functions $h(x) = 0$. The vector $\lambda$, which is the concatenation of $\lambda_g$ and $\lambda_h$, is the Lagrange multiplier vector. Its length is the total number of constraints.

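The KKT conditions are:

$\nabla_x L\left(x,\lambda\right) = 0,$

$\lambda_{g,i}\, g_i\left(x\right) = 0 \quad \forall i,$

$g\left(x\right) \le 0, \quad h\left(x\right) = 0, \quad \lambda_{g,i} \ge 0.$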

### Algorithms

The `svmtrain` function uses an optimization method to identify support vectors $s_i$, weights $\alpha_i$, and bias $b$ that are used to classify vectors $x$ according to the following equation:

$c=\sum _{i}{\alpha }_{i}k\left({s}_{i},x\right)+b,$

where $k$ is a kernel function. In the case of a linear kernel, $k$ is the dot product. If $c \ge 0$, then $x$ is classified as a member of the first group; otherwise, it is classified as a member of the second group.
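The classification rule can be traced with a few numbers. The support vectors, weights, and bias below are invented for illustration and were not produced by `svmtrain`; the sketch uses the linear kernel and omits any data scaling.

```matlab
% Hypothetical trained quantities (not produced by svmtrain)
sv    = [1 2; 3 1];     % support vectors s_i, one per row
alpha = [0.5; -0.5];    % weights alpha_i
b     = 0.1;            % bias
x     = [2 2];          % vector to classify

% Linear kernel: dot product of each support vector with x
k = sv * x';            % column vector of k(s_i, x) values

% c = sum_i alpha_i * k(s_i, x) + b
c = sum(alpha .* k) + b;

if c >= 0
    disp('first group')
else
    disp('second group')
end
```

Here the kernel values are 6 and 8, so `c = 0.5*6 - 0.5*8 + 0.1 = -0.9` and `x` falls in the second group.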

### Memory Usage and Out of Memory Error

When you set `'Method'` to `'QP'`, the `svmtrain` function operates on a data set containing `N` elements, and it creates an `(N+1)`-by-`(N+1)` matrix to find the separating hyperplane. This matrix needs at least `8*(N+1)^2` bytes of contiguous memory. If this size of contiguous memory is not available, the software displays an "out of memory" error message.

When you set `'Method'` to `'SMO'` (default), memory consumption is controlled by the `kernelcachelimit` option. The SMO algorithm stores only a submatrix of the kernel matrix, limited by the size specified by the `kernelcachelimit` option. However, if the number of data points exceeds the size specified by the `kernelcachelimit` option, the SMO algorithm slows down because it has to recalculate the kernel matrix elements.
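To get a feel for the two memory figures, the estimates can be computed directly. The problem size `N = 20000` is hypothetical, and the SMO figure is an upper bound for the kernel cache alone, at 8 bytes per double-precision entry.

```matlab
% Hypothetical problem size
N = 20000;                     % number of training observations

% 'QP' method: (N+1)-by-(N+1) matrix of doubles, 8 bytes each
qpBytes = 8 * (N + 1)^2;

% 'SMO' method: at most kernelcachelimit-by-kernelcachelimit doubles
kernelcachelimit = 5000;       % default value
smoBytes = 8 * kernelcachelimit^2;

fprintf('QP matrix: %.2f GB\n', qpBytes / 1e9);
fprintf('SMO cache: %.2f GB\n', smoBytes / 1e9);
```

For this size the QP matrix needs about 3.2 GB of contiguous memory, while the default SMO cache stays at about 0.2 GB regardless of `N`.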

If you run out of memory or the optimization step is very time consuming when using `svmtrain` on large data sets, try either of the following:

• Use a smaller number of samples and use cross-validation to test the performance of the classifier.

• Set `'Method'` to `'SMO'`, and set the `kernelcachelimit` option as large as your system permits.
