# svmtrain

Train support vector machine classifier

## Syntax

SVMStruct = svmtrain(Training,Group)
SVMStruct = svmtrain(Training,Group,Name,Value)

## Description

SVMStruct = svmtrain(Training,Group) returns a structure, SVMStruct, containing information about the trained support vector machine (SVM) classifier.

SVMStruct = svmtrain(Training,Group,Name,Value) returns a structure with additional options specified by one or more Name,Value pair arguments.

## Input Arguments

Training

Matrix of training data, where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable. svmtrain treats NaNs or empty strings in Training as missing values and ignores the corresponding rows of Group.

Group

Grouping variable, which can be a categorical, numeric, or logical vector, a cell vector of strings, or a character matrix with each row representing a class label. Each element of Group specifies the group of the corresponding row of Training. Group should divide Training into two groups. Group has the same number of elements as there are rows in Training. svmtrain treats each NaN, empty string, or 'undefined' in Group as a missing value, and ignores the corresponding row of Training.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'autoscale'

Boolean specifying whether svmtrain automatically centers the data points at their mean, and scales them to have unit standard deviation, before training.

Default: true
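As a rough sketch of what this option does, the following plain MATLAB code centers each column of a small made-up data matrix at its mean and scales it to unit standard deviation. The matrix X and the variable names are hypothetical, and the sketch does not call svmtrain; it only reproduces the transformation described above.

```matlab
% Hypothetical training matrix: 4 observations (rows), 2 features (columns)
X = [1 10; 2 20; 3 30; 4 40];

shift       = -mean(X);      % negative column means (row vector)
scaleFactor = 1 ./ std(X);   % reciprocal column standard deviations

% Center each column, then scale it; bsxfun applies the row vectors to every row
Xs = bsxfun(@times, bsxfun(@plus, X, shift), scaleFactor);

disp(mean(Xs));   % column means are approximately 0
disp(std(Xs));    % column standard deviations are approximately 1
```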

'boxconstraint'

Value of the box constraint C for the soft margin. C can be a scalar, or a vector of the same length as the training data.

If C is a scalar, it is automatically rescaled by N/(2*N1) for the data points of group one and by N/(2*N2) for the data points of group two, where N1 is the number of elements in group one, N2 is the number of elements in group two, and N = N1 + N2. This rescaling accounts for unbalanced groups, that is, cases where N1 and N2 have very different values.

If C is an array, then each array element is taken as a box constraint for the data point with the same index.

Default: 1
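The scalar rescaling rule can be worked through by hand. The sketch below uses hypothetical group sizes (30 and 70) and only evaluates the N/(2*N1) and N/(2*N2) factors described above; it does not call svmtrain, and the assumption that group one occupies the first rows is made purely for illustration.

```matlab
% Hypothetical unbalanced problem: 30 points in group one, 70 in group two
C  = 1;                 % scalar box constraint (the default)
N1 = 30;
N2 = 70;
N  = N1 + N2;

C1 = C * N / (2 * N1);  % rescaled constraint for group one
C2 = C * N / (2 * N2);  % rescaled constraint for group two

% Equivalent per-observation vector, assuming group one occupies the first N1 rows
Cvec = [repmat(C1, N1, 1); repmat(C2, N2, 1)];

fprintf('C1 = %.4f, C2 = %.4f, sum = %.4f\n', C1, C2, sum(Cvec));
```

The smaller group receives the larger constraint, and each group contributes N*C/2 to the total, so neither group dominates the penalty term.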

'kernelcachelimit'

Value that specifies the size of the kernel matrix cache for the SMO training method. The algorithm keeps a matrix with up to kernelcachelimit × kernelcachelimit double-precision, floating-point numbers in memory.

Default: 5000
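To get a feel for the memory this implies, the upper bound on the cache size can be estimated from the limit, assuming 8 bytes per double-precision value. This is a back-of-the-envelope sketch with made-up variable names, not a measurement of what svmtrain actually allocates.

```matlab
cacheLimit    = 5000;               % default 'kernelcachelimit'
bytesPerValue = 8;                  % one double-precision number
cacheBytes    = cacheLimit^2 * bytesPerValue;

fprintf('Upper bound on cache size: %.0f MB\n', cacheBytes / 1e6);
```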

'kernel_function'

Kernel function svmtrain uses to map the training data into kernel space. The default kernel function is the dot product. The kernel function can be one of the following strings or a function handle:

• 'linear' — Linear kernel, meaning dot product.

• 'polynomial' — Polynomial kernel (default order 3). Specify another order with the polyorder name-value pair.

• 'rbf' — Gaussian Radial Basis Function kernel with a default scaling factor, sigma, of 1. Specify another value for sigma with the rbf_sigma name-value pair.

• 'mlp' — Multilayer Perceptron kernel with default scale [1 -1]. Specify another scale with the mlp_params name-value pair.

• @kfun — Function handle to a kernel function. A kernel function must be of the form

`function K = kfun(U, V)`

The returned value, K, is a matrix of size M-by-N, where U and V have M and N rows respectively.

If kfun has extra parameters, include the extra parameters via an anonymous function. For example, suppose that your kernel function is:

```
function k = kfun(u,v,p1,p2)
k = tanh(p1*(u*v')+p2);
```

Set values for p1 and p2, and then use an anonymous function:

`@(u,v) kfun(u,v,p1,p2)`

Default: 'linear'
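The M-by-N contract for a custom kernel can be checked before training. The following sketch writes the parameterized kernel as an anonymous function so that it is self-contained; the matrices U and V and the parameter values are hypothetical.

```matlab
% Kernel with two extra parameters, written as an anonymous function so the
% sketch is self-contained (it mirrors the kfun example above)
kfun = @(u, v, p1, p2) tanh(p1 * (u * v') + p2);

p1 = 1;      % hypothetical parameter values
p2 = -1;
kernel = @(u, v) kfun(u, v, p1, p2);   % two-argument form that a kernel handle must have

U = [1 0; 0 1; 1 1; 2 0];   % M = 4 rows
V = [1 2; 0 1; 3 1];        % N = 3 rows
K = kernel(U, V);

disp(size(K));   % K has M rows and N columns
```

A handle such as `kernel` can then be supplied as the 'kernel_function' value.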

'kktviolationlevel'

Value that specifies the fraction of variables allowed to violate the Karush-Kuhn-Tucker (KKT) conditions for the SMO training method. Set any value in [0,1). For example, if you set kktviolationlevel to 0.05, then 5% of the variables are allowed to violate the KKT conditions.

Tip: Set this option to a positive value to help the algorithm converge if it is fluctuating near a good solution.

Default: 0

'method'

Method used to find the separating hyperplane. Options are:

• 'QP' — Quadratic programming (requires an Optimization Toolbox™ license). The classifier is a 2-norm soft-margin support vector machine. Give quadratic programming options with the options name-value pair, and create options with optimset.

• 'SMO' — Sequential Minimal Optimization. Give SMO options with the options name-value pair, and create options with statset.

• 'LS' — Least squares.

Default: 'SMO'

'mlp_params'

Parameters of the Multilayer Perceptron (mlp) kernel. The mlp kernel requires two parameters, [P1 P2]. The kernel is K = tanh(P1*U*V' + P2), where P1 > 0 and P2 < 0.

Default: [1 -1]

'options'

Options structure for training.

• When you set 'method' to 'SMO' (default), create the options structure using statset. Options are:

Display

String that specifies the level of information about the optimization iterations that is displayed as the algorithm runs. Choices are 'off' (default, reports nothing), 'iter' (reports every 500 iterations), and 'final' (reports only when the algorithm finishes).

MaxIter

Integer that specifies the maximum number of iterations of the main loop. If this limit is exceeded before the algorithm converges, then the algorithm stops and returns an error. Default is 15000.

The other name-value pairs that relate specifically to the 'SMO' method are kernelcachelimit, kktviolationlevel, and tolkkt.

• When you set method to 'QP', create the options structure using optimset. For details of applicable option choices, see quadprog options. SVM uses a convex quadratic program, so you can choose the 'interior-point-convex' quadprog algorithm. In limited testing, the 'interior-point-convex' algorithm was the best quadprog option for svmtrain, in both speed and memory utilization.
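For the default 'SMO' method, an options structure might be built as follows. This is a minimal sketch that assumes statset from Statistics Toolbox is available; the MaxIter value of 20000 is an arbitrary illustration, not a recommendation.

```matlab
% Build an options structure for the default 'SMO' method
opts = statset('Display', 'iter', 'MaxIter', 20000);

disp(opts.Display);   % level of iteration reporting
disp(opts.MaxIter);   % iteration limit for the main loop
```

The structure is then passed with the options name-value pair, for example `svmtrain(Training,Group,'options',opts)`.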

'polyorder'

Order of the polynomial kernel.

Default: 3

'rbf_sigma'

Scaling factor (sigma) in the radial basis function kernel.

Default: 1
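To illustrate how sigma acts as a scaling factor, the sketch below evaluates a Gaussian kernel on two small made-up matrices. It assumes the common form exp(-||u - v||^2 / (2*sigma^2)); this page does not state the exact expression svmtrain uses, so treat the formula as an assumption.

```matlab
sigma = 1;                    % default 'rbf_sigma'
U = [0 0; 1 0];               % hypothetical points, one per row
V = [0 0; 0 1; 1 0];

% Squared Euclidean distance between every row of U and every row of V
D2 = bsxfun(@plus, sum(U.^2, 2), sum(V.^2, 2)') - 2 * (U * V');

% Assumed Gaussian form: larger sigma makes the kernel decay more slowly
K = exp(-D2 / (2 * sigma^2));

disp(K);
```

Identical points give a kernel value of 1, and the value falls toward 0 as the distance grows relative to sigma.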

'showplot'

Boolean indicating whether to plot the grouped data and separating line. Creates a plot only when the data has two columns (features).

Default: false

'tolkkt'

Value that specifies the tolerance with which the Karush-Kuhn-Tucker (KKT) conditions are checked for the SMO training method. For a definition of KKT conditions, see Karush-Kuhn-Tucker (KKT) Conditions.

Default: 1e-3

## Output Arguments

SVMStruct

Structure containing information about the trained SVM classifier in the following fields:

• SupportVectors — Matrix of data points with each row corresponding to a support vector in the normalized data space. This matrix is a subset of the Training input data matrix, after normalization has been applied according to the 'autoscale' argument.

• Alpha — Vector of weights for the support vectors. The sign of the weight is positive for support vectors belonging to the first group, and negative for the second group.

• Bias — Intercept of the hyperplane that separates the two groups in the normalized data space (according to the 'autoscale' argument).

• KernelFunction — Handle to the function that maps the training data into kernel space.

• KernelFunctionArgs — Cell array of any additional arguments required by the kernel function.

• GroupNames — Categorical, numeric, or logical vector, a cell vector of strings, or a character matrix with each row representing a class label. Specifies the group identifiers for the support vectors. It has the same number of elements as there are rows in SupportVectors. Each element specifies the group to which the corresponding row in SupportVectors belongs.

• SupportVectorIndices — Vector of indices that specify the rows in Training, the training data, that were selected as support vectors after the data was normalized, according to the 'autoscale' argument.

• ScaleData — Field containing normalization factors. When 'autoscale' is set to false, it is empty. When 'autoscale' is set to true, it is a structure containing two fields, shift and scaleFactor. shift is a row vector in which each value is the negative of the mean of one column (feature) of Training, the training data. scaleFactor is a row vector in which each value is 1 divided by the standard deviation of one column (feature) of Training. Both svmtrain and svmclassify apply the scaling in ScaleData.

• FigureHandles — Vector of figure handles created by svmtrain when using the 'showplot' argument.

## Examples

### Train an SVM Classifier

Find a line separating the Fisher iris data on versicolor and virginica species, according to the petal length and petal width measurements. These two species are in rows 51 and higher of the data set, and the petal length and width are the third and fourth columns.

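```
load fisheriris
% Rows 51 and higher hold versicolor and virginica; columns 3:4 are petal length and width
xdata = meas(51:end,3:4);
group = species(51:end);
% Train with default settings and plot the grouped data and the separating line
svmStruct = svmtrain(xdata,group,'ShowPlot',true);
```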

### Karush-Kuhn-Tucker (KKT) Conditions

The Karush-Kuhn-Tucker (KKT) conditions are analogous to the condition that the gradient must be zero at a minimum, modified to take constraints into account. The difference is that the KKT conditions hold for constrained problems. The KKT conditions use the auxiliary Lagrangian function:

$L\left(x,\lambda \right)=f\left(x\right)+\sum {\lambda }_{g,i}{g}_{i}\left(x\right)+\sum {\lambda }_{h,i}{h}_{i}\left(x\right).$

Here f(x) is the objective function, g(x) is a vector of constraint functions g(x) ≤ 0, and h(x) is a vector of constraint functions h(x) = 0. The vector λ, which is the concatenation of λ_g and λ_h, is the Lagrange multiplier vector. Its length is the total number of constraints.

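The KKT conditions are:

$\nabla_x L\left(x,\lambda \right)=0,$

$\lambda_{g,i}\,g_i\left(x\right)=0\ \ \forall i,$

$g\left(x\right)\le 0,\quad h\left(x\right)=0,\quad \lambda_{g,i}\ge 0.$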
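As a small worked case, the usual KKT requirements (zero gradient of the Lagrangian, complementary slackness, feasibility, and a nonnegative multiplier on the inequality constraint) can be checked numerically. The one-variable problem below is hypothetical and unrelated to any particular SVM; its minimizer and multiplier were found by hand.

```matlab
% Toy problem (hypothetical): minimize f(x) = x^2 subject to g(x) = 1 - x <= 0
f_grad = @(x) 2 * x;     % gradient of the objective
g      = @(x) 1 - x;     % inequality constraint, feasible when g(x) <= 0
g_grad = -1;             % gradient of the constraint

xStar  = 1;              % constrained minimizer
lambda = 2;              % Lagrange multiplier for g

stationarity    = f_grad(xStar) + lambda * g_grad;   % gradient of L, should be 0
complementarity = lambda * g(xStar);                 % should be 0
isFeasible      = g(xStar) <= 0;
isNonnegative   = lambda >= 0;

fprintf('%g %g %d %d\n', stationarity, complementarity, isFeasible, isNonnegative);
```

The unconstrained minimizer x = 0 is infeasible, so the constraint is active at the solution and its multiplier is strictly positive.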

### Tips

• To classify new data, use the result of training, SVMStruct, with the svmclassify function.

### Algorithms

The svmtrain function uses an optimization method to identify support vectors s_i, weights α_i, and bias b that are used to classify vectors x according to the following equation:

$c=\sum _{i}{\alpha }_{i}k\left({s}_{i},x\right)+b,$

where k is a kernel function. In the case of a linear kernel, k is the dot product. If c ≥ 0, then x is classified as a member of the first group; otherwise, it is classified as a member of the second group.
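The decision rule can be traced by hand for a linear kernel. In the sketch below the support vectors, weights, and bias are hypothetical values chosen for illustration; they were not produced by svmtrain.

```matlab
% Hypothetical trained quantities (not produced by svmtrain)
S     = [1 1; -1 -1];    % support vectors, one per row
alpha = [0.5; -0.5];     % weights: positive for group one, negative for group two
b     = 0;               % bias

x = [2 0.5];             % vector to classify

k = S * x';              % linear kernel: dot product of each support vector with x
c = sum(alpha .* k) + b;

if c >= 0
    label = 'first group';
else
    label = 'second group';
end
fprintf('c = %g, classified as %s\n', c, label);
```

Here the two kernel values are 2.5 and -2.5, so c = 0.5*2.5 + (-0.5)*(-2.5) + 0 = 2.5 and x falls in the first group.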

### Memory Usage and Out of Memory Error

When you set 'Method' to 'QP', the svmtrain function operates on a data set containing N elements, and it creates an (N+1)-by-(N+1) matrix to find the separating hyperplane. This matrix needs at least 8*(N+1)^2 bytes of contiguous memory. If this size of contiguous memory is not available, the software displays an "out of memory" error message.
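The formula makes it easy to estimate in advance whether the 'QP' method is feasible for a given data set. The sketch below evaluates it for a hypothetical data set of 20000 observations.

```matlab
N = 20000;                          % hypothetical number of training observations
qpBytes = 8 * (N + 1)^2;            % (N+1)-by-(N+1) matrix of doubles

fprintf('N = %d needs at least %.2f GB of contiguous memory\n', N, qpBytes / 1e9);
```

Because the requirement grows with the square of N, doubling the number of observations roughly quadruples the memory needed.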

When you set 'Method' to 'SMO' (default), memory consumption is controlled by the kernelcachelimit option. The SMO algorithm stores only a submatrix of the kernel matrix, limited by the size specified by the kernelcachelimit option. However, if the number of data points exceeds the size specified by the kernelcachelimit option, the SMO algorithm slows down because it has to recalculate the kernel matrix elements.

If you run out of memory or the optimization step is very time consuming when using svmtrain on large data sets, try either of the following:

• Use a smaller number of samples and use cross-validation to test the performance of the classifier.

• Set 'Method' to 'SMO', and set the kernelcachelimit option as large as your system permits.
