
Train support vector machine classifier

svmtrain will be removed in a future release. See fitcsvm, ClassificationSVM, and CompactClassificationSVM instead.


SVMStruct = svmtrain(Training,Group)
SVMStruct = svmtrain(Training,Group,Name,Value)


SVMStruct = svmtrain(Training,Group) returns a structure, SVMStruct, containing information about the trained support vector machine (SVM) classifier.

SVMStruct = svmtrain(Training,Group,Name,Value) returns a structure with additional options specified by one or more Name,Value pair arguments.

Input Arguments


Training

Matrix of training data, where each row corresponds to an observation or replicate, and each column corresponds to a feature or variable. svmtrain treats NaNs or empty character vectors in Training as missing values and ignores the corresponding rows of Group.


Group

Grouping variable, which can be a categorical, numeric, or logical vector, a cell array of character vectors, or a character matrix with each row representing a class label. Each element of Group specifies the group of the corresponding row of Training. Group must divide Training into exactly two groups and must have as many elements as Training has rows. svmtrain treats each NaN, empty character vector, or 'undefined' in Group as a missing value, and ignores the corresponding row of Training.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


'autoscale'

Boolean specifying whether svmtrain automatically centers the data points at their mean, and scales them to have unit standard deviation, before training.

Default: true


'boxconstraint'

Value of the box constraint C for the soft margin. C can be a scalar, or a vector of the same length as the training data.

If C is a scalar, it is automatically rescaled by N/(2*N1) for the data points of group one and by N/(2*N2) for the data points of group two, where N1 is the number of elements in group one, N2 is the number of elements in group two, and N = N1 + N2. This rescaling accounts for unbalanced groups, that is, cases where N1 and N2 have very different values.

If C is an array, then each array element is taken as a box constraint for the data point with the same index.

Default: 1
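The rescaling of a scalar box constraint can be checked by hand. The following is a minimal sketch of that arithmetic only, using hypothetical toy labels; it does not call svmtrain, and the variable name boxC is an illustrative assumption.

```matlab
% Hypothetical toy labels: 2 points in group one, 6 points in group two.
group = [1 1 2 2 2 2 2 2]';
C = 1;                          % scalar box constraint (the default value)

N1 = sum(group == 1);           % number of elements in group one
N2 = sum(group == 2);           % number of elements in group two
N  = N1 + N2;

% Per-point box constraints after the rescaling described above.
boxC = zeros(N,1);
boxC(group == 1) = C * N / (2*N1);
boxC(group == 2) = C * N / (2*N2);

disp([N1 N2 boxC(1) boxC(end)])
```

With these labels the smaller group receives the larger constraint (2 versus 2/3), and each group's constraints sum to N*C/2, so neither group dominates the soft-margin penalty.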


'kernelcachelimit'

Value that specifies the size of the kernel matrix cache for the SMO training method. The algorithm keeps a matrix with up to kernelcachelimit × kernelcachelimit double-precision, floating-point numbers in memory.

Default: 5000


'kernel_function'

Kernel function svmtrain uses to map the training data into kernel space. The default kernel function is the dot product. The kernel function can be one of the following character vectors or a function handle:

  • 'linear' — Linear kernel, meaning dot product.

  • 'quadratic' — Quadratic kernel.

  • 'polynomial' — Polynomial kernel (default order 3). Specify another order with the polyorder name-value pair.

  • 'rbf' — Gaussian Radial Basis Function kernel with a default scaling factor, sigma, of 1. Specify another value for sigma with the rbf_sigma name-value pair.

  • 'mlp' — Multilayer Perceptron kernel with default scale [1 -1]. Specify another scale with the mlp_params name-value pair.

  • @kfun — Function handle to a kernel function. A kernel function must be of the form

    function K = kfun(U, V)

    The returned value, K, is a matrix of size M-by-N, where U and V have M and N rows respectively.

    If kfun has extra parameters, include the extra parameters via an anonymous function. For example, suppose that your kernel function is:

    function k = kfun(u,v,p1,p2)
    k = tanh(p1*(u*v')+p2);

    Set values for p1 and p2, and then use an anonymous function:

    @(u,v) kfun(u,v,p1,p2)

Default: 'linear'
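As a self-contained illustration of the function-handle form, the sketch below writes the tanh kernel from the example as an anonymous function (so no separate kfun.m file is needed) and checks that the result is M-by-N. The parameter values and the test matrices U and V are illustrative assumptions.

```matlab
% Kernel from the example above, written as an anonymous function so that
% this snippet runs on its own.
kfun = @(u,v,p1,p2) tanh(p1*(u*v') + p2);

p1 = 1;  p2 = -1;                     % illustrative parameter values
kernel = @(u,v) kfun(u,v,p1,p2);      % two-argument handle with p1, p2 bound

U = [1 0; 0 1; 1 1];                  % M = 3 observations, 2 features
V = [1 2; 3 4];                       % N = 2 observations, 2 features
K = kernel(U,V);                      % M-by-N kernel matrix

disp(size(K))
```

A handle such as kernel could then be supplied as the kernel function value in a call to svmtrain.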


'kktviolationlevel'

Value that specifies the fraction of variables allowed to violate the Karush-Kuhn-Tucker (KKT) conditions for the SMO training method. Set any value in [0,1). For example, if you set kktviolationlevel to 0.05, then 5% of the variables are allowed to violate the KKT conditions.

    Tip   Set this option to a positive value to help the algorithm converge if it is fluctuating near a good solution.

For more information on KKT conditions, see Cristianini and Shawe-Taylor [4].

Default: 0


'method'

Method used to find the separating hyperplane. Options are:

  • 'QP' — Quadratic programming (requires an Optimization Toolbox™ license). The classifier is a 2-norm soft-margin support vector machine. Give quadratic programming options with the options name-value pair, and create options with optimset.

  • 'SMO' — Sequential Minimal Optimization. Give SMO options with the options name-value pair, and create options with statset.

  • 'LS' — Least squares.

Default: 'SMO'


'mlp_params'

Parameters of the Multilayer Perceptron (mlp) kernel. The mlp kernel requires two parameters, [P1 P2], and is computed as K = tanh(P1*U*V' + P2), where P1 > 0 and P2 < 0.

Default: [1 -1]


'options'

Options structure for training.

  • When you set 'method' to 'SMO' (default), create the options structure using statset. Options are:


    Display

    Character vector that specifies the level of information about the optimization iterations that is displayed as the algorithm runs. Choices are:

    • off (default) — Reports nothing.

    • iter — Reports every 500 iterations.

    • final — Reports only when the algorithm finishes.


    MaxIter

    Integer that specifies the maximum number of iterations of the main loop. If this limit is exceeded before the algorithm converges, then the algorithm stops and returns an error. Default is 15000.

    The other name-value pairs that relate specifically to the 'SMO' method are kernelcachelimit, kktviolationlevel, and tolkkt.

  • When you set 'method' to 'QP', create the options structure using optimset. For details of applicable option choices, see quadprog options. SVM training solves a convex quadratic program, so you can choose the 'interior-point-convex' quadprog algorithm. In limited testing, the 'interior-point-convex' algorithm was the best quadprog option for svmtrain, in both speed and memory utilization.
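The following sketch shows one way the SMO options might be assembled with statset and passed to svmtrain. The data are synthetic and purely illustrative, the chosen values (30000 iterations, a violation level of 0.05) are assumptions, and the svmtrain call is guarded because the function is slated for removal and may not exist in a given release.

```matlab
% SMO options built with statset; the field names follow the list above.
opts = statset('Display','final','MaxIter',30000);

% Synthetic two-group data (illustrative only).
rng(0);
X = [randn(20,2) + 2; randn(20,2) - 2];
y = [ones(20,1); 2*ones(20,1)];

% Guard the call, since svmtrain may be unavailable.
if exist('svmtrain','file')
    svmStruct = svmtrain(X, y, 'method','SMO', 'options',opts, ...
                         'kktviolationlevel',0.05);
end
```

Raising MaxIter and allowing a small kktviolationlevel are two levers to try when SMO stops with a convergence error.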


'polyorder'

Order of the polynomial kernel.

Default: 3


'rbf_sigma'

Scaling factor (sigma) in the radial basis function kernel.

Default: 1
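To give a feel for what sigma controls, here is a minimal sketch of a Gaussian kernel in the common form exp(-||u - v||^2 / (2*sigma^2)). That svmtrain's built-in 'rbf' kernel uses exactly this scaling is an assumption; the helper names sqdist and rbf are hypothetical.

```matlab
sigma = 1;                                 % scaling factor (default value)

% Squared Euclidean distances between rows of U and rows of V.
sqdist = @(U,V) bsxfun(@plus, sum(U.^2,2), sum(V.^2,2)') - 2*(U*V');

% Gaussian kernel in the assumed form exp(-d^2 / (2*sigma^2)).
rbf = @(U,V) exp(-sqdist(U,V) / (2*sigma^2));

U = [0 0; 1 0];
V = [0 0; 0 2];
K = rbf(U,V);                              % 2-by-2 kernel matrix

disp(K)
```

Larger values of sigma make the kernel decay more slowly with distance, which tends to give smoother decision boundaries.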


'showplot'

Boolean indicating whether to plot the grouped data and separating line. Creates a plot only when the data has two columns (features).

Default: false


'tolkkt'

Value that specifies the tolerance with which the Karush-Kuhn-Tucker (KKT) conditions are checked for the SMO training method. For a definition of KKT conditions, see Karush-Kuhn-Tucker (KKT) Conditions.

Default: 1e-3

Output Arguments


SVMStruct

Structure containing information about the trained SVM classifier in the following fields:

  • SupportVectors — Matrix of data points with each row corresponding to a support vector in the normalized data space. This matrix is a subset of the Training input data matrix, after normalization has been applied according to the 'AutoScale' argument.

  • Alpha — Vector of weights for the support vectors. The sign of the weight is positive for support vectors belonging to the first group, and negative for the second group.

  • Bias — Intercept of the hyperplane that separates the two groups in the normalized data space (according to the 'AutoScale' argument).

  • KernelFunction — Handle to the function that maps the training data into kernel space.

  • KernelFunctionArgs — Cell array of any additional arguments required by the kernel function.

  • GroupNames — Categorical, numeric, or logical vector, a cell array of character vectors, or a character matrix with each row representing a class label. Specifies the group identifiers for the support vectors. It has the same number of elements as there are rows in SupportVectors. Each element specifies the group to which the corresponding row in SupportVectors belongs.

  • SupportVectorIndices — Vector of indices that specify the rows in Training, the training data, that were selected as support vectors after the data was normalized, according to the AutoScale argument.

  • ScaleData — Field containing normalization factors. When 'AutoScale' is set to false, it is empty. When AutoScale is set to true, it is a structure containing two fields:

    • shift — Row vector of values, one per column (feature) of Training. Each value is the negative of the mean of that column, taken across the observations in the training data.

    • scaleFactor — Row vector of values, one per column (feature) of Training. Each value is 1 divided by the standard deviation of that column, taken across the observations in the training data.

    Both svmtrain and svmclassify apply the scaling in ScaleData.

  • FigureHandles — Vector of figure handles created by svmtrain when using the 'Showplot' argument.
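The shift and scaleFactor fields of ScaleData can be reproduced by hand. The sketch below assumes the scaling is applied by adding shift and then multiplying by scaleFactor, column by column; the toy matrix is hypothetical and svmtrain is not called.

```matlab
Training = [1 10; 2 20; 3 30; 4 40];       % 4 observations, 2 features

shift       = -mean(Training,1);           % negative column means
scaleFactor = 1 ./ std(Training,0,1);      % reciprocal column standard deviations

% Assumed order of operations: add shift, then multiply by scaleFactor.
scaled = bsxfun(@times, bsxfun(@plus, Training, shift), scaleFactor);

disp(mean(scaled,1))                       % approximately zero per column
disp(std(scaled,0,1))                      % approximately one per column
```

New data must be transformed with the same shift and scaleFactor before being compared with SupportVectors, which are stored in the normalized space.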



Find a line separating the Fisher iris data on versicolor and virginica species, according to the petal length and petal width measurements. These two species are in rows 51 and higher of the data set, and the petal length and width are the third and fourth columns.

load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);
svmStruct = svmtrain(xdata,group,'ShowPlot',true);
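Because svmtrain is scheduled for removal, the same data can also be fit with fitcsvm, the replacement named at the top of this page. This is a sketch of a roughly equivalent call, not a guaranteed reproduction of svmtrain's results; the query point [5 2] is a hypothetical petal length and width.

```matlab
load fisheriris
xdata = meas(51:end,3:4);
group = species(51:end);

% 'Standardize',true plays a role similar to svmtrain's default autoscaling.
mdl = fitcsvm(xdata, group, 'KernelFunction','linear', 'Standardize',true);

% Classify a hypothetical new flower (petal length 5, petal width 2).
label = predict(mdl, [5 2]);
disp(label)
```

The returned object is a ClassificationSVM model, so prediction goes through predict rather than svmclassify.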

More About


Karush-Kuhn-Tucker (KKT) Conditions

The Karush-Kuhn-Tucker (KKT) conditions are analogous to the condition that the gradient must be zero at a minimum, modified to take constraints into account. The difference is that the KKT conditions hold for constrained problems. The KKT conditions use the auxiliary Lagrangian function:

$$L(x,\lambda) = f(x) + \sum_i \lambda_{g,i}\, g_i(x) + \sum_i \lambda_{h,i}\, h_i(x).$$

Here f(x) is the objective function, g(x) is a vector of constraint functions g(x) ≤ 0, and h(x) is a vector of constraint functions h(x) = 0. The vector λ, which is the concatenation of λ_g and λ_h, is the Lagrange multiplier vector. Its length is the total number of constraints.

The KKT conditions are:

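$$\begin{aligned}
\nabla_x L(x,\lambda) &= 0,\\
\lambda_{g,i}\, g_i(x) &= 0 \quad \forall i,\\
g(x) &\le 0,\\
h(x) &= 0,\\
\lambda_{g,i} &\ge 0.
\end{aligned}$$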

For more information, see Karush-Kuhn-Tucker conditions.
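A one-dimensional problem makes the conditions concrete. The sketch below checks them for the hypothetical problem of minimizing f(x) = x^2 subject to g(x) = 1 - x ≤ 0, whose solution is x = 1 with multiplier λ = 2; there are no equality constraints in this toy case.

```matlab
f_grad = @(x) 2*x;       % gradient of f(x) = x^2
g      = @(x) 1 - x;     % inequality constraint g(x) <= 0
g_grad = @(x) -1;        % gradient of g

x = 1;  lambda = 2;      % candidate solution and Lagrange multiplier

stationarity    = f_grad(x) + lambda*g_grad(x);   % gradient of L with respect to x
complementarity = lambda * g(x);                  % lambda_g * g(x)
feasible        = g(x) <= 0;                      % primal feasibility
dualFeasible    = lambda >= 0;                    % multiplier sign condition

disp([stationarity complementarity feasible dualFeasible])
```

All four checks pass at this point: the gradient of the Lagrangian and the complementarity product are both zero, and both feasibility tests are true.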


  • To classify new data, use the result of training, SVMStruct, with the svmclassify function.


The svmtrain function uses an optimization method to identify support vectors s_i, weights α_i, and bias b that are used to classify vectors x according to the following equation:

$$c = \sum_i \alpha_i\, k(s_i, x) + b,$$

where k is a kernel function. In the case of a linear kernel, k is the dot product. If c ≥ 0, then x is classified as a member of the first group, otherwise it is classified as a member of the second group.
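The classification rule can be evaluated directly once the support vectors, weights, and bias are known. The sketch below uses hypothetical hand-picked values rather than the output of svmtrain, and a linear kernel.

```matlab
% Hypothetical trained quantities (not produced by svmtrain).
S     = [1 1; -1 -1];      % support vectors, one per row
alpha = [0.5; -0.5];       % signed weights, one per support vector
b     = 0;                 % bias
k     = @(U,V) U*V';       % linear kernel (dot product)

x = [2 0];                             % new point to classify
c = sum(alpha .* k(S, x)) + b;         % c = sum_i alpha_i * k(s_i, x) + b

if c >= 0
    disp('first group')
else
    disp('second group')
end
```

Here c evaluates to 2, so x falls in the first group. When the classifier was trained with autoscaling, x would first need the same shift and scaling as the training data.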

Memory Usage and Out of Memory Error

When you set 'Method' to 'QP', the svmtrain function operates on a data set containing N elements, and it creates an (N+1)-by-(N+1) matrix to find the separating hyperplane. This matrix needs at least 8*(N+1)^2 bytes of contiguous memory. If this amount of contiguous memory is not available, the software displays an "out of memory" error message.

When you set 'Method' to 'SMO' (default), memory consumption is controlled by the kernelcachelimit option. The SMO algorithm stores only a submatrix of the kernel matrix, limited by the size specified by the kernelcachelimit option. However, if the number of data points exceeds the size specified by the kernelcachelimit option, the SMO algorithm slows down because it has to recalculate the kernel matrix elements.
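The two memory figures are easy to compare with a quick calculation. This sketch assumes a hypothetical data set of 20,000 points and the default kernelcachelimit of 5000, with 8 bytes per double-precision number.

```matlab
N = 20000;                               % hypothetical number of training points

qpBytes = 8 * (N + 1)^2;                 % 'QP': (N+1)-by-(N+1) matrix of doubles

kernelcachelimit = 5000;                 % default cache size
cacheBytes = 8 * kernelcachelimit^2;     % 'SMO': upper bound on cached doubles

fprintf('QP matrix: %.2f GB, SMO cache: %.2f GB\n', qpBytes/1e9, cacheBytes/1e9);
```

For this size the QP matrix needs roughly 3.2 GB of contiguous memory, while the SMO cache is bounded at about 0.2 GB, which is why SMO is the safer choice for large data sets.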

If you run out of memory or the optimization step is very time consuming when using svmtrain on large data sets, try either of the following:

  • Use a smaller number of samples and use cross-validation to test the performance of the classifier.

  • Set 'Method' to 'SMO', and set the kernelcachelimit option as large as your system permits.


[1] Kecman, V., Learning and Soft Computing, MIT Press, Cambridge, MA, 2001.

[2] Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., and Vandewalle, J., Least Squares Support Vector Machines, World Scientific, Singapore, 2002.

[3] Schölkopf, B., and Smola, A.J., Learning with Kernels, MIT Press, Cambridge, MA, 2002.

[4] Cristianini, N., and Shawe-Taylor, J., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, First Edition, Cambridge University Press, Cambridge, UK, 2000.

Introduced in R2013a
