
Train binary support vector machine classifier

`fitcsvm` trains or cross-validates a support vector machine (SVM) model for two-class (binary) classification on a low- through moderate-dimensional predictor data set. `fitcsvm` supports mapping the predictor data using kernel functions, and supports sequential minimal optimization (SMO), iterative single data algorithm (ISDA), or *L*1 soft-margin minimization via quadratic programming for objective-function minimization.

To train a linear SVM model for binary classification on a high-dimensional data set, that is, a data set that includes many predictor variables, use `fitclinear` instead.

For multiclass learning by combining binary SVM models, use error-correcting output codes (ECOC). For more details, see `fitcecoc`.

To train an SVM regression model, see `fitrsvm` for low- through moderate-dimensional predictor data sets, or `fitrlinear` for high-dimensional data sets.

`Mdl = fitcsvm(Tbl,ResponseVarName)`

`Mdl = fitcsvm(Tbl,formula)`

`Mdl = fitcsvm(Tbl,Y)`

`Mdl = fitcsvm(X,Y)`

`Mdl = fitcsvm(___,Name,Value)`

`Mdl = fitcsvm(Tbl,ResponseVarName)` returns a support vector machine classifier `Mdl` trained using the sample data contained in a table (`Tbl`). `ResponseVarName` is the name of the variable in `Tbl` that contains the class labels for one- or two-class classification.

`Mdl = fitcsvm(___,Name,Value)` returns a support vector machine classifier with additional options specified by one or more `Name,Value` pair arguments, using any of the previous syntaxes. For example, you can specify the type of cross-validation, the cost for misclassification, or the type of score transformation function.
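As a minimal sketch of the basic syntax (assuming the `ionosphere` sample data set bundled with the toolbox):

```matlab
% Train a binary SVM classifier on the bundled ionosphere data set.
load ionosphere                  % X: 351-by-34 predictors, Y: labels 'b'/'g'
Mdl = fitcsvm(X, Y, 'Standardize', true);
label = predict(Mdl, X(1:5, :))  % predicted labels for the first five rows
```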

`fitcsvm` trains SVM classifiers for one- or two-class learning applications. To train SVM classifiers using data with more than two classes, use `fitcecoc`. `fitcsvm` supports low- through moderate-dimensional data sets. For high-dimensional data sets, use `fitclinear` instead.

- Unless your data set is large, always try to standardize the predictors (see `Standardize`). Standardization makes predictors insensitive to the scales on which they are measured.

- It is good practice to cross-validate using the `KFold` name-value pair argument. The cross-validation results determine how well the SVM classifier generalizes.

- For one-class learning:

  - The default setting for the name-value pair argument `Alpha` can lead to long training times. To speed up training, set `Alpha` to a vector mostly composed of `0`s.

  - Set the name-value pair argument `Nu` to a value closer to `0` to yield fewer support vectors and, therefore, a smoother but cruder decision boundary.
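The one-class tips above can be sketched as follows (a sketch, assuming the bundled `fisheriris` data set; the `Nu` value is illustrative only):

```matlab
% One-class learning: train on observations from a single class.
load fisheriris
X = meas(strcmp(species, 'setosa'), :);   % 50-by-4 predictor matrix
Mdl = fitcsvm(X, ones(size(X, 1), 1), ...
    'KernelFunction', 'rbf', ...
    'Nu', 0.05);          % Nu closer to 0 -> fewer support vectors
numel(Mdl.Alpha)          % number of support vectors retained
```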

- Sparsity in support vectors is a desirable property of an SVM classifier. To decrease the number of support vectors, set `BoxConstraint` to a large value. This action increases the training time.

- For optimal training time, set `CacheSize` as high as the memory limit on your computer allows.

- If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument `'ShrinkagePeriod'`. It is good practice to use `'ShrinkagePeriod',1000`.

- Duplicate observations that are far from the decision boundary do not affect convergence. However, just a few duplicate observations that occur near the decision boundary can slow down convergence considerably. To speed up convergence, specify `'RemoveDuplicates',true` if:

  - Your data set contains many duplicate observations.

  - You suspect that a few duplicate observations fall near the decision boundary.

  However, to maintain the original data set during training, `fitcsvm` must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify `true` for data sets containing few duplicates, then `fitcsvm` consumes close to double the memory of the original data.
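Several of the tips above can be combined in one call (a sketch, assuming the bundled `ionosphere` data set; the option values are the ones recommended above, not tuned choices):

```matlab
% Cross-validated SVM using the speed-up options recommended in the tips:
% standardization, active-set shrinkage, and duplicate removal.
load ionosphere
CVMdl = fitcsvm(X, Y, 'Standardize', true, 'KFold', 5, ...
    'ShrinkagePeriod', 1000, 'RemoveDuplicates', true);
kfoldLoss(CVMdl)     % cross-validated misclassification rate
```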

- `NaN`, `<undefined>`, and empty character vector (`''`) values indicate missing values. `fitcsvm` removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), `fitcsvm` ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal `BoxConstraint`.

- `fitcsvm` removes observations that have zero weight or prior probability.

- For two-class learning, if you specify the cost matrix $$\mathcal{C}$$ (see `Cost`), then the software updates the class prior probabilities *p* (see `Prior`) to *p*_{c} by incorporating the penalties described in $$\mathcal{C}$$. Specifically, `fitcsvm`:

  - Computes $${p}_{c}^{\ast}=p\prime \mathcal{C}.$$

  - Normalizes *p*_{c}^{*} so that the updated prior probabilities sum to 1:

    $${p}_{c}=\frac{1}{{\displaystyle \sum _{j=1}^{K}{p}_{c,j}^{\ast}}}{p}_{c}^{\ast}.$$

    *K* is the number of classes.

  - Resets the cost matrix to the default:

    $$\mathcal{C}=\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right].$$

  - Removes observations from the training data corresponding to classes with zero prior probability.
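The prior update can be traced with a small numeric sketch (plain MATLAB arithmetic; the prior and cost values are made up for illustration):

```matlab
% Prior-probability update for two-class learning with a user cost matrix.
p = [0.6; 0.4];          % original class priors (Prior)
C = [0 2; 1 0];          % misclassification cost matrix (Cost)
pStar = p' * C;          % p_c* = p'C  -> [0.4 1.2]
pc = pStar / sum(pStar)  % normalized updated priors -> [0.25 0.75]
% fitcsvm then resets the cost matrix to the default [0 1; 1 0].
```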

- For two-class learning, `fitcsvm` normalizes all observation weights (see `Weights`) to sum to 1. Then, it renormalizes the normalized weights to sum to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation *j* in class *k* is

  $${w}_{j}^{\ast}=\frac{{w}_{j}}{{\displaystyle \sum _{\forall j\in \text{Class }k}{w}_{j}}}{p}_{c,k}.$$

  *w*_{j} is the normalized weight for observation *j*; *p*_{c,k} is the updated prior probability of class *k* (see previous bullet).

- For two-class learning, `fitcsvm` assigns a box constraint to each observation in the training data. The formula for the box constraint of observation *j* is

  $${C}_{j}=n{C}_{0}{w}_{j}^{\ast}.$$

  *n* is the training sample size, *C*_{0} is the initial box constraint (see `BoxConstraint`), and $${w}_{j}^{\ast}$$ is the total weight of observation *j* (see previous bullet).

- If you set `'Standardize',true` and any of `'Cost'`, `'Prior'`, or `'Weights'`, then `fitcsvm` standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, `fitcsvm` standardizes predictor *j* (*x*_{j}) using

  $${x}_{j}^{\ast}=\frac{{x}_{j}-{\mu}_{j}^{\ast}}{{\sigma}_{j}^{\ast}}.$$

  $${\mu}_{j}^{\ast}=\frac{1}{{\displaystyle \sum _{k}{w}_{k}^{\ast}}}{\displaystyle \sum _{k}{w}_{k}^{\ast}{x}_{jk}}.$$

  *x*_{jk} is observation *k* (row) of predictor *j* (column).

  $${\left({\sigma}_{j}^{\ast}\right)}^{2}=\frac{{v}_{1}}{{v}_{1}^{2}-{v}_{2}}{\displaystyle \sum _{k}{w}_{k}^{\ast}{\left({x}_{jk}-{\mu}_{j}^{\ast}\right)}^{2}}.$$

  $${v}_{1}={\displaystyle \sum _{j}{w}_{j}^{\ast}}.$$

  $${v}_{2}={\displaystyle \sum _{j}{\left({w}_{j}^{\ast}\right)}^{2}}.$$
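The weight, box-constraint, and standardization formulas above can be traced with one small numeric sketch (plain MATLAB arithmetic; the weights, priors, and data are made up for illustration):

```matlab
% Total weights w_j* for a 4-observation, two-class training set.
w   = [1 1 1 1];            % raw observation weights (Weights)
cls = [1 1 2 2];            % class membership of each observation
pc  = [0.25 0.75];          % updated class priors (see previous bullets)
wStar = zeros(1, 4);
for j = 1:4
    k = cls(j);
    wStar(j) = w(j) / sum(w(cls == k)) * pc(k);   % renormalize per class
end
% Box constraints C_j = n*C0*w_j*.
n  = 4;  C0 = 1;            % sample size and BoxConstraint
Cj = n * C0 * wStar;        % per-observation box constraints
% Weighted standardization of one predictor column x.
x  = [1 2 3 4]';
v1 = sum(wStar);  v2 = sum(wStar.^2);
mu = sum(wStar' .* x) / v1;                            % weighted mean
s2 = (v1 / (v1^2 - v2)) * sum(wStar' .* (x - mu).^2);  % weighted variance
xStd = (x - mu) / sqrt(s2)                             % standardized x_j
```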

- Let `p` be the proportion of outliers that you expect in the training data. If you set `'OutlierFraction',p`, then:

  - For one-class learning, the software trains the bias term such that 100`p`% of the observations in the training data have negative scores.

  - For two-class learning, the software implements *robust learning*. In other words, the software attempts to remove 100`p`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

- If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

  - The `PredictorNames` property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `PredictorNames` is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.

  - The `ExpandedPredictorNames` property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `ExpandedPredictorNames` is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.

  - Similarly, the `Beta` property stores one beta coefficient for each predictor, including the dummy variables.

  - The `SupportVectors` property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are *m* support vectors and three predictors, one of which is a categorical variable with three levels. Then `SupportVectors` is an *m*-by-5 matrix.

  - The `X` property stores the training data as originally input. It does not include the dummy variables. When the input is a table, `X` contains only the columns used as predictors.

- For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

  - For a variable having *k* ordered levels, the software creates *k* – 1 dummy variables. The *j*th dummy variable is *-1* for levels up to *j*, and *+1* for levels *j* + 1 through *k*.

  - The names of the dummy variables stored in the `ExpandedPredictorNames` property indicate the first level with the value *+1*. The software stores *k* – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., *k*.
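The ordinal encoding rule above can be sketched directly (plain MATLAB; the level values are made up for illustration):

```matlab
% Ordinal encoding: a variable with k = 3 ordered levels produces
% k-1 = 2 dummy variables. The j-th dummy is -1 for levels <= j
% and +1 for levels j+1 through k.
levels = [1 2 3]';                  % ordinal level of each observation
k = 3;
D = zeros(numel(levels), k-1);
for j = 1:k-1
    D(:, j) = 2*(levels > j) - 1;   % -1 if level <= j, +1 otherwise
end
D   % rows: [-1 -1; 1 -1; 1 1]
```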

- All solvers implement *L*1 soft-margin minimization.

- `fitcsvm` and `svmtrain` use, among other algorithms, SMO for optimization. The software implements SMO differently between the two functions, but numerical studies show that there is sensible agreement in the results.

- For one-class learning, the software estimates the Lagrange multipliers, *α*_{1},...,*α*_{n}, such that

  $$\sum _{j=1}^{n}{\alpha}_{j}=n\nu .$$

[1] Cristianini, N., and J. Shawe-Taylor. *An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods*. Cambridge, UK: Cambridge University Press, 2000.

[2] Fan, R.-E., P.-H. Chen, and C.-J. Lin. "Working set selection using second order information for training support vector machines." *Journal of Machine Learning Research*, Vol. 6, 2005, pp. 1889–1918.

[3] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning*, Second Edition. NY:
Springer, 2008.

[4] Kecman, V., T.-M. Huang, and M. Vogt. "Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance." In *Support Vector Machines: Theory and Applications*. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.

[5] Schölkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." *Neural Computation*, Vol. 13, Number 7, 2001, pp. 1443–1471.

[6] Schölkopf, B., and A. Smola. *Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond*. Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.

`ClassificationPartitionedModel` | `ClassificationSVM` | `CompactClassificationSVM` | `fitcecoc` | `fitclinear` | `fitSVMPosterior` | `predict` | `quadprog` | `rng`
