
ClassificationSVM class

Superclasses: CompactClassificationSVM

Support vector machine for binary classification

Description

ClassificationSVM is a support vector machine classifier for one- or two-class learning. To train a ClassificationSVM classifier, use fitcsvm.

Trained ClassificationSVM classifiers store the training data, parameter values, prior probabilities, support vectors, and algorithmic implementation information. You can use these classifiers to:

  • Estimate resubstitution predictions. For details, see resubPredict.

  • Predict labels or posterior probabilities for new data. For details, see predict.

Construction

SVMModel = fitcsvm(TBL,ResponseVarName) returns an SVM classifier (SVMModel) trained using the sample data contained in the table TBL. ResponseVarName is the name of the variable in TBL that contains the class labels for one- or two-class classification. For details, see fitcsvm.

SVMModel = fitcsvm(TBL,formula) returns an SVM classifier trained using the sample data contained in a table (TBL). formula is a formula string that identifies the response and predictor variables in TBL that are used for training. For details, see fitcsvm.

SVMModel = fitcsvm(TBL,Y) returns an SVM classifier trained using the predictor variables in table TBL and class labels in vector Y. For details, see fitcsvm.

SVMModel = fitcsvm(X,Y) returns an SVM classifier trained using the predictors in the matrix X and class labels in the vector Y for one- or two-class classification. For details, see fitcsvm.

SVMModel = fitcsvm(___,Name,Value) returns a trained SVM classifier with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify the type of cross validation, the cost for misclassification, or the type of score transformation function. For name-value pair argument details, see fitcsvm.

If you set one of the following five options, then SVMModel is a ClassificationPartitionedModel cross-validated model: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Otherwise, SVMModel is a ClassificationSVM classifier.
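
For example, the following sketch (using the ionosphere data set that ships with the toolbox; the table and model names are illustrative) exercises several of these syntaxes:

load ionosphere                           % X: 351-by-34 numeric, Y: 351-by-1 cell array of labels
tbl = array2table(X(:,3:5),'VariableNames',{'X3','X4','X5'});
tbl.Y = Y;                                % append the response to the table

Mdl1 = fitcsvm(tbl,'Y');                  % TBL + ResponseVarName
Mdl2 = fitcsvm(tbl,'Y~X3+X5');            % TBL + formula (predictor subset)
Mdl3 = fitcsvm(X,Y);                      % X + Y
CVMdl = fitcsvm(X,Y,'KFold',5);           % cross-validation option, so CVMdl is
                                          % a ClassificationPartitionedModel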

Input Arguments


TBL — Sample data
table

Sample data used to train the model, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of strings are not allowed.

If TBL contains the response variable, and you want to use all remaining variables in TBL as predictors, then specify the response variable using ResponseVarName.

If TBL contains the response variable, and you want to use only a subset of the remaining variables in TBL as predictors, then specify a formula string using formula.

If TBL does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of TBL must be equal.

Data Types: table

ResponseVarName — Response variable name
name of a variable in TBL

Response variable name, specified as the name of a variable in TBL.

You must specify ResponseVarName as a string. For example, if the response variable Y is stored as TBL.Y, then specify it as 'Y'. Otherwise, the software treats all columns of TBL, including Y, as predictors when training the model.

The response variable must be a categorical or character array, logical or numeric vector, or cell array of strings. If the response variable is a character array, then each element must correspond to one row of the array.

It is good practice to specify the order of the classes using the ClassNames name-value pair argument.

formula — Response and predictor variables to use in model training
string in the form of 'Y~X1+X2+X3'

Response and predictor variables to use in model training, specified as a string in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables.

To specify a subset of variables in TBL as predictors for training the model, use a formula string. If you specify a formula string, then any variables in TBL that do not appear in formula are not used to train the model.

X — Predictor data
matrix of numeric values

Predictor data used to train the SVM classifier, specified as a matrix of numeric values.

Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one predictor.

The length of Y and the number of rows of X must be equal.

To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument.

Data Types: double | single

Y — Class labels
categorical array | character array | logical vector | vector of numeric values | cell array of strings

Class labels used to train the SVM classifier, specified as a categorical or character array, logical or numeric vector, or cell array of strings.

If Y is a character array, then each element must correspond to one row of the array.

The length of Y and the number of rows of X must be equal.

It is good practice to specify the order of the classes using the ClassNames name-value pair argument.

To specify the response variable name, use the ResponseName name-value pair argument.

    Note:   The software treats NaN, empty string (''), and <undefined> elements as missing values. If a row of X or an element of Y contains at least one NaN, then the software removes those rows and elements from both arguments. Such deletion decreases the effective training or cross-validation sample size.
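
For instance, this sketch (with hypothetical data) shows the effect of one missing predictor value on the stored sample size:

Xm = randn(10,2);                % hypothetical predictor matrix
Xm(3,1) = NaN;                   % introduce one missing value
ym = [ones(5,1); -ones(5,1)];    % two-class labels
Mdl = fitcsvm(Xm,ym);
Mdl.NumObservations              % 9, not 10: the row containing NaN is removed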

Properties

Alpha

Numeric vector of trained classifier coefficients from the dual problem (i.e., the estimated Lagrange multipliers). Alpha has length equal to the number of support vectors in the trained classifier (i.e., sum(SVMModel.IsSupportVector)).

Beta

Numeric vector of linear predictor coefficients. Beta has length equal to the number of predictors used to train the model.

If your predictor data contains categorical variables, then the software uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable. Beta stores one value for each predictor variable, including the dummy variables. For example, if there are three predictors, one of which is a categorical variable with three levels, then Beta is a numeric vector containing five values.

If KernelParameters.Function is 'linear', then the software estimates the classification score for the observation x using

$f(x) = (x/s)'\beta + b.$

SVMModel stores β, b, and s in the properties Beta, Bias, and KernelParameters.Scale, respectively.

If KernelParameters.Function is not 'linear', then Beta is empty ([]).
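
The following sketch (assuming a linear kernel and the ionosphere data set) recovers the positive-class scores that predict returns from Beta, Bias, and KernelParameters.Scale:

load ionosphere
SVMModel = fitcsvm(X,Y,'KernelFunction','linear');

s = SVMModel.KernelParameters.Scale;
f = (X/s)*SVMModel.Beta + SVMModel.Bias;   % f(x) = (x/s)'*beta + b, row-wise

[~,score] = predict(SVMModel,X);
max(abs(f - score(:,2)))                   % agreement up to round-off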

Bias

Scalar corresponding to the trained classifier bias term.

BoxConstraints

Numeric vector of box constraints.

BoxConstraints has length equal to the number of observations (i.e., size(SVMModel.X,1)).

CacheInfo

Structure array containing:

  • The cache size (in MB) that the software reserves to train the SVM classifier (SVMModel.CacheInfo.Size). To set the cache size to CacheSize MB, specify the name-value pair argument 'CacheSize',CacheSize of fitcsvm.

  • The caching algorithm that the software uses during optimization (SVMModel.CacheInfo.Algorithm). Currently, the only available caching algorithm is Queue. You cannot set the caching algorithm.

CategoricalPredictors

Indices of categorical predictors, stored as a numeric vector. CategoricalPredictors contains index values corresponding to the columns of X that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

ClassNames

List of elements in Y with duplicates removed. ClassNames has the same data type as the data in the argument Y, and therefore can be a categorical or character array, logical or numeric vector, or cell array of strings.

ConvergenceInfo

Structure array containing convergence information.

Field — Description

Converged — Logical flag indicating whether the algorithm converged (1 indicates convergence)

ReasonForConvergence — String indicating the criterion the software uses to detect convergence

Gap — Scalar feasibility gap between the dual and primal objective functions

GapTolerance — Scalar feasibility gap tolerance. Set this tolerance to, e.g., gt, using the name-value pair argument 'GapTolerance',gt of fitcsvm.

DeltaGradient — Scalar attained gradient difference between upper and lower violators

DeltaGradientTolerance — Scalar tolerance for the gradient difference between upper and lower violators. Set this tolerance to, e.g., dgt, using the name-value pair argument 'DeltaGradientTolerance',dgt of fitcsvm.

LargestKKTViolation — Maximal, scalar Karush-Kuhn-Tucker (KKT) violation value

KKTTolerance — Scalar tolerance for the largest KKT violation. Set this tolerance to, e.g., kktt, using the name-value pair argument 'KKTTolerance',kktt of fitcsvm.

History — Structure array containing convergence information at set optimization iterations. The fields are:

  • NumIterations: numeric vector of iteration indices for which the software records convergence information

  • Gap: numeric vector of Gap values at the iterations

  • DeltaGradient: numeric vector of DeltaGradient values at the iterations

  • LargestKKTViolation: numeric vector of LargestKKTViolation values at the iterations

  • NumSupportVectors: numeric vector indicating the number of support vectors at the iterations

  • Objective: numeric vector of Objective values at the iterations

Objective — Scalar value of the dual objective function

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i.

During training, the software updates the prior probabilities by incorporating the penalties described in the cost matrix. Therefore,

  • For two-class learning, Cost always has this form: Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

  • For one-class learning, Cost = 0.

This property is read-only. For more details, see Algorithms.

ExpandedPredictorNames

Expanded predictor names, stored as a cell array of strings.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Gradient

Numeric vector of training data gradient values. Gradient has length equal to the number of observations (i.e., size(SVMModel.X,1)).

IsSupportVector

Logical vector indicating whether a corresponding row in the predictor data matrix is a support vector. IsSupportVector has length equal to the number of observations (i.e., size(SVMModel.X,1)).

KernelParameters

Structure array containing the kernel name and parameter values.

To display the values of KernelParameters, use dot notation, e.g., SVMModel.KernelParameters.Scale displays the scale parameter value.

The software accepts KernelParameters as inputs, and does not modify them. Alter KernelParameters by setting the appropriate name-value pair arguments when you train the SVM classifier using fitcsvm.

ModelParameters

Object containing parameter values, e.g., the name-value pair argument values, used to train the SVM classifier. ModelParameters does not contain estimated parameters.

Access fields of ModelParameters using dot notation. For example, access the initial values for estimating Alpha using SVMModel.ModelParameters.Alpha.

Mu

Numeric vector of predictor means.

If you specify 'Standardize',1 or 'Standardize',true when you train an SVM classifier using fitcsvm, then Mu has length equal to the number of predictors.

If your predictor data contains categorical variables, then the software uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable. Mu stores one value for each predictor variable, including the dummy variables. However, the software does not standardize the columns that contain categorical variables.

If 'Standardize' is false or 0, then Mu is an empty vector ([]).

NumIterations

Positive integer indicating the number of iterations required by the optimization routine to attain convergence.

To set a limit on the number of iterations to, e.g., k, specify the name-value pair argument 'IterationLimit',k of fitcsvm.

Nu

Positive scalar representing the ν parameter for one-class learning.

NumObservations

Numeric scalar representing the number of observations in the training data. If the input arguments X or Y contain missing values, then NumObservations is less than the length of Y.

OutlierFraction

Scalar indicating the expected proportion of outliers in the training data.

PredictorNames

Cell array of strings containing the predictor names, in the order that they appear in X.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the elements of SVMModel.ClassNames.

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix.

This property is read-only. For more details, see Algorithms.

ResponseName

String describing the response variable Y.

ScoreTransform

String representing a built-in transformation function, or a function handle for transforming predicted classification scores.

To change the score transformation function to, e.g., function, use dot notation.

  • For a built-in function, enter a string.

    SVMModel.ScoreTransform = 'function';

    This table contains the available, built-in functions.

    String — Formula

    'doublelogit' — 1/(1 + e^(–2x))

    'invlogit' — log(x / (1 – x))

    'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0

    'logit' — 1/(1 + e^(–x))

    'none' or 'identity' — x (no transformation)

    'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0

    'symmetric' — 2x – 1

    'symmetriclogit' — 2/(1 + e^(–x)) – 1

    'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1

  • For a MATLAB® function, or a function that you define, enter its function handle.

    SVMModel.ScoreTransform = @function;

    function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
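
For example, this sketch (mySigmoid is a hypothetical name; it assumes a trained classifier SVMModel and predictor data X from a previous step) assigns a custom, size-preserving transformation:

mySigmoid = @(s) 1./(1 + exp(-2*s));   % element-wise, returns a same-size matrix
SVMModel.ScoreTransform = mySigmoid;
[label,score] = predict(SVMModel,X);   % score now contains transformed values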

ShrinkagePeriod

Nonnegative integer indicating the shrinkage period, i.e., number of iterations between reductions of the active set.

To set the shrinkage period to, e.g., sp, specify the name-value pair argument 'ShrinkagePeriod',sp of fitcsvm.

Sigma

Numeric vector of predictor standard deviations.

If you specify 'Standardize',1 or 'Standardize',true when you train the SVM classifier, then Sigma has length equal to the number of predictors.

If your predictor data contains categorical variables, then the software uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable. Sigma stores one value for each predictor variable, including the dummy variables. However, the software does not standardize the columns that contain categorical variables.

If 'Standardize' is false or 0, then Sigma is an empty vector ([]).

Solver

String indicating the solving routine that the software used to train the SVM classifier.

To set the solver to, e.g., solver, specify the name-value pair argument 'Solver',solver of fitcsvm.

SupportVectors

Matrix containing rows of X that the software considers the support vectors.

If you specify 'Standardize',1 or 'Standardize',true, then SupportVectors are the standardized rows of X.

SupportVectorLabels

Numeric vector of support vector class labels. SupportVectorLabels has length equal to the number of support vectors (i.e., sum(SVMModel.IsSupportVector)).

+1 indicates that the corresponding support vector is in the positive class (SVMModel.ClassNames{2}). -1 indicates that the corresponding support vector is in the negative class (SVMModel.ClassNames{1}).

W

Numeric vector of observation weights that the software used to train the SVM classifier.

The length of W is SVMModel.NumObservations.

fitcsvm normalizes Weights so that the elements of W within a particular class sum up to the prior probability of that class.

X

Numeric matrix of unstandardized predictor values that the software used to train the SVM classifier.

Each row of X corresponds to one observation, and each column corresponds to one variable.

The software excludes predictor data rows removed due to NaNs from X.

Y

Categorical or character array, logical or numeric vector, or cell array of strings representing the observed class labels used to train the SVM classifier. Y is the same data type as the input argument Y of fitcsvm.

Each row of Y represents the observed classification of the corresponding row of X.

The software excludes elements removed due to NaNs from Y.

Methods

compact — Compact support vector machine classifier
crossval — Cross-validated support vector machine classifier
fitPosterior — Fit posterior probabilities
resubEdge — Classification edge for support vector machine classifiers by resubstitution
resubLoss — Classification loss for support vector machine classifiers by resubstitution
resubMargin — Classification margins for support vector machine classifiers by resubstitution
resubPredict — Predict support vector machine classifier resubstitution responses
resume — Resume training support vector machine classifier

Inherited Methods

compareHoldout — Compare accuracies of two classification models using new data
discardSupportVectors — Discard support vectors for linear support vector machine models
edge — Classification edge for support vector machine classifiers
fitPosterior — Fit posterior probabilities
loss — Classification error for support vector machine classifiers
margin — Classification margins for support vector machine classifiers
predict — Predict labels for support vector machine classifiers

Definitions

Box Constraint

A parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization).

If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.
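
A minimal sketch (ionosphere data set; the constraint values are illustrative) that compares support-vector counts at two box-constraint settings:

load ionosphere
MdlSmallC = fitcsvm(X,Y,'BoxConstraint',0.1,'Standardize',true);
MdlLargeC = fitcsvm(X,Y,'BoxConstraint',100,'Standardize',true);

sum(MdlSmallC.IsSupportVector)   % typically more support vectors
sum(MdlLargeC.IsSupportVector)   % typically fewer support vectors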

Gram Matrix

The Gram matrix of a set of n vectors $\{x_1,\ldots,x_n : x_j \in \mathbb{R}^p\}$ is an n-by-n matrix with element (j,k) defined as $G(x_j,x_k) = \langle \phi(x_j), \phi(x_k) \rangle$, an inner product of the transformed predictors using the kernel function $\phi$.

For nonlinear SVM, the algorithm forms a Gram matrix using the predictor matrix columns. The dual formalization replaces the inner product of the predictors with corresponding elements of the resulting Gram matrix (called the "kernel trick"). Subsequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.
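
The following sketch (hypothetical data, Gaussian kernel with unit kernel scale) forms a Gram matrix explicitly; fitcsvm builds this matrix internally, so this is purely illustrative:

Xg = randn(5,3);                     % hypothetical predictor rows
n = size(Xg,1);
G = zeros(n);
for j = 1:n
    for k = 1:n
        % Gaussian kernel: <phi(xj),phi(xk)> = exp(-||xj - xk||^2)
        G(j,k) = exp(-norm(Xg(j,:) - Xg(k,:))^2);
    end
end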

Karush-Kuhn-Tucker Complementarity Conditions

KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions.

In SVM, the KKT complementarity conditions are

$$\begin{cases} \alpha_j \left[ y_j f(x_j) - 1 + \xi_j \right] = 0 \\ \xi_j \left( C - \alpha_j \right) = 0 \end{cases}$$

for all j = 1,...,n, where $f(x_j) = \phi(x_j)'\beta + b$, $\phi$ is a kernel function (see Gram matrix), and $\xi_j$ is a slack variable. If the classes are perfectly separable, then $\xi_j = 0$ for all j = 1,...,n.

One-Class Learning

One-class learning, or unsupervised SVM, aims at separating data from the origin in the high-dimensional, predictor space (not the original predictor space), and is an algorithm used for outlier detection.

The algorithm resembles that of SVM for binary classification. The objective is to minimize the dual expression

$$0.5\sum_{j}\sum_{k}\alpha_j\alpha_k G(x_j,x_k)$$

with respect to $\alpha_1,\ldots,\alpha_n$, subject to

$$\sum_{j}\alpha_j = n\nu$$

and $0 \le \alpha_j \le 1$ for all j = 1,...,n. $G(x_j,x_k)$ is element (j,k) of the Gram matrix.

A small value of ν leads to fewer support vectors, and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors, and therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1.

For more details, see [2].
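
A sketch of one-class learning (all training labels identical; the name-value settings are illustrative), which also checks the constraint on the Lagrange multipliers stated above:

load fisheriris
Xoc = meas(:,1:2);
OCModel = fitcsvm(Xoc,ones(size(Xoc,1),1),'KernelScale','auto', ...
    'Standardize',true,'OutlierFraction',0.05);

sum(OCModel.Alpha)             % approximately equal to ...
OCModel.Nu*size(Xoc,1)         % ... n*nu, per the dual constraint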

Support Vector

Support vectors are observations corresponding to strictly positive estimates of α1,...,αn.

SVM classifiers that yield fewer support vectors for a given training set are more desirable.

Support Vector Machines for Binary Classification

The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.

The linear SVM score function is

$f(x) = x'\beta + b,$

where:

  • x is an observation (corresponding to a row of X).

  • The vector $\beta$ contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to SVMModel.Beta). For separable data, the optimal margin length is $2/\|\beta\|$.

  • b is the bias term (corresponding to SVMModel.Bias).

The root of f(x) for particular coefficients defines a hyperplane. For a particular hyperplane, f(z) is the distance from point z to the hyperplane.

The algorithm searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate. Therefore:

  • For separable classes, the objective is to minimize $\|\beta\|$ with respect to $\beta$ and b subject to $y_j f(x_j) \ge 1$ for all j = 1,..,n. This is the primal formalization for separable classes.

  • For inseparable classes, the algorithm uses slack variables (ξj) to penalize the objective function for observations that cross the margin boundary for their class. ξj = 0 for observations that do not cross the margin boundary for their class, otherwise ξj ≥ 0.

    The objective is to minimize $0.5\|\beta\|^2 + C\sum_j \xi_j$ with respect to $\beta$, b, and $\xi_j$ subject to $y_j f(x_j) \ge 1 - \xi_j$ and $\xi_j \ge 0$ for all j = 1,..,n, and for a positive scalar box constraint C. This is the primal formalization for inseparable classes.

The algorithm uses the Lagrange multipliers method to optimize the objective. This introduces n coefficients α1,...,αn (corresponding to SVMModel.Alpha). The dual formalizations for linear SVM are:

  • For separable classes, minimize

    $$0.5\sum_{j=1}^{n}\sum_{k=1}^{n}\alpha_j\alpha_k y_j y_k\, x_j'x_k - \sum_{j=1}^{n}\alpha_j$$

    with respect to $\alpha_1,\ldots,\alpha_n$, subject to $\sum_{j=1}^{n}\alpha_j y_j = 0$, $\alpha_j \ge 0$ for all j = 1,...,n, and the Karush-Kuhn-Tucker (KKT) complementarity conditions.

  • For inseparable classes, the objective is the same as for separable classes, except for the additional condition $0 \le \alpha_j \le C$ for all j = 1,..,n.

The resulting score function is

$$\hat{f}(x) = \sum_{j=1}^{n}\hat{\alpha}_j y_j\, x'x_j + \hat{b}.$$

$\hat{b}$ is the estimate of the bias and $\hat{\alpha}_j$ is the jth estimate of the vector $\hat{\alpha}$, j = 1,...,n. Written this way, the score function is free of the estimate of $\beta$ as a result of the primal formalization.

The SVM algorithm classifies a new observation z using $\operatorname{sign}(\hat{f}(z))$.

In some cases, there is a nonlinear boundary separating the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane.

The dual formalization for nonlinear SVM is

$$0.5\sum_{j=1}^{n}\sum_{k=1}^{n}\alpha_j\alpha_k y_j y_k\, G(x_j,x_k) - \sum_{j=1}^{n}\alpha_j$$

with respect to $\alpha_1,\ldots,\alpha_n$, subject to $\sum_{j=1}^{n}\alpha_j y_j = 0$, $0 \le \alpha_j \le C$ for all j = 1,..,n, and the KKT complementarity conditions. $G(x_j,x_k)$ are elements of the Gram matrix. The resulting score function is

$$\hat{f}(x) = \sum_{j=1}^{n}\hat{\alpha}_j y_j\, G(x,x_j) + \hat{b}.$$

For more details, see Understanding Support Vector Machines, [1], and [3].
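
The following sketch (ionosphere data, linear kernel) evaluates the dual-form score function directly from the stored Alpha, SupportVectors, and SupportVectorLabels, and compares the result with predict:

load ionosphere
SVMModel = fitcsvm(X,Y,'KernelFunction','linear');

sv = SVMModel.SupportVectors;          % the x_j with positive alpha_j
a = SVMModel.Alpha;                    % estimated alpha_j
yj = SVMModel.SupportVectorLabels;     % y_j in {-1,+1}
s = SVMModel.KernelParameters.Scale;

f = (X/s)*(sv/s)'*(a.*yj) + SVMModel.Bias;   % linear kernel: G(x,xj) = (x/s)*(xj/s)'

[~,score] = predict(SVMModel,X);
max(abs(f - score(:,2)))                     % agreement up to round-off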

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB documentation.

Examples


Train a Support Vector Machine Classifier

Load Fisher's iris data set. Remove the sepal lengths and widths, and all observed setosa irises.

load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Train an SVM classifier using the processed data set.

SVMModel = fitcsvm(X,y)
SVMModel = 

  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'


The Command Window shows that SVMModel is a trained ClassificationSVM classifier, along with a property list. Use dot notation to display individual properties of SVMModel, for example, to determine the class order.

classOrder = SVMModel.ClassNames
classOrder = 

    'versicolor'
    'virginica'

The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.

Plot a scatter diagram of the data and circle the support vectors.

sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off

The support vectors are observations that occur on or beyond their estimated class boundaries.

You can adjust the boundaries (and therefore the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.

Train and Cross Validate Support Vector Machine Classifiers

Load the ionosphere data set.

load ionosphere

Train and cross validate an SVM classifier. It is good practice to standardize the predictors and specify the order of the classes.

rng(1);  % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')
CVSVMModel = 

  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


CVSVMModel is not a ClassificationSVM classifier but a ClassificationPartitionedModel cross-validated SVM classifier. By default, the software implements 10-fold cross validation.

Alternatively, you can cross validate a trained ClassificationSVM classifier by passing it to crossval.
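
For example (CVSVMModel2 is an illustrative name):

SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
CVSVMModel2 = crossval(SVMModel);   % 10-fold by default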

Inspect one of the trained folds using dot notation.

CVSVMModel.Trained{1}
ans = 

  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]


Each fold is a CompactClassificationSVM classifier trained on 90% of the data.

Estimate the generalization error.

genError = kfoldLoss(CVSVMModel)
genError =

    0.1168

On average, the generalization error is approximately 12%.


Algorithms

  • NaN, <undefined>, and empty strings ('') indicate missing values. fitcsvm removes entire rows of data corresponding to a missing response. When computing total weights (see the bullets below), fitcsvm ignores any missing predictor observation. This can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal BoxConstraint.

  • fitcsvm removes observations that have zero weight or prior probability.

  • For two-class learning, if you specify the cost matrix C (see Cost), then the software updates the class prior probabilities p (see Prior) to pc by incorporating the penalties described in C.

    Specifically, fitcsvm (see the sketch at the end of this section):

    1. Computes $p_c^* = p'C$.

    2. Normalizes $p_c^*$ so that the updated prior probabilities sum to 1:

      $$p_c = \frac{1}{\sum_{j=1}^{K} p_{c,j}^*} \, p_c^*.$$

      K is the number of classes.

    3. Resets the cost matrix to the default:

      $$C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

    4. Removes observations from the training data corresponding to classes with zero prior probability.

  • For two-class learning, fitcsvm normalizes all observation weights (see Weights) to sum to 1. Then, it renormalizes the normalized weights so that they sum to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation j in class k is

    $$w_j^* = \frac{w_j}{\sum_{j \in \text{Class } k} w_j} \, p_{c,k}.$$

    $w_j$ is the normalized weight for observation j; $p_{c,k}$ is the updated prior probability of class k (see the previous bullet).

  • For two-class learning, fitcsvm assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is

    $$C_j = n C_0 w_j^*.$$

    n is the training sample size, $C_0$ is the initial box constraint (see BoxConstraint), and $w_j^*$ is the total weight of observation j (see the previous bullet).

  • If you set 'Standardize',true and any of 'Cost', 'Prior', or 'Weights', then fitcsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitcsvm standardizes predictor j ($x_j$) using

    $$x_j^* = \frac{x_j - \mu_j^*}{\sigma_j^*},$$

    where

    • $\mu_j^* = \frac{1}{\sum_k w_k^*}\sum_k w_k^* x_{jk}$,

    • $x_{jk}$ is observation k (row) of predictor j (column),

    • $(\sigma_j^*)^2 = \frac{v_1}{v_1^2 - v_2}\sum_k w_k^* \left(x_{jk} - \mu_j^*\right)^2$,

    • $v_1 = \sum_j w_j^*$,

    • $v_2 = \sum_j (w_j^*)^2$.

  • Let op be the proportion of outliers you expect in the training data. If you use 'OutlierFraction',op when you train the SVM classifier using fitcsvm, then:

    • For one-class learning, the software trains the bias term such that 100·op% of the observations in the training data have negative scores.

    • The software implements robust learning for two-class learning. In other words, the software attempts to remove 100·op% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

  • If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

    • The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of strings containing the original names of the predictor variables.

    • The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of strings containing the names of the predictor variables and the new dummy variables.

    • Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.

    • The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.

    • The X property stores the training data as originally input. It does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.

  • For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

    • For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is -1 for levels up to j, and +1 for levels j + 1 through k.

    • The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.

  • All solvers implement L1 soft-margin minimization.

  • fitcsvm and svmtrain use, among other algorithms, SMO for optimization. The software implements SMO differently between the two functions, but numerical studies show that there is sensible agreement in the results.

  • For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that

    $$\sum_{j=1}^{n}\alpha_j = n\nu.$$
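
The following sketch (illustrative priors, cost matrix, and weights; not the shipped implementation) walks through the two-class preprocessing steps above: updating the priors with the cost matrix, renormalizing the weights, and forming the per-observation box constraints.

% Hypothetical two-class setup
p = [0.4 0.6];                  % priors, ordered as in ClassNames
C = [0 2; 1 0];                 % Cost(i,j): true class i, predicted class j
w = [0.25 0.25 0.2 0.2 0.1];    % normalized observation weights (sum to 1)
cls = [1 1 1 2 2];              % class index of each observation
C0 = 1;                         % initial box constraint ('BoxConstraint')
n = numel(w);

% Update the priors with the cost matrix, then renormalize
pcStar = p*C;                   % step 1: p'*C
pc = pcStar/sum(pcStar);        % step 2: updated priors sum to 1

% Renormalize weights within each class to sum to the updated prior
wTot = zeros(1,n);
for k = 1:2
    idx = (cls == k);
    wTot(idx) = w(idx)/sum(w(idx))*pc(k);
end

% Per-observation box constraints: C_j = n*C0*w_j
Cj = n*C0*wTot;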

References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Scholkopf, B., J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." Neural Computation, Vol. 13, No. 7, 2001, pp. 1443–1471.

[3] Cristianini, N., and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.
