Train binary support vector machine classifier
fitcsvm trains or cross-validates a support vector machine (SVM) model for two-class (binary) classification on a low- through moderate-dimensional predictor data set. fitcsvm supports mapping the predictor data using kernel functions, and supports SMO, ISDA, or L1 soft-margin minimization via quadratic programming for objective-function minimization.
To train a linear SVM model for binary classification on a high-dimensional data set, that is, a data set that includes many predictor variables, use fitclinear instead.
For multiclass learning by combining binary SVM models, use error-correcting output codes (ECOC). For more details, see fitcecoc.
To train an SVM regression model, see fitrsvm for low- through moderate-dimensional predictor data sets, or fitrlinear for high-dimensional data sets.
Mdl = fitcsvm(Tbl,ResponseVarName) returns a support vector machine classifier Mdl trained using the sample data contained in a table (Tbl). ResponseVarName is the name of the variable in Tbl that contains the class labels for one- or two-class classification.
Mdl = fitcsvm(___,Name,Value) returns a support vector machine classifier with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify the type of cross-validation, the cost for misclassification, or the type of score transformation function.
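As a rough illustration of the two syntaxes above, the following sketch builds a two-class table from Fisher's iris data and trains on it. The table name Tbl, its variable names, and the chosen options are illustrative assumptions rather than part of the reference text.

```matlab
% Build a two-class table from Fisher's iris data (illustrative names).
load fisheriris
keep = ~strcmp(species,'setosa');              % drop one class to get a binary problem
Tbl = array2table(meas(keep,3:4), ...
    'VariableNames',{'PetalLength','PetalWidth'});
Tbl.Species = species(keep);                   % response stored as a table variable

% Syntax 1: the response is identified by its variable name.
Mdl1 = fitcsvm(Tbl,'Species');

% Syntax 2: the same call plus name-value pair arguments.
Mdl2 = fitcsvm(Tbl,'Species','Standardize',true, ...
    'KernelFunction','rbf','KernelScale','auto');
```

Both calls return a ClassificationSVM object; only the second one changes the kernel and standardizes the predictors.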
Load Fisher's iris data set. Remove the sepal lengths and widths, and all observed setosa irises.
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);
Train an SVM classifier using the processed data set.
SVMModel = fitcsvm(X,y)
SVMModel =
  ClassificationSVM
    ResponseName: 'Y'
    CategoricalPredictors: []
    ClassNames: {'versicolor' 'virginica'}
    ScoreTransform: 'none'
    NumObservations: 100
    Alpha: [24×1 double]
    Bias: 14.4149
    KernelParameters: [1×1 struct]
    BoxConstraints: [100×1 double]
    ConvergenceInfo: [1×1 struct]
    IsSupportVector: [100×1 logical]
    Solver: 'SMO'
The Command Window shows that SVMModel is a trained ClassificationSVM classifier and lists its properties. To display a property of SVMModel, for example, to determine the class order, use dot notation.
classOrder = SVMModel.ClassNames
classOrder = 2×1 cell array
    'versicolor'
    'virginica'
The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.
Plot a scatter diagram of the data and circle the support vectors.
sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off
The support vectors are observations that occur on or beyond their estimated class boundaries.
You can adjust the boundaries (and therefore the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.
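To see how the box constraint influences the number of support vectors, a minimal sketch such as the following can help. The three box constraint values are arbitrary choices for illustration, and the resulting counts depend on the data, so none are quoted here.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

boxValues = [0.01 1 100];                  % illustrative box constraints
numSV = zeros(size(boxValues));
for k = 1:numel(boxValues)
    Mdl = fitcsvm(X,y,'BoxConstraint',boxValues(k));
    numSV(k) = sum(Mdl.IsSupportVector);   % count the support vectors
end

% Summarize the counts side by side.
table(boxValues',numSV','VariableNames',{'BoxConstraint','NumSupportVectors'})
```

A larger box constraint penalizes margin violations more heavily, which usually (but not always) leaves fewer support vectors.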
Load the ionosphere data set.
load ionosphere
rng(1); % For reproducibility
Train an SVM classifier using the radial basis kernel. Let the software find a scale value for the kernel function. It is good practice to standardize the predictors.
SVMModel = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF',...
    'KernelScale','auto');
SVMModel is a trained ClassificationSVM classifier.
Cross-validate the SVM classifier. By default, the software uses 10-fold cross-validation.
CVSVMModel = crossval(SVMModel);
CVSVMModel is a ClassificationPartitionedModel cross-validated classifier.
Estimate the out-of-sample misclassification rate.
classLoss = kfoldLoss(CVSVMModel)
classLoss = 0.0484
The estimated generalization error (the out-of-sample misclassification rate) is approximately 5%.
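As an alternative to calling crossval on a trained model, cross-validation can be requested directly in the call to fitcsvm through the 'KFold' name-value pair argument. The sketch below assumes 5 folds, which is an arbitrary choice for illustration.

```matlab
load ionosphere
rng(1); % For reproducibility

% Passing 'KFold' makes fitcsvm return a cross-validated model directly.
CVMdl = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF', ...
    'KernelScale','auto','KFold',5);

% Average misclassification rate over the held-out folds.
classLoss = kfoldLoss(CVMdl);
```

The loss differs slightly from the 10-fold estimate because the partition and the number of folds are different.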
Load Fisher's iris data set. Remove the petal lengths and widths. Treat all irises as coming from the same class.
load fisheriris
X = meas(:,1:2);
y = ones(size(X,1),1);
Train an SVM classifier using the processed data set. Assume that 5% of the observations are outliers. It is good practice to standardize the predictors.
rng(1);
SVMModel = fitcsvm(X,y,'KernelScale','auto','Standardize',true,...
    'OutlierFraction',0.05);
SVMModel is a trained ClassificationSVM classifier. By default, the software uses the Gaussian kernel for one-class learning.
Plot the observations and the decision boundary. Flag the support vectors and potential outliers.
svInd = SVMModel.IsSupportVector;
h = 0.02; % Mesh grid step size
[X1,X2] = meshgrid(min(X(:,1)):h:max(X(:,1)),...
    min(X(:,2)):h:max(X(:,2)));
[~,score] = predict(SVMModel,[X1(:),X2(:)]);
scoreGrid = reshape(score,size(X1,1),size(X2,2));
figure
plot(X(:,1),X(:,2),'k.')
hold on
plot(X(svInd,1),X(svInd,2),'ro','MarkerSize',10)
contour(X1,X2,scoreGrid)
colorbar;
title('{\bf Iris Outlier Detection via One-Class SVM}')
xlabel('Sepal Length (cm)')
ylabel('Sepal Width (cm)')
legend('Observation','Support Vector')
hold off
The boundary separating the outliers from the rest of the data occurs where the contour value is 0.
Verify that the fraction of observations with negative scores in the cross-validated data is close to 5%.
CVSVMModel = crossval(SVMModel);
[~,scorePred] = kfoldPredict(CVSVMModel);
outlierRate = mean(scorePred<0)
outlierRate = 0.0467
Load Fisher's iris data set. Use the petal lengths and widths.
load fisheriris
X = meas(:,3:4);
Y = species;
Examine a scatter plot of the data.
figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
lims = [h.XLim h.YLim]; % Extract the x and y axis limits
title('{\bf Scatter Diagram of Iris Measurements}');
xlabel('Petal Length (cm)');
ylabel('Petal Width (cm)');
legend('Location','Northwest');
There are three classes, one of which is linearly separable from the others.
For each class:
Create a logical vector (indx) indicating whether an observation is a member of the class.
Train an SVM classifier using the predictor data and indx.
Store the classifier in a cell of a cell array.
It is good practice to define the class order.
SVMModels = cell(3,1);
classes = unique(Y);
rng(1); % For reproducibility
for j = 1:numel(classes)
    indx = strcmp(Y,classes(j)); % Create binary classes for each classifier
    SVMModels{j} = fitcsvm(X,indx,'ClassNames',[false true],'Standardize',true,...
        'KernelFunction','rbf','BoxConstraint',1);
end
SVMModels is a 3-by-1 cell array, with each cell containing a ClassificationSVM classifier. The positive class of the classifier in cells 1, 2, and 3 is setosa, versicolor, and virginica, respectively.
Define a fine grid within the plot, and treat the coordinates as new observations from the distribution of the training data. Estimate the score of the new observations using each classifier.
d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),...
    min(X(:,2)):d:max(X(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
N = size(xGrid,1);
Scores = zeros(N,numel(classes));
for j = 1:numel(classes)
    [~,score] = predict(SVMModels{j},xGrid);
    Scores(:,j) = score(:,2); % Second column contains positive-class scores
end
Each row of Scores contains three scores. The index of the element with the largest score is the index of the class to which the new observation most likely belongs.
Associate each new observation with the classifier that gives it the maximum score.
[~,maxScore] = max(Scores,[],2);
Color in the regions of the plot based on the class to which the corresponding new observation belongs.
figure
h(1:3) = gscatter(xGrid(:,1),xGrid(:,2),maxScore,...
    [0.1 0.5 0.5; 0.5 0.1 0.5; 0.5 0.5 0.1]);
hold on
h(4:6) = gscatter(X(:,1),X(:,2),Y);
title('{\bf Iris Classification Regions}');
xlabel('Petal Length (cm)');
ylabel('Petal Width (cm)');
legend(h,{'setosa region','versicolor region','virginica region',...
    'observed setosa','observed versicolor','observed virginica'},...
    'Location','Northwest');
axis tight
hold off
This example shows how to optimize hyperparameters automatically using fitcsvm. The example uses the ionosphere data.
Load the data.
load ionosphere
Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.
For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
rng default
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))
Iter | Eval result | Objective | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | BoxConstraint | KernelScale
1 | Best | 0.12821 | 147.06 | 0.12821 | 0.12821 | 0.42371 | 0.006703
2 | Accept | 0.13675 | 1.1557 | 0.12821 | 0.12876 | 0.71122 | 4.9578
3 | Accept | 0.16809 | 159.18 | 0.12821 | 0.12998 | 462.63 | 0.018515
4 | Accept | 0.35897 | 0.69293 | 0.12821 | 0.12824 | 0.0016958 | 247.22
5 | Accept | 0.16809 | 148.84 | 0.12821 | 0.12826 | 27.342 | 0.011315
6 | Accept | 0.35897 | 0.90372 | 0.12821 | 0.13632 | 0.14834 | 121.74
7 | Accept | 0.23362 | 142.75 | 0.12821 | 0.14021 | 30.658 | 0.001022
8 | Accept | 0.1396 | 3.2571 | 0.12821 | 0.14247 | 49.065 | 1.581
9 | Accept | 0.1339 | 0.83558 | 0.12821 | 0.13549 | 0.0085176 | 0.28903
10 | Best | 0.12821 | 1.0662 | 0.12821 | 0.12931 | 0.0010271 | 0.015302
11 | Accept | 0.35897 | 0.66735 | 0.12821 | 0.13597 | 0.0010864 | 1.6148
12 | Accept | 0.1339 | 3.5592 | 0.12821 | 0.12442 | 0.52736 | 0.12943
13 | Best | 0.12536 | 8.8503 | 0.12536 | 0.12052 | 0.068293 | 0.029608
14 | Accept | 0.1339 | 3.6091 | 0.12536 | 0.12535 | 3.9987 | 0.3546
15 | Best | 0.11966 | 0.56534 | 0.11966 | 0.11965 | 0.0058807 | 0.08002
16 | Accept | 0.11966 | 81.168 | 0.11966 | 0.11965 | 0.0057969 | 0.0016972
17 | Accept | 0.12821 | 0.5936 | 0.11966 | 0.11964 | 0.59862 | 1.0498
18 | Accept | 0.12251 | 40.358 | 0.11966 | 0.11964 | 0.015919 | 0.0055785
19 | Accept | 0.1339 | 0.71137 | 0.11966 | 0.12051 | 0.036384 | 0.16065
20 | Accept | 0.12821 | 0.61387 | 0.11966 | 0.12032 | 4.0737 | 2.5745
21 | Accept | 0.12536 | 84.685 | 0.11966 | 0.12067 | 0.0010889 | 0.0010109
22 | Accept | 0.13105 | 165.7 | 0.11966 | 0.12037 | 987.01 | 0.21248
23 | Accept | 0.12536 | 65.03 | 0.11966 | 0.12009 | 0.0010574 | 0.0033974
24 | Accept | 0.1396 | 1.3243 | 0.11966 | 0.12002 | 969.03 | 12.066
25 | Accept | 0.12251 | 25.978 | 0.11966 | 0.11983 | 995.56 | 2.1147
26 | Accept | 0.1567 | 206.51 | 0.11966 | 0.12009 | 0.039285 | 0.0010501
27 | Accept | 0.13105 | 3.4498 | 0.11966 | 0.12081 | 0.0071342 | 0.023989
28 | Accept | 0.12251 | 103.95 | 0.11966 | 0.12011 | 0.72608 | 0.023072
29 | Accept | 0.12536 | 1.127 | 0.11966 | 0.11968 | 0.0010318 | 0.064622
30 | Accept | 0.35897 | 1.138 | 0.11966 | 0.12037 | 936.65 | 948.48
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 1705.2912 seconds.
Total objective function evaluation time: 1405.3256
Best observed feasible point:
BoxConstraint | KernelScale
0.0058807 | 0.08002
Observed objective function value = 0.11966
Estimated objective function value = 0.12037
Function evaluation time = 0.56534
Best estimated feasible point (according to models):
BoxConstraint | KernelScale
0.0058807 | 0.08002
Estimated objective function value = 0.12037
Estimated function evaluation time = 0.89038
Mdl =
  ClassificationSVM
    ResponseName: 'Y'
    CategoricalPredictors: []
    ClassNames: {'b' 'g'}
    ScoreTransform: 'none'
    NumObservations: 351
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
    Alpha: [105×1 double]
    Bias: 3.7681
    KernelParameters: [1×1 struct]
    BoxConstraints: [351×1 double]
    ConvergenceInfo: [1×1 struct]
    IsSupportVector: [351×1 logical]
    Solver: 'SMO'
Tbl — Sample data
Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.
If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.
If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula using formula.
If Tbl does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of Tbl must be equal.
Data Types: table
ResponseVarName — Response variable name
Response variable name, specified as the name of a variable in Tbl.
You must specify ResponseVarName as a character vector. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.
The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If Y is a character array, then each element must correspond to one row of the array.
It is good practice to specify the order of the classes using the ClassNames name-value pair argument.
Data Types: char
formula — Explanatory model of response and subset of predictor variables
Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.
Data Types: char
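The formula syntax can be sketched as follows. The table and its variable names are assumptions made for this illustration; only the petal measurements are named in the formula, so the sepal measurements are left out of training.

```matlab
load fisheriris
keep = ~strcmp(species,'setosa');              % keep two classes only
Tbl = array2table(meas(keep,:), ...
    'VariableNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'});
Tbl.Species = species(keep);

% Response on the left of '~', predictors joined by '+' on the right.
Mdl = fitcsvm(Tbl,'Species~PetalLength+PetalWidth');

% Inspect which predictors the model actually used.
Mdl.PredictorNames
```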
Y — Class labels
Class labels used to train the SVM model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.
Y must contain at most two distinct classes. For multiclass learning, see fitcecoc.
If Y is a character array, then each element must correspond to one row of the array.
The length of Y and the number of rows of Tbl or X must be equal.
It is good practice to specify the class order using the ClassNames name-value pair argument.
Data Types: char | cell | categorical | logical | single | double
X — Predictor data
Predictor data used to train the SVM classifier, specified as a matrix of numeric values.
Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one predictor.
The length of Y and the number of rows of X must be equal.
To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument.
Data Types: double | single
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'KFold',10,'Cost',[0 2;1 0],'ScoreTransform','sign' specifies to perform 10-fold cross-validation, apply double the penalty to false positives compared to false negatives, and transform the scores using the sign function.
Note:
You cannot use any cross-validation name-value pair along with 
'BoxConstraint' — Box constraint
Box constraint, specified as the comma-separated pair consisting of 'BoxConstraint' and a positive scalar.
For one-class learning, the software always sets the box constraint to 1.
For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.
Example: 'BoxConstraint',100
Data Types: double | single
'KernelFunction' — Kernel function
'linear' (default) | 'gaussian' | 'rbf' | 'polynomial' | function name
Kernel function used to compute the Gram matrix, specified as the comma-separated pair consisting of 'KernelFunction' and a value in this table.
Value | Description | Formula
'gaussian' or 'rbf' | Gaussian or Radial Basis Function (RBF) kernel, default for one-class learning | $$G\left({x}_{1},{x}_{2}\right)=\mathrm{exp}\left(-{\Vert {x}_{1}-{x}_{2}\Vert}^{2}\right)$$
'linear' | Linear kernel, default for two-class learning | $$G({x}_{1},{x}_{2})={x}_{1}\prime {x}_{2}$$
'polynomial' | Polynomial kernel. Use 'PolynomialOrder',p to specify a polynomial kernel of order p. | $$G({x}_{1},{x}_{2})={(1+{x}_{1}\prime {x}_{2})}^{p}$$
You can set your own kernel function, for example, kernel, by setting 'KernelFunction','kernel'. kernel must have the following form:
function G = kernel(U,V)
where:
U is an m-by-p matrix.
V is an n-by-p matrix.
G is an m-by-n Gram matrix of the rows of U and V.
kernel.m must be on the MATLAB® path.
It is good practice to avoid using generic names for kernel functions. For example, call a sigmoid kernel function 'mysigmoid' rather than 'sigmoid'.
Example: 'KernelFunction','gaussian'
Data Types: char
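A custom kernel that follows the required form might look like the sketch below. The slope gamma and intercept c are assumed values chosen only for illustration; the function name mysigmoid follows the naming advice above.

```matlab
function G = mysigmoid(U,V)
% MYSIGMOID Sigmoid kernel (illustrative sketch).
%   U is m-by-p, V is n-by-p, and G is the m-by-n Gram matrix
%   of the rows of U and V.
gamma = 1;    % slope (assumed value)
c = -1;       % intercept (assumed value)
G = tanh(gamma*(U*V') + c);
end
```

After saving this code as mysigmoid.m on the MATLAB path, it can be passed by name, for example fitcsvm(X,Y,'KernelFunction','mysigmoid','Standardize',true).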
'KernelScale' — Kernel scale parameter
1 (default) | 'auto' | positive scalar
Kernel scale parameter, specified as the comma-separated pair consisting of 'KernelScale' and 'auto' or a positive scalar. The software divides all elements of the predictor matrix X by the value of KernelScale. Then, the software applies the appropriate kernel norm to compute the Gram matrix.
If you specify 'auto', then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed using rng before training.
If you specify KernelScale and your own kernel function, for example, kernel, using 'KernelFunction','kernel', then the software throws an error. You must apply scaling within kernel.
Example: 'KernelScale','auto'
Data Types: double | single | char
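The effect of the kernel scale on a Gaussian kernel can be sketched by hand. The function below is a hypothetical helper, not the library's internal code: it divides the predictors by a scale s, as described above, and then evaluates the Gaussian kernel on the scaled rows.

```matlab
function G = scaledGaussianKernel(U,V,s)
% Gram matrix of the Gaussian kernel after dividing the predictors by s.
%   U is m-by-p, V is n-by-p, s is a positive scalar, G is m-by-n.
Us = U./s;
Vs = V./s;
% Squared Euclidean distance between every row of Us and every row of Vs.
D2 = bsxfun(@plus,sum(Us.^2,2),sum(Vs.^2,2)') - 2*(Us*Vs');
G = exp(-max(D2,0));   % clamp tiny negative round-off before exponentiating
end
```

With this form, a larger s shrinks the scaled distances, so the kernel values move toward 1 and the decision boundary becomes smoother.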
'PolynomialOrder' — Polynomial kernel function order
3 (default) | positive integer
Polynomial kernel function order, specified as the comma-separated pair consisting of 'PolynomialOrder' and a positive integer.
If you set 'PolynomialOrder' and KernelFunction is not 'polynomial', then the software throws an error.
Example: 'PolynomialOrder',2
Data Types: double | single
'KernelOffset' — Kernel offset parameter
Kernel offset parameter, specified as the comma-separated pair consisting of 'KernelOffset' and a nonnegative scalar.
The software adds KernelOffset to each element of the Gram matrix.
The defaults are:
0 if the solver is SMO (that is, you set 'Solver','SMO')
0.1 if the solver is ISDA (that is, you set 'Solver','ISDA')
Example: 'KernelOffset',0
Data Types: double | single
'Standardize' — Flag to standardize predictor data
false (default) | true
Flag to standardize the predictor data, specified as the comma-separated pair consisting of 'Standardize' and true (1) or false (0).
If you set 'Standardize',true:
The software centers and scales each column of the predictor data (X) by the weighted column mean and standard deviation, respectively (for details on weighted standardizing, see Algorithms). MATLAB does not standardize the data contained in the dummy variable columns generated for categorical predictors.
The software trains the classifier using the standardized predictor matrix, but stores the unstandardized data in the classifier property X.
Example: 'Standardize',true
Data Types: logical
'Solver' — Optimization routine
'ISDA' | 'L1QP' | 'SMO'
Optimization routine, specified as the comma-separated pair consisting of 'Solver' and a value in this table.
Value | Description
'ISDA' | Iterative Single Data Algorithm (see [30])
'L1QP' | Uses quadprog to implement L1 soft-margin minimization by quadratic programming. This option requires an Optimization Toolbox™ license. For more details, see Quadratic Programming Definition.
'SMO' | Sequential Minimal Optimization (see [17])
The defaults are:
'ISDA' for two-class learning if you set 'OutlierFraction' to a positive value
'SMO' otherwise
Example: 'Solver','ISDA'
Data Types: char
'Alpha' — Initial estimates of alpha coefficients
Initial estimates of alpha coefficients, specified as the comma-separated pair consisting of 'Alpha' and a numeric vector of nonnegative values. The length of Alpha must be equal to the number of rows of X.
Each element of Alpha corresponds to an observation in X.
Alpha cannot contain any NaNs.
If you specify Alpha and any one of the cross-validation name-value pair arguments ('CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'), then the software returns an error.
If Y contains any missing values, then remove all rows of Y, X, and Alpha that correspond to the missing values. That is, enter:
idx = ~isundefined(categorical(Y));
Y = Y(idx,:);
X = X(idx,:);
alpha = alpha(idx);
Then, pass Y, X, and alpha as the response, predictors, and initial alpha estimates, respectively.
The defaults are:
0.5*ones(size(X,1),1) for one-class learning
zeros(size(X,1),1) for two-class learning
Example: 'Alpha',0.1*ones(size(X,1),1)
Data Types: double | single
'CacheSize' — Cache size
1000 (default) | 'maximal' | positive scalar
Cache size, specified as the comma-separated pair consisting of 'CacheSize' and 'maximal' or a positive scalar.
If CacheSize is 'maximal', then the software reserves enough memory to hold the entire n-by-n Gram matrix.
If CacheSize is a positive scalar, then the software reserves CacheSize megabytes of memory for training the classifier.
Example: 'CacheSize','maximal'
Data Types: double | char | single
'ClipAlphas' — Flag to clip alpha coefficients
true (default) | false
Flag to clip alpha coefficients, specified as the comma-separated pair consisting of 'ClipAlphas' and either true or false.
Suppose that the alpha coefficient for observation j is α_{j} and the box constraint of observation j is C_{j}, j = 1,...,n, where n is the training sample size.
Value | Description
true | At each iteration, if α_{j} is near 0 or near C_{j}, then MATLAB sets α_{j} to 0 or to C_{j}, respectively.
false | MATLAB does not change the alpha coefficients during optimization.
MATLAB stores the final values of α in the Alpha property of the trained SVM model object.
ClipAlphas can affect SMO and ISDA convergence.
Example: 'ClipAlphas',false
Data Types: logical
'Nu' — ν parameter for one-class learning
0.5 (default) | positive scalar
ν parameter for one-class learning, specified as the comma-separated pair consisting of 'Nu' and a positive scalar. Nu must be greater than 0 and at most 1.
Set Nu to control the tradeoff between ensuring most training examples are in the positive class and minimizing the weights in the score function.
Example: 'Nu',0.25
Data Types: double | single
'NumPrint' — Number of iterations between optimization diagnostic message output
1000 (default) | nonnegative integer
Number of iterations between optimization diagnostic message output, specified as the comma-separated pair consisting of 'NumPrint' and a nonnegative integer.
If you use 'Verbose',1 and 'NumPrint',numprint, then the software displays all optimization diagnostic messages from SMO and ISDA every numprint iterations in the Command Window.
Example: 'NumPrint',500
Data Types: double | single
'OutlierFraction' — Expected proportion of outliers in training data
0 (default) | numeric scalar in the interval [0,1)
Expected proportion of outliers in the training data, specified as the comma-separated pair consisting of 'OutlierFraction' and a numeric scalar in the interval [0,1).
If you set 'OutlierFraction',outlierfraction, where outlierfraction is a value greater than 0, then:
For two-class learning, the software implements robust learning. In other words, the software attempts to remove 100*outlierfraction% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
For one-class learning, the software finds an appropriate bias term such that outlierfraction of the observations in the training set have negative scores.
Example: 'OutlierFraction',0.01
Data Types: double | single
'RemoveDuplicates' — Flag to replace duplicate observations with single observations in training data
false (default) | true
Flag to replace duplicate observations with single observations in the training data, specified as the comma-separated pair consisting of 'RemoveDuplicates' and true or false.
If RemoveDuplicates is true, then fitcsvm replaces duplicate observations in the training data with a single observation of the same value. The weight of the single observation is equal to the sum of the weights of the corresponding removed duplicates (see Weights).
Tip
If your data set contains many duplicate observations, then
specifying 
Data Types: logical
'Verbose' — Verbosity level
0 (default) | 1 | 2
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and either 0, 1, or 2. Verbose controls the amount of optimization information that the software displays in the Command Window and saves as a structure to Mdl.ConvergenceInfo.History.
This table summarizes the available verbosity level options.
Value | Description
0 | The software does not display or save convergence information.
1 | The software displays diagnostic messages and saves convergence criteria every numprint iterations, where numprint is the value of the name-value pair argument 'NumPrint'.
2 | The software displays diagnostic messages and saves convergence criteria at every iteration.
Example: 'Verbose',1
Data Types: double | single
'CategoricalPredictors' — Categorical predictors list
List of categorical predictors, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:
A numeric vector with indices from 1 through p, where p is the number of columns of X.
A logical vector of length p, where a true entry means that the corresponding column of X is a categorical variable.
A cell array of character vectors, where each element in the array is the name of a predictor variable. The names must match entries in PredictorNames.
'all', meaning all predictors are categorical.
By default, if the predictor data is in a matrix (X), fitcsvm assumes that none of the predictors are categorical. If the predictor data is in a table (Tbl), fitcsvm assumes that a variable is categorical if it contains logical values, categorical values, or a cell array of character vectors.
For example, the following syntax specifies that columns 1 and 3 of the input matrix X contain categorical variables.
Example: 'CategoricalPredictors',[1,3]
Data Types: single | double | logical | cell
'ClassNames' — Names of classes to use for training
Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames must be the same data type as Y.
If ClassNames is a character array, then each element must correspond to one row of the array.
Use ClassNames to:
Order the classes during training.
Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.
The default is the set of all distinct class names in Y.
Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | logical | single | double | cell
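The ordering role of ClassNames can be illustrated with a short sketch. It assumes the ionosphere data, whose labels are 'b' and 'g', and reverses the default order so that 'g' becomes the first (negative) class.

```matlab
load ionosphere

% Reverse the class order: 'g' is listed first, 'b' second.
Mdl = fitcsvm(X,Y,'ClassNames',{'g','b'});
Mdl.ClassNames

% The columns of the scores returned by predict follow the same order.
[~,score] = predict(Mdl,X(1:5,:));
```

Here the second column of score holds the scores for class 'b', because 'b' is the second entry of ClassNames.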
'Cost' — Misclassification cost
Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. If you specify:
The square matrix Cost, then, if the true class of an observation is i, Cost(i,j) is the cost of classifying a point into class j. That is, rows correspond to the true classes and the columns correspond to predicted classes. To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument.
The structure S, then it must have two fields:
    S.ClassNames, which contains the class names as a variable of the same data type as Y
    S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames
For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. Consequently, the cost matrix resets to the default. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.
The defaults are:
For one-class learning, Cost = 0.
For two-class learning, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j.
Example: 'Cost',[0,1;2,0]
Data Types: double | single | struct
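The two ways of passing a cost can be sketched as follows. The cost values are assumptions chosen for illustration: with the class order {'b','g'}, misclassifying a true 'g' as 'b' costs 2 and the reverse costs 1.

```matlab
load ionosphere

% Rows are true classes, columns are predicted classes, ordered as {'b','g'}.
C = [0 1; 2 0];
Mdl = fitcsvm(X,Y,'ClassNames',{'b','g'},'Cost',C,'Standardize',true);

% Structure form carrying the same cost matrix and class order.
S.ClassNames = {'b','g'};
S.ClassificationCosts = C;
MdlS = fitcsvm(X,Y,'Cost',S,'Standardize',true);
```

The structure form keeps the class order and the cost matrix together, which can reduce the risk of a mismatch between Cost and ClassNames.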
'PredictorNames' — Predictor variable names
Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.
If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.
    The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
    By default, PredictorNames is {'x1','x2',...}.
If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcsvm uses only the predictor variables in PredictorNames and the response in training.
    PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
    By default, PredictorNames contains the names of all predictor variables.
    It is good practice to specify the predictors for training using only one of 'PredictorNames' or formula.
Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}
Data Types: cell
'Prior'
— Prior probabilities'empirical'
(default)  'uniform'
 numeric vector  structure arrayPrior probabilities for each class, specified as the commaseparated
pair consisting of 'Prior'
and a value in this
table.
Value  Description 

'empirical'  The class prior probabilities are the class relative frequencies
in Y . 
'uniform'  All class prior probabilities are equal to 1/K, where K is the number of classes. 
numeric vector  Each element is a class prior probability. Order the elements
according to Mdl.ClassNames or specify the order
using the ClassNames namevalue pair argument.
The software normalizes the elements such that they sum to 1 . 
structure  A structure

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.
Example: struct('ClassNames',{{'setosa','versicolor','virginica'}},'ClassProbs',1:3)
Data Types: char | double | single | struct
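The structure form of 'Prior' can be sketched as follows for a two-class problem. The probabilities [3 1] are arbitrary illustrative values; they are deliberately not normalized, because the description above states that the software normalizes them.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');            % two-class subset of the iris data
X = meas(inds,3:4);
y = species(inds);

% Illustrative, unnormalized prior probabilities for the two classes
prior = struct('ClassNames',{{'versicolor','virginica'}},'ClassProbs',[3 1]);
Mdl = fitcsvm(X,y,'Prior',prior);

disp(Mdl.ClassNames)                         % class order used by the model
disp(Mdl.Prior)                              % priors as stored by the model
```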
'ResponseName'
— Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector.
If you supply Y
, then you can
use 'ResponseName'
to specify a name for the response
variable.
If you supply ResponseVarName
or formula
,
then you cannot use 'ResponseName'
.
Example: 'ResponseName','response'
Data Types: char
'ScoreTransform'
— Score transform function
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'sign' | 'symmetric' | 'symmetriclogit' | 'symmetricismax' | function handle
Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector or function handle.
This table summarizes the available built-in functions.
Value | Formula
'doublelogit' | 1/(1 + e^{–2x})
'invlogit' | log(x / (1–x))
'ismax' | Set the score for the class with the largest score to 1, and scores for all other classes to 0.
'logit' | 1/(1 + e^{–x})
'none' or 'identity' | x (no transformation)
'sign' | –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' | 2x – 1
'symmetriclogit' | 2/(1 + e^{–x}) – 1
'symmetricismax' | Set the score for the class with the largest score to 1, and scores for all other classes to –1.
For a MATLAB function, or a function that you define, enter its function handle.
Mdl.ScoreTransform = @function;
function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Example: 'ScoreTransform','sign'
Data Types: char | function_handle
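A custom transform can be sketched with an anonymous function. The handle below reimplements the 'logit' formula from the table purely for illustration; any function that maps a score matrix to a matrix of the same size would serve.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Mdl = fitcsvm(X,y);
[~,rawScore] = predict(Mdl,X);               % untransformed scores ('none')

% Illustrative handle: same formula as the built-in 'logit' transform
Mdl.ScoreTransform = @(s) 1./(1 + exp(-s));
[~,newScore] = predict(Mdl,X);               % transformed scores, same size
```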
'Weights'
— Observation weights
numeric vector | name of variable in Tbl
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or the name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl.
If you specify the input data as a table Tbl
,
then Weights
can be the name of a variable in Tbl
that
contains a numeric vector. In this case, you must specify Weights
as
a character vector. For example, if the weights vector W
is
stored as Tbl.W
, then specify it as 'W'
.
Otherwise, the software treats all columns of Tbl
,
including W
, as predictors or the response when
training the model.
By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.
The software normalizes Weights to sum up to the value of the prior probability in the respective class. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.
Data Types: double | single
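The table case can be sketched as follows. The table layout, the variable name W, and the weight values are assumptions made for illustration; the point is that naming the weight variable keeps it out of the predictors.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');

% Build an illustrative table with two predictors, the response, and weights
Tbl = array2table(meas(inds,3:4),'VariableNames',{'PetalLength','PetalWidth'});
Tbl.Species = species(inds);
Tbl.W = ones(height(Tbl),1);                     % unit weights by default
Tbl.W(strcmp(Tbl.Species,'virginica')) = 2;      % upweight one class (arbitrary choice)

% Pass the weights by variable name so W is not treated as a predictor
Mdl = fitcsvm(Tbl,'Species','Weights','W');
disp(Mdl.PredictorNames)
```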
'CrossVal'
— Flag to train cross-validated classifier
'off' (default) | 'on'
Flag to train a cross-validated classifier, specified as the comma-separated pair consisting of 'CrossVal' and 'on' or 'off'.
If you specify 'on', then the software trains a cross-validated classifier with 10 folds.
You can override this cross-validation setting using one of the CVPartition, Holdout, KFold, or Leaveout name-value pair arguments. You can only use one cross-validation name-value pair argument at a time to create a cross-validated model.
Alternatively, cross-validate later by passing Mdl to crossval.
Example: 'CrossVal','on'
Data Types: char
'CVPartition'
— Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation, and also the indexing for training and validation sets.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
'Holdout'
— Fraction of data for holdout validation
Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:
Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data
Stores the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Holdout',0.1
Data Types: double | single
'KFold'
— Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, e.g., 'KFold',k, then the software:
Randomly partitions the data into k sets
For each set, reserves the set as validation data, and trains the model using the other k – 1 sets
Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',5
Data Types: single | double
'Leaveout'
— Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(Mdl.X,1), the software:
Reserves the observation as validation data, and trains the model using the other n – 1 observations
Stores the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Leaveout','on'
Data Types: char
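The cross-validation arguments above can be sketched as follows. The fold count, holdout fraction, and random seed are arbitrary illustrative choices; only one cross-validation argument is used per call, as required.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

rng(1);                                      % reproducible partitions (arbitrary seed)

% Five-fold cross-validation: one compact model per fold
CVMdl = fitcsvm(X,y,'KFold',5);
nModels = numel(CVMdl.Trained);
L = kfoldLoss(CVMdl);                        % estimated misclassification rate

% The same idea with an explicit partition object (20% holdout)
c = cvpartition(y,'HoldOut',0.2);
CVMdl2 = fitcsvm(X,y,'CVPartition',c);
```

Passing a cvpartition object is useful when several models should be compared on exactly the same training and validation split.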
'DeltaGradientTolerance'
— Tolerance for gradient difference
Tolerance for the gradient difference between upper and lower violators obtained by Sequential Minimal Optimization (SMO) or Iterative Single Data Algorithm (ISDA), specified as the comma-separated pair consisting of 'DeltaGradientTolerance' and a nonnegative scalar.
If DeltaGradientTolerance is 0, then the software does not use the tolerance for the gradient difference to check for optimization convergence.
The defaults are:
1e-3 if the solver is SMO (for example, you set 'Solver','SMO')
0 if the solver is ISDA (for example, you set 'Solver','ISDA')
Example: 'DeltaGradientTolerance',1e-2
Data Types: double | single
'GapTolerance'
— Feasibility gap tolerance
0 (default) | nonnegative scalar
Feasibility gap tolerance obtained by SMO or ISDA, specified as the comma-separated pair consisting of 'GapTolerance' and a nonnegative scalar.
If GapTolerance is 0, then the software does not use the feasibility gap tolerance to check for optimization convergence.
Example: 'GapTolerance',1e-2
Data Types: double | single
'IterationLimit'
— Maximal number of numerical optimization iterations
1e6 (default) | positive integer
Maximal number of numerical optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer.
The software returns a trained model regardless of whether the optimization routine successfully converges. Mdl.ConvergenceInfo contains convergence information.
Example: 'IterationLimit',1e8
Data Types: double | single
'KKTTolerance'
— Karush-Kuhn-Tucker complementarity conditions violation tolerance
Karush-Kuhn-Tucker (KKT) complementarity conditions violation tolerance, specified as the comma-separated pair consisting of 'KKTTolerance' and a nonnegative scalar.
If KKTTolerance is 0, then the software does not use the KKT complementarity conditions violation tolerance to check for optimization convergence.
The defaults are:
0 if the solver is SMO (for example, you set 'Solver','SMO')
1e-3 if the solver is ISDA (for example, you set 'Solver','ISDA')
Example: 'KKTTolerance',1e-2
Data Types: double | single
'ShrinkagePeriod'
— Number of iterations between movement of observations from active to inactive set
0 (default) | nonnegative integer
Number of iterations between the movement of observations from the active to inactive set, specified as the comma-separated pair consisting of 'ShrinkagePeriod' and a nonnegative integer.
If you set 'ShrinkagePeriod',0, then the software does not shrink the active set.
Example: 'ShrinkagePeriod',1000
Data Types: double | single
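The convergence controls above can be combined in one call, sketched below. The specific tolerance and limit values are illustrative and not recommendations; because the software returns a model whether or not the optimization converges, the sketch inspects Mdl.ConvergenceInfo afterward.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

% Illustrative solver settings (values chosen only for demonstration)
Mdl = fitcsvm(X,y,'Solver','SMO', ...
    'KKTTolerance',1e-3, ...
    'IterationLimit',1e5, ...
    'ShrinkagePeriod',1000);

info = Mdl.ConvergenceInfo;                  % structure of convergence diagnostics
disp(info)
```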
'OptimizeHyperparameters'
— Parameters to optimize
'none' (default) | 'auto' | 'all' | cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as:
'none' — Do not optimize.
'auto' — Use {'BoxConstraint','KernelScale'}
'all' — Optimize all eligible parameters.
Cell array of eligible parameter names
Vector of optimizableVariable objects, typically the output of hyperparameters
The optimization attempts to minimize the cross-validation loss (error) for fitcsvm by varying the parameters. For information about cross-validation loss (albeit in a different context), see Classification Loss. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.
The eligible parameters for fitcsvm are:
BoxConstraint — fitcsvm searches among positive values, by default log-scaled in the range [1e-3,1e3].
KernelScale — fitcsvm searches among positive values, by default log-scaled in the range [1e-3,1e3].
KernelFunction — fitcsvm searches among 'gaussian', 'linear', and 'polynomial'.
PolynomialOrder — fitcsvm searches among integers in the range [2,4].
Standardize — fitcsvm searches among 'true' and 'false'.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,
load fisheriris
params = hyperparameters('fitcsvm',meas,species);
params(1).Range = [1e-4,1e6];
Pass params as the value of OptimizeHyperparameters.
By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression, and the misclassification rate for classification. To control the iterative display, set the Verbose field of the HyperparameterOptimizationOptions name-value pair. To control the plots, set the ShowPlots field of the HyperparameterOptimizationOptions name-value pair.
For an example, see Optimize SVM Classifier.
Example: 'auto'
Data Types: char | cell
'HyperparameterOptimizationOptions'
— Options for optimization
Options for optimization, specified as a structure. Modifies the effect of the OptimizeHyperparameters name-value pair. All fields in the structure are optional.
Field Name | Values | Default
Optimizer | 'bayesopt', 'gridsearch', or 'randomsearch' | 'bayesopt'
AcquisitionFunctionName | See the bayesopt AcquisitionFunctionName name-value pair, or Acquisition Function Types. | 'expected-improvement-per-second-plus'
MaxObjectiveEvaluations | Maximum number of objective function evaluations. | 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'
NumGridDivisions | For 'gridsearch', the number of values in each dimension. Can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. Ignored for categorical variables. | 10
ShowPlots | Logical value indicating whether to show plots. If true, plots the best objective function value against iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. | true
SaveIntermediateResults | Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. | false
Verbose | Display to the command line. See the bayesopt Verbose name-value pair. | 1
Repartition | Logical value indicating whether to repartition the cross-validation at every iteration. | false
Use no more than one of the following three field names. If you specify none of them, the default is Kfold = 5.
CVPartition | A cvpartition object, as created by cvpartition
Holdout | A scalar in the range (0,1) representing the holdout fraction.
Kfold | An integer greater than 1.
Example: struct('MaxObjectiveEvaluations',60)
Data Types: struct
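The two optimization arguments are typically used together. The following is a minimal sketch; the evaluation budget of 10 and the suppressed plots and display are choices made only to keep the run short, not recommended settings.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

rng default                                  % reproducible optimization run

% Illustrative options: small budget, no plots, no iterative display
opts = struct('MaxObjectiveEvaluations',10,'ShowPlots',false,'Verbose',0);

% 'auto' optimizes BoxConstraint and KernelScale
Mdl = fitcsvm(X,y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);

bestScale = Mdl.KernelParameters.Scale;      % kernel scale of the returned model
```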
Mdl
— Trained SVM classification model
ClassificationSVM model object | ClassificationPartitionedModel cross-validated model object
Trained SVM classification model, returned as a ClassificationSVM model object or ClassificationPartitionedModel cross-validated model object.
If you set any of the name-value pair arguments KFold, Holdout, Leaveout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated classifier. Otherwise, Mdl is a ClassificationSVM classifier.
To reference properties of Mdl, use dot notation. For example, enter Mdl.Alpha in the Command Window to display the trained Lagrange multipliers.
fitcsvm trains SVM classifiers for one-class or two-class learning applications. To train SVM classifiers using data with more than two classes, use fitcecoc.
fitcsvm supports low- through moderate-dimensional data sets. For high-dimensional data sets, use fitclinear instead.
The box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization).
If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.
The Gram matrix of a set of n vectors {x_{1},...,x_{n}; x_{j} ∊ R^{p}} is an n-by-n matrix with element (j,k) defined as G(x_{j},x_{k}) = <ϕ(x_{j}),ϕ(x_{k})>, an inner product of the transformed predictors using the kernel function ϕ.
For nonlinear SVM, the algorithm forms a Gram matrix using the rows of the predictor matrix (the observations). The dual formalization replaces the inner product of the predictors with corresponding elements of the resulting Gram matrix (called the "kernel trick"). Subsequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.
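As an illustration of the definition, the following sketch builds a Gram matrix by hand for three observations. It assumes a Gaussian kernel of the form G(x_{j},x_{k}) = exp(–‖x_{j} – x_{k}‖²/σ²), where sigma is a hypothetical kernel scale; this is a sketch of the idea, not the internal implementation.

```matlab
X = [0 0; 1 0; 0 2];                         % three observations (rows), two predictors
sigma = 1;                                   % hypothetical kernel scale

% Squared Euclidean distances between all pairs of rows
sq = sum(X.^2,2);
D2 = bsxfun(@plus,sq,sq') - 2*(X*X');

% Gram matrix under the assumed Gaussian kernel
G = exp(-D2/sigma^2);
disp(G)
```

The result is n-by-n and symmetric with ones on the diagonal, because each observation has zero distance to itself.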
KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions.
In SVM, the KKT complementarity conditions are
$$\begin{cases}{\alpha}_{j}\left[{y}_{j}f\left({x}_{j}\right)-1+{\xi}_{j}\right]=0\\ {\xi}_{j}\left(C-{\alpha}_{j}\right)=0\end{cases}$$
for all j = 1,...,n, where $$f\left({x}_{j}\right)=\varphi \left({x}_{j}\right)\prime \beta +b,$$ ϕ is a kernel function (see Gram matrix), and ξ_{j} is a slack variable. If the classes are perfectly separable, then ξ_{j} = 0 for all j = 1,...,n.
One-class learning, or unsupervised SVM, aims at separating data from the origin in the high-dimensional predictor space (not the original predictor space), and is an algorithm used for outlier detection.
The algorithm resembles that of SVM for binary classification. The objective is to minimize the dual expression
$$0.5\sum _{jk}{\alpha}_{j}{\alpha}_{k}G({x}_{j},{x}_{k})$$
with respect to $${\alpha}_{1},\mathrm{...},{\alpha}_{n}$$, subject to
$$\sum {\alpha}_{j}=n\nu $$
and $$0\le {\alpha}_{j}\le 1$$ for all j = 1,...,n. G(x_{j},x_{k}) is element (j,k) of the Gram matrix.
A small value of ν leads to fewer support vectors, and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors, and therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1.
For more details, see [5].
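One-class learning can be sketched by passing a single class label for every observation. The synthetic data, the seed, and the value of nu below are assumptions for illustration only.

```matlab
rng(1);                                      % reproducible synthetic data (arbitrary seed)
n = 200;
X = randn(n,2);                              % hypothetical two-predictor sample
nu = 0.1;                                    % illustrative value for the Nu argument

% One class label for every row triggers one-class learning
Mdl = fitcsvm(X,ones(n,1),'KernelScale','auto','Nu',nu);

[~,score] = predict(Mdl,X);                  % one score column for the single class
fracNegative = mean(score < 0);              % share of observations scored as outlying
sumAlpha = sum(Mdl.Alpha);                   % compare with n*nu from the constraint above
```

Rerunning the sketch with several values of nu shows how the number of support vectors, sum(Mdl.IsSupportVector), changes with ν.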
Support vectors are observations corresponding to strictly positive estimates of α_{1},...,α_{n}.
SVM classifiers that yield fewer support vectors for a given training set are more desirable.
The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.
The linear SVM score function is
$$f(x)=x\prime \beta +b,$$
where:
x is an observation (corresponding to a row of X).
The vector β contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to Mdl.Beta). For separable data, the optimal margin length is $$2/\Vert \beta \Vert .$$
b is the bias term (corresponding to Mdl.Bias).
The root of f(x) for particular coefficients defines a hyperplane. For a particular hyperplane, f(z) is proportional to the signed distance from point z to the hyperplane; the distance itself is $$\left|f(z)\right|/\Vert \beta \Vert .$$
The algorithm searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate. Therefore:
For separable classes, the objective is to minimize $$\Vert \beta \Vert $$ with respect to β and b subject to y_{j}f(x_{j}) ≥ 1, for all j = 1,...,n. This is the primal formalization for separable classes.
For inseparable classes, the algorithm uses slack variables (ξ_{j}) to penalize the objective function for observations that cross the margin boundary for their class. ξ_{j} = 0 for observations that do not cross the margin boundary for their class, otherwise ξ_{j} ≥ 0.
The objective is to minimize $$0.5{\Vert \beta \Vert}^{2}+C\sum {\xi}_{j}$$ with respect to β, b, and ξ_{j} subject to $${y}_{j}f({x}_{j})\ge 1-{\xi}_{j}$$ and $${\xi}_{j}\ge 0$$ for all j = 1,...,n, and for a positive scalar box constraint C. This is the primal formalization for inseparable classes.
The algorithm uses the Lagrange multipliers method to optimize the objective. This introduces n coefficients α_{1},...,α_{n} (corresponding to Mdl.Alpha). The dual formalizations for linear SVM are:
For separable classes, minimize
$$0.5\sum _{j=1}^{n}\sum _{k=1}^{n}{\alpha}_{j}{\alpha}_{k}{y}_{j}{y}_{k}{x}_{j}\prime {x}_{k}-\sum _{j=1}^{n}{\alpha}_{j}$$
with respect to α_{1},...,α_{n}, subject to $$\sum {\alpha}_{j}{y}_{j}=0$$, α_{j} ≥ 0 for all j = 1,...,n, and Karush-Kuhn-Tucker (KKT) complementarity conditions.
For inseparable classes, the objective is the same as for separable classes, except for the additional condition $$0\le {\alpha}_{j}\le C$$ for all j = 1,...,n.
The resulting score function is
$$\widehat{f}(x)={\displaystyle \sum _{j=1}^{n}{\widehat{\alpha}}_{j}}{y}_{j}x\prime {x}_{j}+\widehat{b}.$$
$$\widehat{b}$$ is the estimate of the bias and $${\widehat{\alpha}}_{j}$$ is the jth estimate of the vector $$\widehat{\alpha}$$, j = 1,...,n. Written this way, the score function is free of the estimate of β as a result of the primal formalization.
The SVM algorithm classifies a new observation z using $$\text{sign}\left(\widehat{f}\left(z\right)\right).$$
In some cases, there is a nonlinear boundary separating the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane.
The dual formalization for nonlinear SVM is
$$0.5\sum _{j=1}^{n}\sum _{k=1}^{n}{\alpha}_{j}{\alpha}_{k}{y}_{j}{y}_{k}G({x}_{j},{x}_{k})-\sum _{j=1}^{n}{\alpha}_{j}$$
with respect to α_{1},...,α_{n}, subject to $$\sum {\alpha}_{j}{y}_{j}=0$$, $$0\le {\alpha}_{j}\le C$$ for all j = 1,...,n, and the KKT complementarity conditions. G(x_{k},x_{j}) are elements of the Gram matrix. The resulting score function is
$$\widehat{f}(x)={\displaystyle \sum _{j=1}^{n}{\widehat{\alpha}}_{j}}{y}_{j}G(x,{x}_{j})+\widehat{b}.$$
For more details, see Understanding Support Vector Machines, [1], and [3].
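The dual and primal forms of the linear score function can be compared numerically. The sketch below assumes the default settings of the two-class iris example (linear kernel, kernel scale 1, no standardization), so that the stored support vectors are on the original predictor scale; it uses the SupportVectorLabels property for the y_{j} values of the support vectors.

```matlab
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Mdl = fitcsvm(X,y);                          % defaults assumed: linear kernel, no standardization

% Dual form: sum over support vectors of alpha_j * y_j * x'x_j, plus the bias
sv = Mdl.SupportVectors;
fDual = (X*sv')*(Mdl.Alpha.*Mdl.SupportVectorLabels) + Mdl.Bias;

% Primal form: x'beta + b
fPrimal = X*Mdl.Beta + Mdl.Bias;

% Positive-class scores returned by predict (second column)
[~,score] = predict(Mdl,X);
maxGap = max(abs(fDual - score(:,2)));       % should be near machine precision
```

Only the support vectors enter the dual sum, because all other observations have α_{j} = 0.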
Unless your data set is large, always try to standardize the predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.
It is good practice to cross-validate using the KFold name-value pair argument. The cross-validation results determine how well the SVM classifier generalizes.
For one-class learning:
The default setting for the name-value pair argument Alpha can lead to long training times. To speed up training, set Alpha to a vector mostly composed of 0s.
Set the name-value pair argument Nu to a value closer to 0 to yield fewer support vectors, and, therefore, a smoother, but crude decision boundary.
Sparsity in support vectors is a desirable property
of an SVM classifier. To decrease the number of support vectors, set BoxConstraint
to
a large value. This action increases the training time.
For optimal training time, set CacheSize
as
high as the memory limit on your computer allows.
If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument 'ShrinkagePeriod'. It is good practice to use 'ShrinkagePeriod',1000.
Duplicate observations that are far from the decision
boundary do not affect convergence. However, just a few duplicate
observations that occur near the decision boundary can slow down convergence
considerably. To speed up convergence, specify 'RemoveDuplicates',true
if:
Your data set contains many duplicate observations.
You suspect that a few duplicate observations fall near the decision boundary.
However, to maintain the original data set during training, fitcsvm
must
temporarily store separate data sets: the original and one without
the duplicate observations. Therefore, if you specify true
for
data sets containing few duplicates, then fitcsvm
consumes
close to double the memory of the original data.
NaN, <undefined>, and empty character vector ('') values indicate missing values. fitcsvm removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), fitcsvm ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal BoxConstraint.
fitcsvm
removes observations that
have zero weight or prior probability.
For two-class learning, if you specify the cost matrix $$\mathcal{C}$$ (see Cost), then the software updates the class prior probabilities p (see Prior) to p_{c} by incorporating the penalties described in $$\mathcal{C}$$.
Specifically, fitcsvm:
Computes $${p}_{c}^{\ast}=p\prime \mathcal{C}.$$
Normalizes p_{c}^{*} so that the updated prior probabilities sum to 1:
$${p}_{c}=\frac{1}{\sum _{j=1}^{K}{p}_{c,j}^{\ast}}{p}_{c}^{\ast}.$$
K is the number of classes.
Resets the cost matrix to the default:
$$\mathcal{C}=\left[\begin{array}{cc}0& 1\\ 1& 0\end{array}\right].$$
Removes observations from the training data corresponding to classes with zero prior probability.
For two-class learning, fitcsvm normalizes all observation weights (see Weights) to sum to 1. Then, it renormalizes the normalized weights to sum up to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation j in class k is
$${w}_{j}^{\ast}=\frac{{w}_{j}}{\sum _{\forall j\in \text{Class }k}{w}_{j}}{p}_{c,k}.$$
w_{j} is the normalized weight for observation j; p_{c,k} is the updated prior probability of class k (see previous bullet).
For two-class learning, fitcsvm assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is
$${C}_{j}=n{C}_{0}{w}_{j}^{\ast}.$$
n is the training sample size, C_{0} is the initial box constraint (see BoxConstraint), and $${w}_{j}^{\ast}$$ is the total weight of observation j (see previous bullet).
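The weight and box-constraint formulas can be checked with a small worked example. All numbers below (four observations, the raw weights, equal updated priors, and C_{0} = 1) are made up for illustration; the code is plain arithmetic and does not call fitcsvm.

```matlab
cls = [1;1;2;2];                             % class of each observation (hypothetical)
w   = [1;1;1;3];                             % raw observation weights (hypothetical)
pc  = [0.5;0.5];                             % updated class priors p_c (hypothetical)
C0  = 1;                                     % initial box constraint
n   = numel(w);

w = w/sum(w);                                % step 1: normalize weights to sum to 1

wStar = zeros(n,1);
for k = 1:2
    idx = (cls == k);
    wStar(idx) = w(idx)/sum(w(idx))*pc(k);   % step 2: rescale within class to sum to p_c,k
end

Cj = n*C0*wStar;                             % per-observation box constraints
disp([wStar Cj])
```

Here the total weights are [0.25; 0.25; 0.125; 0.375] and the box constraints are [1; 1; 0.5; 1.5]. With equal weights and balanced classes, every C_{j} would equal C_{0}.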
If you set 'Standardize',true and any of 'Cost', 'Prior', or 'Weights', then fitcsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitcsvm standardizes predictor j (x_{j}) using
$${x}_{j}^{\ast}=\frac{{x}_{j}-{\mu}_{j}^{\ast}}{{\sigma}_{j}^{\ast}}.$$
$${\mu}_{j}^{\ast}=\frac{1}{\sum _{k}{w}_{k}^{\ast}}\sum _{k}{w}_{k}^{\ast}{x}_{jk}.$$
x_{jk} is observation k (row) of predictor j (column).
$${\left({\sigma}_{j}^{\ast}\right)}^{2}=\frac{{v}_{1}}{{v}_{1}^{2}-{v}_{2}}\sum _{k}{w}_{k}^{\ast}{\left({x}_{jk}-{\mu}_{j}^{\ast}\right)}^{2}.$$
$${v}_{1}=\sum _{j}{w}_{j}^{\ast}.$$
$${v}_{2}=\sum _{j}{\left({w}_{j}^{\ast}\right)}^{2}.$$
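The weighted mean and variance formulas can be evaluated directly. The predictor values and weights below are made up for illustration, and the helper names wmean and wvar are hypothetical. With equal weights, the weighted variance reduces to the usual sample variance, which gives a quick sanity check.

```matlab
x = [1;2;3;4];                               % one predictor, four observations (hypothetical)
w = [0.25;0.25;0.125;0.375];                 % total observation weights w* (hypothetical)

% Weighted mean and weighted variance as defined above
wmean = @(x,w) sum(w.*x)/sum(w);
wvar  = @(x,w) sum(w)/(sum(w)^2 - sum(w.^2)) * sum(w.*(x - wmean(x,w)).^2);

mu     = wmean(x,w);                         % weighted mean
sigma2 = wvar(x,w);                          % weighted variance
xStar  = (x - mu)/sqrt(sigma2);              % standardized predictor

% Sanity check: equal weights recover the ordinary sample variance
sigma2Equal = wvar(x,ones(4,1)/4);
```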
Let p be the proportion of outliers that you expect in the training data. If you set 'OutlierFraction',p, then:
For one-class learning, the software trains the bias term such that 100p% of the observations in the training data have negative scores.
The software implements robust learning for two-class learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.
The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.
The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.
Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.
The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.
The X property stores the training data as originally input. It does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.
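The three-predictor case described above can be sketched with synthetic data. The table contents, variable names, and seed are assumptions for illustration; the sizes in the comments are the ones the description above leads one to expect.

```matlab
rng(1);                                      % reproducible synthetic data (arbitrary seed)
n = 60;

% Two numeric predictors and one categorical predictor with three levels
g = categorical(randi(3,n,1),1:3,{'a','b','c'});
Tbl = table(randn(n,1),randn(n,1),g,'VariableNames',{'x1','x2','g'});
Tbl.y = (Tbl.x1 + 0.5*randn(n,1)) > 0;       % hypothetical binary response

Mdl = fitcsvm(Tbl,'y');

nOriginal = numel(Mdl.PredictorNames);           % expected: 3 original predictors
nExpanded = numel(Mdl.ExpandedPredictorNames);   % expected: 2 numeric + 3 dummy variables
nSVcols   = size(Mdl.SupportVectors,2);          % expected to match nExpanded
```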
For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.
For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is –1 for levels up to j, and +1 for levels j + 1 through k.
The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.
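The ordinal encoding rule can be written out for a variable with k = 3 ordered levels. This is a sketch of the rule as stated above, using plain arithmetic rather than any toolbox function.

```matlab
k = 3;                                       % number of ordered levels
levels = (1:k)';                             % one row per level
j = 1:(k-1);                                 % one column per dummy variable

% Dummy j is -1 for levels up to j and +1 for levels j+1 through k
D = 2*bsxfun(@gt,levels,j) - 1;
disp(D)
```

The rows of D are [–1 –1] for level 1, [+1 –1] for level 2, and [+1 +1] for level 3, so each higher level switches one more dummy variable to +1.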
All solvers implement L1 soft-margin minimization.
fitcsvm and svmtrain use, among other algorithms, SMO for optimization. The software implements SMO differently between the two functions, but numerical studies show that there is sensible agreement in the results.
For one-class learning, the software estimates the Lagrange multipliers, α_{1},...,α_{n}, such that
$$\sum _{j=1}^{n}{\alpha}_{j}=n\nu .$$
[1] Cristianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.
[2] Fan, R.-E., P.-H. Chen, and C.-J. Lin. "Working set selection using second order information for training support vector machines." Journal of Machine Learning Research, Vol. 6, 2005, pp. 1889–1918.
[3] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.
[4] Kecman, V., T. M. Huang, and M. Vogt. "Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance." In Support Vector Machines: Theory and Applications. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.
[5] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.
[6] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.
ClassificationPartitionedModel | ClassificationSVM | CompactClassificationSVM | fitcecoc | fitclinear | fitSVMPosterior | predict | quadprog | rng