fitcsvm

Train binary support vector machine classifier

fitcsvm trains or cross-validates a support vector machine (SVM) model for two-class (binary) classification on a low- through moderate-dimensional predictor data set. fitcsvm supports mapping the predictor data using kernel functions, and supports SMO, ISDA, or L1 soft-margin minimization via quadratic programming for objective-function minimization.

To train a linear SVM model for binary classification on high-dimensional data sets, that is, data sets that include many predictor variables, use fitclinear instead.

For multiclass learning by combining binary SVM models, use error-correcting output codes (ECOC). For more details, see fitcecoc.

To train an SVM regression model, see fitrsvm for low- through moderate-dimensional predictor data sets, or fitrlinear for high-dimensional data sets.

Syntax

  • Mdl = fitcsvm(Tbl,ResponseVarName)
  • Mdl = fitcsvm(Tbl,formula)
  • Mdl = fitcsvm(Tbl,Y)
  • Mdl = fitcsvm(X,Y)
  • Mdl = fitcsvm(___,Name,Value)

Description

Mdl = fitcsvm(Tbl,ResponseVarName) returns a support vector machine classifier Mdl trained using the sample data contained in a table (Tbl). ResponseVarName is the name of the variable in Tbl that contains the class labels for one- or two-class classification.

Mdl = fitcsvm(Tbl,formula) returns an SVM classifier trained using the sample data contained in a table (Tbl). formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl.

Mdl = fitcsvm(Tbl,Y) returns an SVM classifier trained using the predictor variables in table Tbl and class labels in vector Y.

Mdl = fitcsvm(X,Y) returns an SVM classifier trained using the predictors in the matrix X and class labels in vector Y for one- or two-class classification.

Mdl = fitcsvm(___,Name,Value) returns a support vector machine classifier with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify the type of cross-validation, the cost for misclassification, or the type of score transformation function.
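For instance, the following minimal sketch combines several of these options in one call; X and Y stand in for your own predictor matrix and label vector:

% Train a 10-fold cross-validated SVM with a custom misclassification
% cost and a sign score transformation.
CVMdl = fitcsvm(X,Y,'KFold',10,'Cost',[0 2;1 0],'ScoreTransform','sign');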

Examples

Train SVM Classifier

Load Fisher's iris data set. Remove the sepal lengths and widths, and all observed setosa irises.

load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Train an SVM classifier using the processed data set.

SVMModel = fitcsvm(X,y)
SVMModel = 

  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24×1 double]
                     Bias: -14.4149
         KernelParameters: [1×1 struct]
           BoxConstraints: [100×1 double]
          ConvergenceInfo: [1×1 struct]
          IsSupportVector: [100×1 logical]
                   Solver: 'SMO'


The Command Window shows that SVMModel is a trained ClassificationSVM classifier, along with a property list. To display a property of SVMModel, for example, to determine the class order, use dot notation.

classOrder = SVMModel.ClassNames
classOrder =

  2×1 cell array

    'versicolor'
    'virginica'

The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.

Plot a scatter diagram of the data and circle the support vectors.

sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off

The support vectors are observations that occur on or beyond their estimated class boundaries.

You can adjust the boundaries (and therefore the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.
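For instance, a minimal sketch of retraining with a larger box constraint; the value 10 is arbitrary, chosen only for illustration:

% Retrain with a larger box constraint. A larger value typically yields
% fewer support vectors, at the cost of a higher risk of overfitting.
SVMModel2 = fitcsvm(X,y,'BoxConstraint',10);
numSV = sum(SVMModel2.IsSupportVector) % Compare with sum(SVMModel.IsSupportVector)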

Train and Cross-Validate SVM Classifier

Load the ionosphere data set.

load ionosphere
rng(1); % For reproducibility

Train an SVM classifier using the radial basis kernel. Let the software find a scale value for the kernel function. It is good practice to standardize the predictors.

SVMModel = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF',...
    'KernelScale','auto');

SVMModel is a trained ClassificationSVM classifier.

Cross-validate the SVM classifier. By default, the software uses 10-fold cross-validation.

CVSVMModel = crossval(SVMModel);

CVSVMModel is a ClassificationPartitionedModel cross-validated classifier.

Estimate the out-of-sample misclassification rate.

classLoss = kfoldLoss(CVSVMModel)
classLoss =

    0.0484

The generalization rate is approximately 5%.

Detect Outliers Using SVM and One-Class Learning

Load Fisher's iris data set. Remove the petal lengths and widths. Treat all irises as coming from the same class.

load fisheriris
X = meas(:,1:2);
y = ones(size(X,1),1);

Train an SVM classifier using the processed data set. Assume that 5% of the observations are outliers. It is good practice to standardize the predictors.

rng(1);
SVMModel = fitcsvm(X,y,'KernelScale','auto','Standardize',true,...
    'OutlierFraction',0.05);

SVMModel is a trained ClassificationSVM classifier. By default, the software uses the Gaussian kernel for one-class learning.

Plot the observations and the decision boundary. Flag the support vectors and potential outliers.

svInd = SVMModel.IsSupportVector;
h = 0.02; % Mesh grid step size
[X1,X2] = meshgrid(min(X(:,1)):h:max(X(:,1)),...
    min(X(:,2)):h:max(X(:,2)));
[~,score] = predict(SVMModel,[X1(:),X2(:)]);
scoreGrid = reshape(score,size(X1,1),size(X2,2));

figure
plot(X(:,1),X(:,2),'k.')
hold on
plot(X(svInd,1),X(svInd,2),'ro','MarkerSize',10)
contour(X1,X2,scoreGrid)
colorbar;
title('{\bf Iris Outlier Detection via One-Class SVM}')
xlabel('Sepal Length (cm)')
ylabel('Sepal Width (cm)')
legend('Observation','Support Vector')
hold off

The boundary separating the outliers from the rest of the data occurs where the contour value is 0.
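To draw only that boundary, you can, for example, overlay the single zero contour level, a minor variation on the plotting code above:

% Overlay only the zero-level contour, which marks the outlier boundary.
hold on
contour(X1,X2,scoreGrid,[0 0],'k');
hold off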

Verify that the fraction of observations with negative scores in the cross-validated data is close to 5%.

CVSVMModel = crossval(SVMModel);
[~,scorePred] = kfoldPredict(CVSVMModel);
outlierRate = mean(scorePred<0)
outlierRate =

    0.0467

Find Multiple Class Boundaries Using Binary SVM

Load Fisher's iris data set. Use the petal lengths and widths.

load fisheriris
X = meas(:,3:4);
Y = species;

Examine a scatter plot of the data.

figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
lims = [h.XLim h.YLim]; % Extract the x and y axis limits
title('{\bf Scatter Diagram of Iris Measurements}');
xlabel('Petal Length (cm)');
ylabel('Petal Width (cm)');
legend('Location','Northwest');

There are three classes, one of which is linearly separable from the others.

For each class:

  1. Create a logical vector (indx) indicating whether an observation is a member of the class.

  2. Train an SVM classifier using the predictor data and indx.

  3. Store the classifier in a cell of a cell array.

It is good practice to define the class order.

SVMModels = cell(3,1);
classes = unique(Y);
rng(1); % For reproducibility

for j = 1:numel(classes)
    indx = strcmp(Y,classes(j)); % Create binary classes for each classifier
    SVMModels{j} = fitcsvm(X,indx,'ClassNames',[false true],'Standardize',true,...
        'KernelFunction','rbf','BoxConstraint',1);
end

SVMModels is a 3-by-1 cell array, with each cell containing a ClassificationSVM classifier. For each cell, the positive class is setosa, versicolor, and virginica, respectively.

Define a fine grid within the plot, and treat the coordinates as new observations from the distribution of the training data. Estimate the score of the new observations using each classifier.

d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(X(:,1)):d:max(X(:,1)),...
    min(X(:,2)):d:max(X(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
N = size(xGrid,1);
Scores = zeros(N,numel(classes));

for j = 1:numel(classes)
    [~,score] = predict(SVMModels{j},xGrid);
    Scores(:,j) = score(:,2); % Second column contains positive-class scores
end

Each row of Scores contains three scores. The index of the element with the largest score is the index of the class to which the new observation most likely belongs.

Associate each new observation with the classifier that gives it the maximum score.

[~,maxScore] = max(Scores,[],2);

Color in the regions of the plot based on the class to which the corresponding new observation belongs.

figure
h(1:3) = gscatter(xGrid(:,1),xGrid(:,2),maxScore,...
    [0.1 0.5 0.5; 0.5 0.1 0.5; 0.5 0.5 0.1]);
hold on
h(4:6) = gscatter(X(:,1),X(:,2),Y);
title('{\bf Iris Classification Regions}');
xlabel('Petal Length (cm)');
ylabel('Petal Width (cm)');
legend(h,{'setosa region','versicolor region','virginica region',...
    'observed setosa','observed versicolor','observed virginica'},...
    'Location','Northwest');
axis tight
hold off
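As a quick check, you can classify a single new observation with the same one-versus-all scheme. The point [5 1.45] is arbitrary, chosen only for illustration:

% Classify one hypothetical observation (petal length 5 cm, width 1.45 cm)
% by taking the class whose binary SVM assigns it the largest
% positive-class score.
xNew = [5 1.45];
scoresNew = zeros(1,numel(classes));
for j = 1:numel(classes)
    [~,s] = predict(SVMModels{j},xNew);
    scoresNew(j) = s(2); % Positive-class score
end
[~,idx] = max(scoresNew);
predictedClass = classes(idx)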

Optimize SVM Classifier

This example shows how to optimize hyperparameters automatically using fitcsvm. The example uses the ionosphere data.

Load the data.

load ionosphere

Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.

For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.

rng default
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))
|==================================================================================================|
| Iter | Eval   | Objective  | Objective  | BestSoFar  | BestSoFar  | BoxConstraint |  KernelScale |
|      | result |            | runtime    | (observed) | (estim.)   |               |              |
|==================================================================================================|
|    1 | Best   |    0.12821 |     147.06 |    0.12821 |    0.12821 |      0.42371 |     0.006703 |
|    2 | Accept |    0.13675 |     1.1557 |    0.12821 |    0.12876 |      0.71122 |       4.9578 |
|    3 | Accept |    0.16809 |     159.18 |    0.12821 |    0.12998 |       462.63 |     0.018515 |
|    4 | Accept |    0.35897 |    0.69293 |    0.12821 |    0.12824 |    0.0016958 |       247.22 |
|    5 | Accept |    0.16809 |     148.84 |    0.12821 |    0.12826 |       27.342 |     0.011315 |
|    6 | Accept |    0.35897 |    0.90372 |    0.12821 |    0.13632 |      0.14834 |       121.74 |
|    7 | Accept |    0.23362 |     142.75 |    0.12821 |    0.14021 |       30.658 |     0.001022 |
|    8 | Accept |     0.1396 |     3.2571 |    0.12821 |    0.14247 |       49.065 |        1.581 |
|    9 | Accept |     0.1339 |    0.83558 |    0.12821 |    0.13549 |    0.0085176 |      0.28903 |
|   10 | Best   |    0.12821 |     1.0662 |    0.12821 |    0.12931 |    0.0010271 |     0.015302 |
|   11 | Accept |    0.35897 |    0.66735 |    0.12821 |    0.13597 |    0.0010864 |       1.6148 |
|   12 | Accept |     0.1339 |     3.5592 |    0.12821 |    0.12442 |      0.52736 |      0.12943 |
|   13 | Best   |    0.12536 |     8.8503 |    0.12536 |    0.12052 |     0.068293 |     0.029608 |
|   14 | Accept |     0.1339 |     3.6091 |    0.12536 |    0.12535 |       3.9987 |       0.3546 |
|   15 | Best   |    0.11966 |    0.56534 |    0.11966 |    0.11965 |    0.0058807 |      0.08002 |
|   16 | Accept |    0.11966 |     81.168 |    0.11966 |    0.11965 |    0.0057969 |    0.0016972 |
|   17 | Accept |    0.12821 |     0.5936 |    0.11966 |    0.11964 |      0.59862 |       1.0498 |
|   18 | Accept |    0.12251 |     40.358 |    0.11966 |    0.11964 |     0.015919 |    0.0055785 |
|   19 | Accept |     0.1339 |    0.71137 |    0.11966 |    0.12051 |     0.036384 |      0.16065 |
|   20 | Accept |    0.12821 |    0.61387 |    0.11966 |    0.12032 |       4.0737 |       2.5745 |
|==================================================================================================|
| Iter | Eval   | Objective  | Objective  | BestSoFar  | BestSoFar  | BoxConstraint |  KernelScale |
|      | result |            | runtime    | (observed) | (estim.)   |               |              |
|==================================================================================================|
|   21 | Accept |    0.12536 |     84.685 |    0.11966 |    0.12067 |    0.0010889 |    0.0010109 |
|   22 | Accept |    0.13105 |      165.7 |    0.11966 |    0.12037 |       987.01 |      0.21248 |
|   23 | Accept |    0.12536 |      65.03 |    0.11966 |    0.12009 |    0.0010574 |    0.0033974 |
|   24 | Accept |     0.1396 |     1.3243 |    0.11966 |    0.12002 |       969.03 |       12.066 |
|   25 | Accept |    0.12251 |     25.978 |    0.11966 |    0.11983 |       995.56 |       2.1147 |
|   26 | Accept |     0.1567 |     206.51 |    0.11966 |    0.12009 |     0.039285 |    0.0010501 |
|   27 | Accept |    0.13105 |     3.4498 |    0.11966 |    0.12081 |    0.0071342 |     0.023989 |
|   28 | Accept |    0.12251 |     103.95 |    0.11966 |    0.12011 |      0.72608 |     0.023072 |
|   29 | Accept |    0.12536 |      1.127 |    0.11966 |    0.11968 |    0.0010318 |     0.064622 |
|   30 | Accept |    0.35897 |      1.138 |    0.11966 |    0.12037 |       936.65 |       948.48 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 1705.2912 seconds.
Total objective function evaluation time: 1405.3256

Best observed feasible point:
    BoxConstraint    KernelScale
    _____________    ___________

    0.0058807        0.08002    

Observed objective function value = 0.11966
Estimated objective function value = 0.12037
Function evaluation time = 0.56534

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale
    _____________    ___________

    0.0058807        0.08002    

Estimated objective function value = 0.12037
Estimated function evaluation time = 0.89038


Mdl = 

  ClassificationSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'b'  'g'}
                       ScoreTransform: 'none'
                      NumObservations: 351
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                                Alpha: [105×1 double]
                                 Bias: -3.7681
                     KernelParameters: [1×1 struct]
                       BoxConstraints: [351×1 double]
                      ConvergenceInfo: [1×1 struct]
                      IsSupportVector: [351×1 logical]
                               Solver: 'SMO'


Input Arguments

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.

If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.

If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula using formula.

If Tbl does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of Tbl must be equal.

Data Types: table

Response variable name, specified as the name of a variable in Tbl.

You must specify ResponseVarName as a character vector. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If Y is a character array, then each element must correspond to one row of the array.

It is good practice to specify the order of the classes using the ClassNames name-value pair argument.

Data Types: char

Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

Data Types: char

Class labels used to train the SVM model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

  • Y must contain at most two distinct classes. For multiclass learning, see fitcecoc.

  • If Y is a character array, then each element must correspond to one row of the array.

  • The length of Y and the number of rows of Tbl or X must be equal.

  • It is good practice to specify the class order using the ClassNames name-value pair argument.

Data Types: char | cell | categorical | logical | single | double

Predictor data used to train the SVM classifier, specified as a matrix of numeric values.

Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one predictor.

The length of Y and the number of rows of X must be equal.

To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument.

Data Types: double | single

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'KFold',10,'Cost',[0 2;1 0],'ScoreTransform','sign' specifies to perform 10-fold cross-validation, apply double the penalty to false positives compared to false negatives, and transform the scores using the sign function.

    Note:   You cannot use any cross-validation name-value pair along with OptimizeHyperparameters. You can modify the cross-validation for OptimizeHyperparameters only by using the HyperparameterOptimizationOptions name-value pair.

Support Vector Machine Options

Box constraint, specified as the comma-separated pair consisting of 'BoxConstraint' and a positive scalar.

For one-class learning, the software always sets the box constraint to 1.

For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.

Example: 'BoxConstraint',100

Data Types: double | single

Kernel function used to compute the Gram matrix, specified as the comma-separated pair consisting of 'KernelFunction' and a value in this table.

Value                  Description                                        Formula
'gaussian' or 'rbf'    Gaussian or Radial Basis Function (RBF)            $G(x_1,x_2) = \exp(-\|x_1 - x_2\|^2)$
                       kernel, default for one-class learning
'linear'               Linear kernel, default for two-class learning      $G(x_1,x_2) = x_1' x_2$
'polynomial'           Polynomial kernel. Use 'PolynomialOrder',p to      $G(x_1,x_2) = (1 + x_1' x_2)^p$
                       specify a polynomial kernel of order p.

You can set your own kernel function, for example, kernel, by setting 'KernelFunction','kernel'. kernel must have the following form:

function G = kernel(U,V)
where:

  • U is an m-by-p matrix.

  • V is an n-by-p matrix.

  • G is an m-by-n Gram matrix of the rows of U and V.

And kernel.m must be on the MATLAB® path.

It is good practice to avoid using generic names for kernel functions. For example, call a sigmoid kernel function 'mysigmoid' rather than 'sigmoid'.
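For example, a minimal sketch of a custom sigmoid kernel saved as mysigmoid.m; the slope and intercept values are illustrative, not recommendations:

function G = mysigmoid(U,V)
% Sigmoid kernel with a fixed slope gamma and intercept c.
% U is an m-by-p matrix, V is an n-by-p matrix, and G is the
% m-by-n Gram matrix of the rows of U and V.
gamma = 1;
c = -1;
G = tanh(gamma*U*V' + c);
end

You can then train with, for example, fitcsvm(X,Y,'KernelFunction','mysigmoid').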

Example: 'KernelFunction','gaussian'

Data Types: char

Kernel scale parameter, specified as the comma-separated pair consisting of 'KernelScale' and 'auto' or a positive scalar. The software divides all elements of the predictor matrix X by the value of KernelScale. Then, the software applies the appropriate kernel norm to compute the Gram matrix.

  • If you specify 'auto', then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed using rng before training.

  • If you specify KernelScale and your own kernel function, for example, kernel, using 'KernelFunction','kernel', then the software throws an error. You must apply scaling within kernel.

Example: 'KernelScale','auto'

Data Types: double | single | char

Polynomial kernel function order, specified as the comma-separated pair consisting of 'PolynomialOrder' and a positive integer.

If you set 'PolynomialOrder' and KernelFunction is not 'polynomial', then the software throws an error.

Example: 'PolynomialOrder',2

Data Types: double | single

Kernel offset parameter, specified as the comma-separated pair consisting of 'KernelOffset' and a nonnegative scalar.

The software adds KernelOffset to each element of the Gram matrix.

The defaults are:

  • 0 if the solver is SMO (that is, you set 'Solver','SMO')

  • 0.1 if the solver is ISDA (that is, you set 'Solver','ISDA')

Example: 'KernelOffset',0

Data Types: double | single

Flag to standardize the predictor data, specified as the comma-separated pair consisting of 'Standardize' and true (1) or false (0).

If you set 'Standardize',true:

  • The software centers and scales each column of the predictor data (X) by the weighted column mean and standard deviation, respectively (for details on weighted standardizing, see Algorithms). MATLAB does not standardize the data contained in the dummy variable columns generated for categorical predictors.

  • The software trains the classifier using the standardized predictor matrix, but stores the unstandardized data in the classifier property X.

Example: 'Standardize',true

Data Types: logical

Optimization routine, specified as the comma-separated pair consisting of 'Solver' and a value in this table.

Value     Description
'ISDA'    Iterative Single Data Algorithm (see [4])
'L1QP'    Uses quadprog to implement L1 soft-margin minimization by quadratic programming. This option requires an Optimization Toolbox™ license. For more details, see Quadratic Programming Definition.
'SMO'     Sequential Minimal Optimization (see [2])

The defaults are:

  • 'ISDA' if you set 'OutlierFraction' to a positive value for two-class learning

  • 'SMO' otherwise

Example: 'Solver','ISDA'

Data Types: char

Initial estimates of alpha coefficients, specified as the comma-separated pair consisting of 'Alpha' and a numeric vector of nonnegative values. The length of Alpha must be equal to the number of rows of X.

  • Each element of Alpha corresponds to an observation in X.

  • Alpha cannot contain any NaNs.

  • If you specify Alpha and any one of the cross-validations name-value pair arguments ('CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'), then the software returns an error.

If Y contains any missing values, then remove all rows of Y, X, and Alpha that correspond to the missing values. That is, enter:

idx = ~isundefined(categorical(Y));
Y = Y(idx,:);
X = X(idx,:);
alpha = alpha(idx);
Then, pass Y, X, and alpha as the response, predictors, and initial alpha estimates, respectively.
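A minimal sketch of that final call, assuming Y, X, and alpha are the pruned arrays from the snippet above:

% Supply the initial alpha estimates along with the cleaned data.
Mdl = fitcsvm(X,Y,'Alpha',alpha);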

The defaults are:

  • 0.5*ones(size(X,1),1) for one-class learning

  • zeros(size(X,1),1) for two-class learning

Example: 'Alpha',0.1*ones(size(X,1),1)

Data Types: double | single

Cache size, specified as the comma-separated pair consisting of 'CacheSize' and 'maximal' or a positive scalar.

If CacheSize is 'maximal', then the software reserves enough memory to hold the entire n-by-n Gram matrix.

If CacheSize is a positive scalar, then the software reserves CacheSize megabytes of memory for training the classifier.

Example: 'CacheSize','maximal'

Data Types: double | char | single

Flag to clip alpha coefficients, specified as the comma-separated pair consisting of 'ClipAlphas' and either true or false.

Suppose that the alpha coefficient for observation j is αj and the box constraint of observation j is Cj, j = 1,...,n. n is the training sample size.

Value    Description
true     At each iteration, if αj is near 0 or near Cj, then MATLAB sets αj to 0 or to Cj, respectively.
false    MATLAB does not change the alpha coefficients during optimization.

MATLAB stores the final values of α in the Alpha property of the trained SVM model object.

ClipAlphas can affect SMO and ISDA convergence.

Example: 'ClipAlphas',false

Data Types: logical

ν parameter for one-class learning, specified as the comma-separated pair consisting of 'Nu' and a positive scalar. Nu must be greater than 0 and at most 1.

Set Nu to control the tradeoff between ensuring most training examples are in the positive class and minimizing the weights in the score function.

Example: 'Nu',0.25

Data Types: double | single

Number of iterations between optimization diagnostic message output, specified as the comma-separated pair consisting of 'NumPrint' and a nonnegative integer.

If you use 'Verbose',1 and 'NumPrint',numprint, then the software displays all optimization diagnostic messages from SMO and ISDA every numprint iterations in the Command Window.

Example: 'NumPrint',500

Data Types: double | single

Expected proportion of outliers in the training data, specified as the comma-separated pair consisting of 'OutlierFraction' and a numeric scalar in the interval [0,1).

If you set 'OutlierFraction',outlierfraction, where outlierfraction is a value greater than 0, then:

  • For two-class learning, the software implements robust learning. In other words, the software attempts to remove 100*outlierfraction% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

  • For one-class learning, the software finds an appropriate bias term such that outlierfraction of the observations in the training set have negative scores.

Example: 'OutlierFraction',0.01

Data Types: double | single

Flag to replace duplicate observations with single observations in the training data, specified as the comma-separated pair consisting of 'RemoveDuplicates' and true or false.

If RemoveDuplicates is true, then fitcsvm replaces duplicate observations in the training data with a single observation of the same value. The weight of the single observation is equal to the sum of the weights of the corresponding removed duplicates (see Weights).

    Tip   If your data set contains many duplicate observations, then specifying 'RemoveDuplicates',true can decrease convergence time considerably.

Data Types: logical

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and either 0, 1, or 2. Verbose controls the amount of optimization information that the software displays in the Command Window and saves as a structure to Mdl.ConvergenceInfo.History.

This table summarizes the available verbosity level options.

Value    Description
0        The software does not display or save convergence information.
1        The software displays diagnostic messages and saves convergence criteria every numprint iterations, where numprint is the value of the name-value pair argument 'NumPrint'.
2        The software displays diagnostic messages and saves convergence criteria at every iteration.

Example: 'Verbose',1

Data Types: double | single

Other Classification Options

List of categorical predictors, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:

  • A numeric vector with indices from 1 through p, where p is the number of columns of X.

  • A logical vector of length p, where a true entry means that the corresponding column of X is a categorical variable.

  • A cell array of character vectors, where each element in the array is the name of a predictor variable. The names must match entries in PredictorNames values.

  • 'all', meaning all predictors are categorical.

By default, if the predictor data is in a matrix (X), fitcsvm assumes that none of the predictors are categorical. If the predictor data is in a table (Tbl), fitcsvm assumes that a variable is categorical if it contains logical values, categorical values, or a cell array of character vectors.

For example, the following syntax specifies that columns 1 and 3 of the input matrix X contain categorical variables.

Example: 'CategoricalPredictors',[1,3]

Data Types: single | double | logical | cell

Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames must be the same data type as Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use ClassNames to:

  • Order the classes during training.

  • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.

  • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default is the set of all distinct class names in Y.

Example: 'ClassNames',{'b','g'}

Data Types: categorical | char | logical | single | double | cell

Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. If you specify:

  • The square matrix Cost, then, if the true class of an observation is i, Cost(i,j) is the cost of classifying a point into class j. That is, the rows correspond to the true classes and the columns correspond to the predicted classes. To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument.

  • The structure S, then it must have two fields:

    • S.ClassNames, which contains the class names as a variable of the same data type as Y

    • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. Consequently, the cost matrix resets to the default. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.

The defaults are:

  • For one-class learning, Cost = 0.

  • For two-class learning, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j.

Example: 'Cost',[0,1;2,0]

Data Types: double | single | struct

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

  • If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.

    • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.

    • By default, PredictorNames is {'x1','x2',...}.

  • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcsvm uses the predictor variables in PredictorNames and the response only in training.

    • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.

    • By default, PredictorNames contains the names of all predictor variables.

    • It is good practice to specify the predictors for training using either 'PredictorNames' or formula, but not both.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: cell

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table.

Value             Description
'empirical'       The class prior probabilities are the class relative frequencies in Y.
'uniform'         All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector    Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1.
structure         A structure S with two fields: S.ClassNames contains the class names as a variable of the same type as Y, and S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.

Example: struct('ClassNames',{{'setosa','versicolor','virginica'}},'ClassProbs',1:3)

Data Types: char | double | single | struct

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector.

  • If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.

  • If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

Example: 'ResponseName','response'

Data Types: char

Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector or function handle.

  • This table summarizes the available built-in functions.

    Value                   Formula
    'doublelogit'           $1/(1 + e^{-2x})$
    'invlogit'              $\log(x / (1 - x))$
    'ismax'                 Sets the score for the class with the largest score to 1, and scores for all other classes to 0
    'logit'                 $1/(1 + e^{-x})$
    'none' or 'identity'    $x$ (no transformation)
    'sign'                  $-1$ for $x < 0$; $0$ for $x = 0$; $1$ for $x > 0$
    'symmetric'             $2x - 1$
    'symmetriclogit'        $2/(1 + e^{-x}) - 1$
    'symmetricismax'        Sets the score for the class with the largest score to 1, and scores for all other classes to $-1$

  • For a MATLAB function, or a function that you define, enter its function handle.

    Mdl.ScoreTransform = @function;

    function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
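For example, a sketch of a custom transform that clamps every score to [-1,1]; the clamping is an arbitrary choice, used only to show the required signature:

% Custom score transform: accepts a matrix of scores and returns a
% matrix of the same size, here clamping each score to [-1,1].
Mdl.ScoreTransform = @(S) max(min(S,1),-1);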

Example: 'ScoreTransform','sign'

Data Types: char | function_handle

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model.

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

The software normalizes Weights to sum up to the value of the prior probability in the respective class. For more details on the relationships and algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and Weights, see Algorithms.

Data Types: double | single

Cross-Validation Options

Flag to train a cross-validated classifier, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'.

If you specify 'on', then the software trains a cross-validated classifier with 10 folds.

You can override this cross-validation setting using one of the CVPartition, Holdout, KFold, or Leaveout name-value pair arguments. You can only use one cross-validation name-value pair argument at a time to create a cross-validated model.

Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'Crossval','on'

Data Types: char

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation, and also the indexing for training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:

  1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data

  2. Stores the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single

Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software:

  1. Randomly partitions the data into k sets

  2. For each set, reserves the set as validation data, and trains the model using the other k – 1 sets

  3. Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5

Data Types: single | double

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(Mdl.X,1), the software:

  1. Reserves the observation as validation data, and trains the model using the other n – 1 observations

  2. Stores the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'

Data Types: char

Convergence Controls

Tolerance for the gradient difference between upper and lower violators obtained by Sequential Minimal Optimization (SMO) or Iterative Single Data Algorithm (ISDA), specified as the comma-separated pair consisting of 'DeltaGradientTolerance' and a nonnegative scalar.

If DeltaGradientTolerance is 0, then the software does not use the tolerance for the gradient difference to check for optimization convergence.

The defaults are:

  • 1e-3 if the solver is SMO (that is, you set 'Solver','SMO')

  • 0 if the solver is ISDA (that is, you set 'Solver','ISDA')

Example: 'DeltaGradientTolerance',1e-2

Data Types: double | single

Feasibility gap tolerance obtained by SMO or ISDA, specified as the comma-separated pair consisting of 'GapTolerance' and a nonnegative scalar.

If GapTolerance is 0, then the software does not use the feasibility gap tolerance to check for optimization convergence.

Example: 'GapTolerance',1e-2

Data Types: double | single

Maximal number of numerical optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer.

The software returns a trained model regardless of whether the optimization routine successfully converges. Mdl.ConvergenceInfo contains convergence information.

Example: 'IterationLimit',1e8

Data Types: double | single

Karush-Kuhn-Tucker (KKT) complementarity conditions violation tolerance, specified as the comma-separated pair consisting of 'KKTTolerance' and a nonnegative scalar.

If KKTTolerance is 0, then the software does not use the KKT complementarity conditions violation tolerance to check for optimization convergence.

The defaults are:

  • 0 if the solver is SMO (that is, you set 'Solver','SMO')

  • 1e-3 if the solver is ISDA (that is, you set 'Solver','ISDA')

Example: 'KKTTolerance',1e-2

Data Types: double | single

Number of iterations between the movement of observations from the active to inactive set, specified as the comma-separated pair consisting of 'ShrinkagePeriod' and a nonnegative integer.

If you set 'ShrinkagePeriod',0, then the software does not shrink the active set.

Example: 'ShrinkagePeriod',1000

Data Types: double | single

Hyperparameter Optimization

Parameters to optimize, specified as:

  • 'none' — Do not optimize.

  • 'auto' — Use {'BoxConstraint','KernelScale'}

  • 'all' — Optimize all eligible parameters.

  • Cell array of eligible parameter names

  • Vector of optimizableVariable objects, typically the output of hyperparameters

The optimization attempts to minimize the cross-validation loss (error) for fitcsvm by varying the parameters. For information about cross-validation loss (albeit in a different context), see Classification Loss. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

The eligible parameters for fitcsvm are:

  • BoxConstraint — fitcsvm searches among positive values, by default log-scaled in the range [1e-3,1e3].

  • KernelScale — fitcsvm searches among positive values, by default log-scaled in the range [1e-3,1e3].

  • KernelFunction — fitcsvm searches among 'gaussian', 'linear', and 'polynomial'.

  • PolynomialOrder — fitcsvm searches among integers in the range [2,4].

  • Standardize — fitcsvm searches among 'true' and 'false'.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load fisheriris
params = hyperparameters('fitcsvm',meas,species);
params(1).Range = [1e-4,1e6];

Pass params as the value of OptimizeHyperparameters.
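For instance, a minimal sketch of the final call. Because fitcsvm is a binary learner, the sketch restricts the iris data to two classes first:

% Optimize over the modified hyperparameter descriptions.
% fitcsvm accepts at most two distinct classes, so drop 'setosa'.
inds = ~strcmp(species,'setosa');
Mdl = fitcsvm(meas(inds,:),species(inds),'OptimizeHyperparameters',params);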

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression, and the misclassification rate for classification. To control the iterative display, set the HyperparameterOptimizationOptions name-value pair, Verbose field. To control the plots, set the HyperparameterOptimizationOptions name-value pair, ShowPlots field.

For an example, see Optimize SVM Classifier.

Example: 'auto'

Data Types: char | cell

Options for optimization, specified as a structure. Modifies the effect of the OptimizeHyperparameters name-value pair. All fields in the structure are optional.

Optimizer — 'bayesopt' (default) | 'gridsearch' | 'randomsearch'

  • 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.

  • 'gridsearch' — Use grid search with NumGridDivisions values per dimension.

  • 'randomsearch' — Search at random among MaxObjectiveEvaluations points.

  'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — 'expected-improvement-per-second-plus' (default) | 'expected-improvement' | 'expected-improvement-plus' | 'expected-improvement-per-second' | 'lower-confidence-bound' | 'probability-of-improvement'

  For details, see the bayesopt AcquisitionFunctionName name-value pair, or Acquisition Function Types.

MaxObjectiveEvaluations — Maximum number of objective function evaluations. The default is 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'.

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. The software ignores this field for categorical variables. The default is 10.

ShowPlots — Logical value indicating whether to show plots. If true (the default), the software plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters.

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, the software overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. The default is false.

Verbose — Display to the command line:

  • 0 — No iterative display

  • 1 — Iterative display (the default)

  • 2 — Iterative display with extra information

  For details, see the bayesopt Verbose name-value pair.

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If false (the default), the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.

Use no more than one of the following three field names. If you do not specify any of them, the default is Kfold = 5.

  • CVPartition — A cvpartition object, as created by cvpartition.

  • Holdout — A scalar in the range (0,1) representing the holdout fraction.

  • Kfold — An integer greater than 1.

Example: struct('MaxObjectiveEvaluations',60)

Data Types: struct
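For instance, a sketch combining several of these fields; the specific values are illustrative, and X and Y stand in for your own data:

% Random search over 20 evaluations, no plots, 5-fold cross-validation.
opts = struct('Optimizer','randomsearch','MaxObjectiveEvaluations',20, ...
    'ShowPlots',false,'Kfold',5);
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);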

Output Arguments

Trained SVM classification model, returned as a ClassificationSVM model object or a ClassificationPartitionedModel cross-validated model object.

If you set any of the name-value pair arguments KFold, Holdout, Leaveout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated classifier. Otherwise, Mdl is a ClassificationSVM classifier.

To reference properties of Mdl, use dot notation. For example, enter Mdl.Alpha in the Command Window to display the trained Lagrange multipliers.

Limitations

  • fitcsvm trains SVM classifiers for one- or two-class learning applications. To train SVM classifiers using data with more than two classes, use fitcecoc.

  • fitcsvm supports low- through moderate-dimensional data sets. For high-dimensional data sets, use fitclinear instead.

More About

Box Constraint

A parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization).

If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.

Gram Matrix

The Gram matrix of a set of n vectors $\{x_1,\ldots,x_n;\ x_j \in \mathbb{R}^p\}$ is an n-by-n matrix with element (j,k) defined as $G(x_j,x_k) = \langle \phi(x_j), \phi(x_k) \rangle$, an inner product of the transformed predictors using the kernel function $\phi$.

For nonlinear SVM, the algorithm forms a Gram matrix using the predictor matrix columns. The dual formalization replaces the inner product of the predictors with corresponding elements of the resulting Gram matrix (called the "kernel trick"). Subsequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.

Karush-Kuhn-Tucker Complementarity Conditions

KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions.

In SVM, the KKT complementarity conditions are

$$\begin{cases} \alpha_j \left[ y_j f(x_j) - 1 + \xi_j \right] = 0 \\ \xi_j (C - \alpha_j) = 0 \end{cases}$$

for all j = 1,...,n, where $f(x_j) = \phi(x_j)'\beta + b$, $\phi$ is a kernel function (see Gram matrix), and $\xi_j$ is a slack variable. If the classes are perfectly separable, then $\xi_j = 0$ for all j = 1,...,n.

One-Class Learning

One-class learning, or unsupervised SVM, aims to separate data from the origin in the high-dimensional predictor space (not the original predictor space), and is an algorithm used for outlier detection.

The algorithm resembles that of SVM for binary classification. The objective is to minimize the dual expression

$$0.5 \sum_{j} \sum_{k} \alpha_j \alpha_k G(x_j,x_k)$$

with respect to α1,...,αn, subject to

$$\sum_{j} \alpha_j = n\nu$$

and $0 \le \alpha_j \le 1$ for all j = 1,...,n. $G(x_j,x_k)$ is element (j,k) of the Gram matrix.

A small value of ν leads to fewer support vectors, and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors, and therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1.

For more details, see [5].

Support Vector

Support vectors are observations corresponding to strictly positive estimates of α1,...,αn.

SVM classifiers that yield fewer support vectors for a given training set are more desirable.

Support Vector Machines for Binary Classification

The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.

The linear SVM score function is

$$f(x) = x'\beta + b,$$

where:

  • x is an observation (corresponding to a row of X).

  • The vector β contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to Mdl.Beta). For separable data, the optimal margin length is $2/\|\beta\|$.

  • b is the bias term (corresponding to Mdl.Bias).

The root of f(x) for particular coefficients defines a hyperplane. For a particular hyperplane, f(z) is the distance from point z to the hyperplane.

The algorithm searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate. Therefore:

  • For separable classes, the objective is to minimize $\|\beta\|$ with respect to β and b subject to $y_j f(x_j) \ge 1$ for all j = 1,...,n. This is the primal formalization for separable classes.

  • For inseparable classes, the algorithm uses slack variables (ξj) to penalize the objective function for observations that cross the margin boundary for their class. ξj = 0 for observations that do not cross the margin boundary for their class, otherwise ξj ≥ 0.

    The objective is to minimize $0.5\|\beta\|^2 + C \sum_j \xi_j$ with respect to β, b, and $\xi_j$, subject to $y_j f(x_j) \ge 1 - \xi_j$ and $\xi_j \ge 0$ for all j = 1,...,n, and for a positive scalar box constraint C. This is the primal formalization for inseparable classes.

The algorithm uses the Lagrange multipliers method to optimize the objective. This introduces n coefficients α1,...,αn (corresponding to Mdl.Alpha). The dual formalizations for linear SVM are:

  • For separable classes, minimize

    $$0.5 \sum_{j=1}^{n} \sum_{k=1}^{n} \alpha_j \alpha_k y_j y_k x_j' x_k - \sum_{j=1}^{n} \alpha_j$$

    with respect to $\alpha_1,\ldots,\alpha_n$, subject to $\sum_{j=1}^{n} \alpha_j y_j = 0$, $\alpha_j \ge 0$ for all j = 1,...,n, and the Karush-Kuhn-Tucker (KKT) complementarity conditions.

  • For inseparable classes, the objective is the same as for separable classes, except for the additional condition $0 \le \alpha_j \le C$ for all j = 1,...,n.

The resulting score function is

$$\hat{f}(x) = \sum_{j=1}^{n} \hat{\alpha}_j y_j x' x_j + \hat{b}.$$

$\hat{b}$ is the estimate of the bias, and $\hat{\alpha}_j$ is the jth element of the vector of estimates $\hat{\alpha}$, j = 1,...,n. Written this way, the score function is free of the estimate of β as a result of the primal formalization.

The SVM algorithm classifies a new observation z using $\mathrm{sign}(\hat{f}(z))$.

In some cases, there is a nonlinear boundary separating the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane.

The dual formalization for nonlinear SVM is

$$0.5 \sum_{j=1}^{n} \sum_{k=1}^{n} \alpha_j \alpha_k y_j y_k G(x_j,x_k) - \sum_{j=1}^{n} \alpha_j$$

with respect to $\alpha_1,\ldots,\alpha_n$, subject to $\sum_{j=1}^{n} \alpha_j y_j = 0$, $0 \le \alpha_j \le C$ for all j = 1,...,n, and the KKT complementarity conditions. $G(x_k,x_j)$ are elements of the Gram matrix. The resulting score function is

$$\hat{f}(x) = \sum_{j=1}^{n} \hat{\alpha}_j y_j G(x,x_j) + \hat{b}.$$

For more details, see Understanding Support Vector Machines, [1], and [3].

Tips

  • Unless your data set is large, always try to standardize the predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.

  • It is good practice to cross-validate using the KFold name-value pair argument. The cross-validation results determine how well the SVM classifier generalizes.

  • For one-class learning:

    • The default setting for the name-value pair argument Alpha can lead to long training times. To speed up training, set Alpha to a vector mostly composed of 0s.

    • Set the name-value pair argument Nu to a value closer to 0 to yield fewer support vectors, and, therefore, a smoother, but crude decision boundary.

  • Sparsity in support vectors is a desirable property of an SVM classifier. To decrease the number of support vectors, set BoxConstraint to a large value. This action increases the training time.

  • For optimal training time, set CacheSize as high as the memory limit on your computer allows.

  • If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument 'ShrinkagePeriod'. It is good practice to use 'ShrinkagePeriod',1000.

  • Duplicate observations that are far from the decision boundary do not affect convergence. However, just a few duplicate observations that occur near the decision boundary can slow down convergence considerably. To speed up convergence, specify 'RemoveDuplicates',true if:

    • Your data set contains many duplicate observations.

    • You suspect that a few duplicate observations fall near the decision boundary.

    However, to maintain the original data set during training, fitcsvm must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify true for data sets containing few duplicates, then fitcsvm consumes close to double the memory of the original data.

Algorithms

  • NaN, <undefined>, and empty character vector ('') values indicate missing values. fitcsvm removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), fitcsvm ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal BoxConstraint.

  • fitcsvm removes observations that have zero weight or prior probability.

  • For two-class learning, if you specify the cost matrix C (see Cost), then the software updates the class prior probabilities $p$ (see Prior) to $p_c$ by incorporating the penalties described in C.

    Specifically, fitcsvm:

    1. Computes $p_c^* = p' C$.

    2. Normalizes $p_c^*$ so that the updated prior probabilities sum to 1:

       $$p_c = \frac{1}{\sum_{j=1}^{K} p_{c,j}^*} \, p_c^*.$$

       K is the number of classes.

    3. Resets the cost matrix to the default:

       $$C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

    4. Removes observations from the training data corresponding to classes with zero prior probability.

  • For two-class learning, fitcsvm normalizes all observation weights (see Weights) to sum to 1. Then, it renormalizes the normalized weights to sum up to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation j in class k is

    $$w_j^* = \frac{w_j}{\sum_{j \in \text{Class } k} w_j} \, p_{c,k}.$$

    $w_j$ is the normalized weight for observation j; $p_{c,k}$ is the updated prior probability of class k (see previous bullet).

  • For two-class learning, fitcsvm assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is

    $$C_j = n C_0 w_j^*.$$

    n is the training sample size, $C_0$ is the initial box constraint (see BoxConstraint), and $w_j^*$ is the total weight of observation j (see previous bullet).

  • If you set 'Standardize',true and any of 'Cost', 'Prior', or 'Weights', then fitcsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitcsvm standardizes predictor j ($x_j$) using

    $$x_j^* = \frac{x_j - \mu_j^*}{\sigma_j^*},$$

    where:

    • $\mu_j^* = \frac{1}{\sum_k w_k^*} \sum_k w_k^* x_{jk}$.

    • $x_{jk}$ is observation k (row) of predictor j (column).

    • $(\sigma_j^*)^2 = \frac{v_1}{v_1^2 - v_2} \sum_k w_k^* (x_{jk} - \mu_j^*)^2$.

    • $v_1 = \sum_j w_j^*$.

    • $v_2 = \sum_j (w_j^*)^2$.

  • Let p be the proportion of outliers that you expect in the training data. If you set 'OutlierFraction',p, then:

    • For one-class learning, the software trains the bias term such that 100p% of the observations in the training data have negative scores.

    • The software implements robust learning for two-class learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

  • If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

    • The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.

    • The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.

    • Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.

    • The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.

    • The X property stores the training data as originally input. It does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.

  • For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

    • For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is -1 for levels up to j, and +1 for levels j + 1 through k.

    • The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.

  • All solvers implement L1 soft-margin minimization.

  • fitcsvm and svmtrain use, among other algorithms, SMO for optimization. The software implements SMO differently between the two functions, but numerical studies show that there is sensible agreement in the results.

  • For one-class learning, the software estimates the Lagrange multipliers, $\alpha_1,\ldots,\alpha_n$, such that

    $$\sum_{j=1}^{n} \alpha_j = n\nu.$$

References

[1] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[2] Fan, R.-E., P.-H. Chen, and C.-J. Lin. "Working set selection using second order information for training support vector machines." Journal of Machine Learning Research, Vol. 6, 2005, pp. 1889–1918.

[3] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[4] Kecman V., T. -M. Huang, and M. Vogt. "Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance." In Support Vector Machines: Theory and Applications. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.

[5] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.

[6] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.

Introduced in R2014a
