# ClassificationSVM

Support vector machine (SVM) for one-class and binary classification

## Description

`ClassificationSVM` is a support vector machine (SVM) classifier for one-class and two-class learning. Trained `ClassificationSVM` classifiers store training data, parameter values, prior probabilities, support vectors, and algorithmic implementation information. Use these classifiers to perform tasks such as fitting a score-to-posterior-probability transformation function (see `fitPosterior`) and predicting labels for new data (see `predict`).

## Creation

Create a `ClassificationSVM` object by using `fitcsvm`.

## Properties


### SVM Properties

#### Alpha

Trained classifier coefficients, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, `sum(Mdl.IsSupportVector)`.

`Alpha` contains the trained classifier coefficients from the dual problem, that is, the estimated Lagrange multipliers. If you remove duplicates by using the `RemoveDuplicates` name-value pair argument of `fitcsvm`, then for a given set of duplicate observations that are support vectors, `Alpha` contains one coefficient corresponding to the entire set. That is, MATLAB® attributes a nonzero coefficient to one observation from the set of duplicates and a coefficient of `0` to all other duplicate observations in the set.

Data Types: `single` | `double`

#### Beta

Linear predictor coefficients, specified as a numeric vector. The length of `Beta` is equal to the number of predictors used to train the model.

MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. `Beta` stores one value for each predictor variable, including the dummy variables. For example, if there are three predictors, one of which is a categorical variable with three levels, then `Beta` is a numeric vector containing five values.
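This expansion count can be sketched as follows (an illustrative Python helper, not MATLAB code; the function name is hypothetical):

```python
# Sketch: count expanded predictors under full dummy encoding,
# where each categorical predictor contributes one dummy per level.
def expanded_predictor_count(levels_per_predictor):
    """levels_per_predictor: 1 for a numeric predictor,
    or k (the number of levels) for a categorical predictor."""
    return sum(levels_per_predictor)

# Three predictors, one categorical with three levels -> Beta has 5 entries.
print(expanded_predictor_count([1, 1, 3]))  # 5
```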

If `KernelParameters.Function` is `'linear'`, then the classification score for the observation x is

`$f(x) = (x/s)'\beta + b.$`

`Mdl` stores β, b, and s in the properties `Beta`, `Bias`, and `KernelParameters.Scale`, respectively.

To estimate classification scores manually, you must first apply any transformations to the predictor data that were applied during training. Specifically, if you specify `'Standardize',true` when using `fitcsvm`, then you must standardize the predictor data manually by using the mean `Mdl.Mu` and standard deviation `Mdl.Sigma`, and then divide the result by the kernel scale in `Mdl.KernelParameters.Scale`.

All SVM functions, such as `resubPredict` and `predict`, apply any required transformation before estimation.
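Assuming hypothetical parameter values, a minimal Python sketch of this manual score computation might look like this (illustrative only, not MathWorks code):

```python
import numpy as np

# Sketch: compute the linear-kernel score f(x) = (x/s)' * beta + b
# after applying the training-time standardization (Mu, Sigma).
# All values below are hypothetical.
def linear_svm_score(x, mu, sigma, scale, beta, bias):
    x_std = (np.asarray(x, float) - mu) / sigma  # standardize as in training
    return (x_std / scale) @ beta + bias         # f(x) = (x/s)' beta + b

mu = np.array([1.0, 2.0])
sigma = np.array([0.5, 1.0])
beta = np.array([0.8, -0.3])
bias = 0.1
scale = 2.0
print(linear_svm_score([1.5, 1.0], mu, sigma, scale, beta, bias))
```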

If `KernelParameters.Function` is not `'linear'`, then `Beta` is empty (`[]`).

Data Types: `single` | `double`

#### Bias

Bias term, specified as a scalar.

Data Types: `single` | `double`

#### BoxConstraints

Box constraints, specified as an n-by-1 numeric vector. n is the number of observations in the training data (see the `NumObservations` property).

If you remove duplicates by using the `RemoveDuplicates` name-value pair argument of `fitcsvm`, then for a given set of duplicate observations, MATLAB sums the box constraints and attributes the sum to one observation. MATLAB attributes a box constraint of `0` to all other observations in the set.

Data Types: `single` | `double`

#### CacheInfo

Caching information, specified as a structure array. The caching information contains the fields described in this table.

| Field | Description |
| --- | --- |
| `Size` | The cache size (in MB) that the software reserves to train the SVM classifier. For details, see `'CacheSize'`. |
| `Algorithm` | The caching algorithm that the software uses during optimization. Currently, the only available caching algorithm is `Queue`. You cannot set the caching algorithm. |

Display the fields of `CacheInfo` by using dot notation. For example, `Mdl.CacheInfo.Size` displays the value of the cache size.

Data Types: `struct`

#### IsSupportVector

Support vector indicator, specified as an n-by-1 logical vector that flags whether the corresponding observation in the predictor data matrix is a support vector. n is the number of observations in the training data (see `NumObservations`).

If you remove duplicates by using the `RemoveDuplicates` name-value pair argument of `fitcsvm`, then for a given set of duplicate observations that are support vectors, `IsSupportVector` flags only one observation as a support vector.

Data Types: `logical`

#### KernelParameters

Kernel parameters, specified as a structure array. The kernel parameters property contains the fields listed in this table.

| Field | Description |
| --- | --- |
| `Function` | Kernel function used to compute the elements of the Gram matrix. For details, see `'KernelFunction'`. |
| `Scale` | Kernel scale parameter used to scale all elements of the predictor data on which the model is trained. For details, see `'KernelScale'`. |

To display the values of `KernelParameters`, use dot notation. For example, `Mdl.KernelParameters.Scale` displays the kernel scale parameter value.

The software accepts `KernelParameters` as inputs and does not modify them.

Data Types: `struct`

#### Nu

One-class learning parameter ν, specified as a positive scalar.

Data Types: `single` | `double`

#### OutlierFraction

Proportion of outliers in the training data, specified as a numeric scalar.

Data Types: `double`

#### Solver

Optimization routine used to train the SVM classifier, specified as `'ISDA'`, `'L1QP'`, or `'SMO'`. For more details, see `'Solver'`.

#### SupportVectorLabels

Support vector class labels, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, `sum(Mdl.IsSupportVector)`.

A value of `+1` in `SupportVectorLabels` indicates that the corresponding support vector is in the positive class (`ClassNames{2}`). A value of `-1` indicates that the corresponding support vector is in the negative class (`ClassNames{1}`).

If you remove duplicates by using the `RemoveDuplicates` name-value pair argument of `fitcsvm`, then for a given set of duplicate observations that are support vectors, `SupportVectorLabels` contains one unique support vector label.

Data Types: `single` | `double`

#### SupportVectors

Support vectors in the trained classifier, specified as an s-by-p numeric matrix. s is the number of support vectors in the trained classifier, `sum(Mdl.IsSupportVector)`, and p is the number of predictor variables in the predictor data.

`SupportVectors` contains rows of the predictor data `X` that MATLAB considers to be support vectors. If you specify `'Standardize',true` when training the SVM classifier using `fitcsvm`, then `SupportVectors` contains the standardized rows of `X`.

If you remove duplicates by using the `RemoveDuplicates` name-value pair argument of `fitcsvm`, then for a given set of duplicate observations that are support vectors, `SupportVectors` contains one unique support vector.

Data Types: `single` | `double`

### Other Classification Properties

#### CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty (`[]`).

Data Types: `single` | `double`

#### ClassNames

Unique class labels used in training the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

Data Types: `single` | `double` | `logical` | `char` | `cell` | `categorical`

#### Cost

Misclassification cost, specified as a numeric square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`.

During training, the software updates the prior probabilities by incorporating the penalties described in the cost matrix.

• For two-class learning, `Cost` always has this form: `Cost(i,j) = 1` if `i ~= j`, and `Cost(i,j) = 0` if `i = j`. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

• For one-class learning, `Cost = 0`.

For more details, see Algorithms.

Data Types: `double`

#### ExpandedPredictorNames

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

Data Types: `cell`

#### Gradient

Training data gradient values, specified as a numeric vector. The length of `Gradient` is equal to the number of observations (see `NumObservations`).

Data Types: `single` | `double`

#### ModelParameters

Parameters used to train the `ClassificationSVM` model, specified as a structure array. `ModelParameters` contains parameter values such as the name-value pair argument values used to train the SVM classifier. `ModelParameters` does not contain estimated parameters.

Access the fields of `ModelParameters` by using dot notation. For example, access the initial values for estimating `Alpha` by using `Mdl.ModelParameters.Alpha`.

Data Types: `struct`

#### Mu

Predictor means, specified as a numeric vector. If you specify `'Standardize',1` or `'Standardize',true` when you train an SVM classifier using `fitcsvm`, then the length of `Mu` is equal to the number of predictors.

MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. `Mu` stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables.

If you set `'Standardize',false` when you train the SVM classifier using `fitcsvm`, then `Mu` is an empty vector (`[]`).

Data Types: `single` | `double`

#### NumObservations

Number of observations in the training data stored in `X` and `Y`, specified as a numeric scalar.

Data Types: `single` | `double`

#### PredictorNames

Predictor variable names, specified as a cell array of character vectors. The order of the elements of `PredictorNames` corresponds to the order in which the predictor names appear in the training data.

Data Types: `cell`

#### Prior

Prior probabilities for each class, specified as a numeric vector. The order of the elements of `Prior` corresponds to the elements of `Mdl.ClassNames`.

For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix.

For more details, see Algorithms.

Data Types: `single` | `double`

#### ResponseName

Response variable name, specified as a character vector.

Data Types: `char`

#### RowsUsed

Rows of the original data `X` used in fitting the `ClassificationSVM` model, specified as a logical vector. This property is empty if all rows are used.

Data Types: `logical`

#### ScoreTransform

Score transformation, specified as a character vector or function handle. `ScoreTransform` represents a built-in transformation function or a function handle for transforming predicted classification scores.

To change the score transformation function to `function`, for example, use dot notation.

• For a built-in function, enter a character vector.

`Mdl.ScoreTransform = 'function';`

This table describes the available built-in functions.

| Value | Description |
| --- | --- |
| `'doublelogit'` | $1/(1 + e^{-2x})$ |
| `'invlogit'` | $\log(x / (1 - x))$ |
| `'ismax'` | Sets the score for the class with the largest score to `1`, and sets the scores for all other classes to `0` |
| `'logit'` | $1/(1 + e^{-x})$ |
| `'none'` or `'identity'` | $x$ (no transformation) |
| `'sign'` | $-1$ for $x < 0$; $0$ for $x = 0$; $1$ for $x > 0$ |
| `'symmetric'` | $2x - 1$ |
| `'symmetricismax'` | Sets the score for the class with the largest score to `1`, and sets the scores for all other classes to `-1` |
| `'symmetriclogit'` | $2/(1 + e^{-x}) - 1$ |

• For a MATLAB function or a function that you define, enter its function handle.

`Mdl.ScoreTransform = @function;`

`function` should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Data Types: `char` | `function_handle`
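The built-in transforms in the table above can be sketched numerically as follows (an illustrative Python rendering of the formulas, not MathWorks code):

```python
import math

# Sketch of the elementwise built-in score transforms from the table.
transforms = {
    'doublelogit':    lambda x: 1.0 / (1.0 + math.exp(-2.0 * x)),
    'invlogit':       lambda x: math.log(x / (1.0 - x)),
    'logit':          lambda x: 1.0 / (1.0 + math.exp(-x)),
    'none':           lambda x: x,
    'sign':           lambda x: (x > 0) - (x < 0),
    'symmetric':      lambda x: 2.0 * x - 1.0,
    'symmetriclogit': lambda x: 2.0 / (1.0 + math.exp(-x)) - 1.0,
}

print(transforms['logit'](0.0))           # 0.5
print(transforms['symmetriclogit'](0.0))  # 0.0
```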

#### Sigma

Predictor standard deviations, specified as a numeric vector.

If you specify `'Standardize',true` when you train the SVM classifier using `fitcsvm`, then the length of `Sigma` is equal to the number of predictor variables.

MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. `Sigma` stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables.

If you set `'Standardize',false` when you train the SVM classifier using `fitcsvm`, then `Sigma` is an empty vector (`[]`).

Data Types: `single` | `double`

#### W

Observation weights used to train the SVM classifier, specified as an n-by-1 numeric vector. n is the number of observations (see `NumObservations`).

`fitcsvm` normalizes the observation weights specified in the `'Weights'` name-value pair argument so that the elements of `W` within a particular class sum up to the prior probability of that class.

Data Types: `single` | `double`
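This per-class normalization can be sketched in Python (an illustrative helper with hypothetical names, not MathWorks code):

```python
import numpy as np

# Sketch: normalize observation weights so that, within each class,
# the weights sum to that class's prior probability.
def normalize_weights(w, y, priors):
    w = np.asarray(w, float)
    out = np.empty_like(w)
    for cls, prior in priors.items():
        mask = (np.asarray(y) == cls)
        out[mask] = w[mask] / w[mask].sum() * prior  # class sum -> prior
    return out

w = np.array([1.0, 3.0, 2.0, 2.0])
y = np.array(['a', 'a', 'b', 'b'])
W = normalize_weights(w, y, {'a': 0.5, 'b': 0.5})
print(W)  # weights within each class sum to 0.5
```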

#### X

Unstandardized predictors used to train the SVM classifier, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

MATLAB excludes observations containing at least one missing value, and removes corresponding elements from `Y`.

Data Types: `single` | `double`

#### Y

Class labels used to train the SVM classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `Y` is the same data type as the input argument `Y` of `fitcsvm`. (The software treats string arrays as cell arrays of character vectors.)

Each row of `Y` represents the observed classification of the corresponding row of `X`.

MATLAB excludes elements containing missing values, and removes corresponding observations from `X`.

Data Types: `single` | `double` | `logical` | `char` | `cell` | `categorical`

### Convergence Control Properties

#### ConvergenceInfo

Convergence information, specified as a structure array with the fields described in this table.

| Field | Description |
| --- | --- |
| `Converged` | Logical flag indicating whether the algorithm converged (`1` indicates convergence). |
| `ReasonForConvergence` | Character vector indicating the criterion the software uses to detect convergence. |
| `Gap` | Scalar feasibility gap between the dual and primal objective functions. |
| `GapTolerance` | Scalar feasibility gap tolerance. Set this tolerance, for example to `1e-2`, by using the name-value pair argument `'GapTolerance',1e-2` of `fitcsvm`. |
| `DeltaGradient` | Scalar attained gradient difference between upper and lower violators. |
| `DeltaGradientTolerance` | Scalar tolerance for the gradient difference between upper and lower violators. Set this tolerance, for example to `1e-2`, by using the name-value pair argument `'DeltaGradientTolerance',1e-2` of `fitcsvm`. |
| `LargestKKTViolation` | Maximal scalar Karush-Kuhn-Tucker (KKT) violation value. |
| `KKTTolerance` | Scalar tolerance for the largest KKT violation. Set this tolerance, for example to `1e-3`, by using the name-value pair argument `'KKTTolerance',1e-3` of `fitcsvm`. |
| `History` | Structure array containing convergence information at set optimization iterations, with the fields `NumIterations` (iteration indices at which the software records convergence information), `Gap`, `DeltaGradient`, `LargestKKTViolation`, `NumSupportVectors`, and `Objective` (the values of those quantities at the recorded iterations). |
| `Objective` | Scalar value of the dual objective function. |

Data Types: `struct`

#### NumIterations

Number of iterations required by the optimization routine to attain convergence, specified as a positive integer.

To set the limit on the number of iterations to `1000`, for example, specify `'IterationLimit',1000` when you train the SVM classifier using `fitcsvm`.

Data Types: `double`

#### ShrinkagePeriod

Number of iterations between reductions of the active set, specified as a nonnegative integer.

To set the shrinkage period to `1000`, for example, specify `'ShrinkagePeriod',1000` when you train the SVM classifier using `fitcsvm`.

Data Types: `single` | `double`

### Hyperparameter Optimization Properties

#### HyperparameterOptimizationResults

Description of the cross-validation optimization of hyperparameters, specified as a `BayesianOptimization` object or a table of hyperparameters and associated values. This property is nonempty when the `'OptimizeHyperparameters'` name-value pair argument of `fitcsvm` is nonempty at creation. The value of `HyperparameterOptimizationResults` depends on the setting of the `Optimizer` field in the `HyperparameterOptimizationOptions` structure of `fitcsvm` at creation, as described in this table.

| Value of `Optimizer` Field | Value of `HyperparameterOptimizationResults` |
| --- | --- |
| `'bayesopt'` (default) | Object of class `BayesianOptimization` |
| `'gridsearch'` or `'randomsearch'` | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |

## Object Functions

| Function | Description |
| --- | --- |
| `compact` | Reduce size of support vector machine (SVM) classifier |
| `compareHoldout` | Compare accuracies of two classification models using new data |
| `crossval` | Cross-validate support vector machine (SVM) classifier |
| `discardSupportVectors` | Discard support vectors for linear support vector machine (SVM) classifier |
| `edge` | Find classification edge for support vector machine (SVM) classifier |
| `fitPosterior` | Fit posterior probabilities for support vector machine (SVM) classifier |
| `loss` | Find classification error for support vector machine (SVM) classifier |
| `margin` | Find classification margins for support vector machine (SVM) classifier |
| `predict` | Classify observations using support vector machine (SVM) classifier |
| `resubEdge` | Find classification edge for support vector machine (SVM) classifier by resubstitution |
| `resubLoss` | Find classification loss for support vector machine (SVM) classifier by resubstitution |
| `resubMargin` | Find classification margins for support vector machine (SVM) classifier by resubstitution |
| `resubPredict` | Classify observations in support vector machine (SVM) classifier |
| `resume` | Resume training support vector machine (SVM) classifier |

## Examples


Load Fisher's iris data set. Remove the sepal lengths and widths and all observed setosa irises.

```
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);
```

Train an SVM classifier using the processed data set.

`SVMModel = fitcsvm(X,y)`
```
SVMModel = 

  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'

  Properties, Methods
```

`SVMModel` is a trained `ClassificationSVM` classifier. Display the properties of `SVMModel`. For example, to determine the class order, use dot notation.

`classOrder = SVMModel.ClassNames`
```
classOrder = 2x1 cell array
    {'versicolor'}
    {'virginica' }
```

The first class (`'versicolor'`) is the negative class, and the second (`'virginica'`) is the positive class. You can change the class order during training by using the `'ClassNames'` name-value pair argument.

Plot a scatter diagram of the data and circle the support vectors.

```
sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off
```

The support vectors are observations that occur on or beyond their estimated class boundaries.

You can adjust the boundaries (and, therefore, the number of support vectors) by setting a box constraint during training using the `'BoxConstraint'` name-value pair argument.

Load the `ionosphere` data set.

`load ionosphere`

Train and cross-validate an SVM classifier. Standardize the predictor data and specify the order of the classes.

```
rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')
```
```
CVSVMModel = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods
```

`CVSVMModel` is a `ClassificationPartitionedModel` cross-validated SVM classifier. By default, the software implements 10-fold cross-validation.

Alternatively, you can cross-validate a trained `ClassificationSVM` classifier by passing it to `crossval`.

Inspect one of the trained folds using dot notation.

`CVSVMModel.Trained{1}`
```
ans = 
  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]

  Properties, Methods
```

Each fold is a `CompactClassificationSVM` classifier trained on 90% of the data.

Estimate the generalization error.

`genError = kfoldLoss(CVSVMModel)`
```
genError = 0.1168
```

On average, the generalization error is approximately 12%.


## Algorithms

• For the mathematical formulation of the SVM binary classification algorithm, see Support Vector Machines for Binary Classification and Understanding Support Vector Machines.

• `NaN`, `<undefined>`, empty character vector (`''`), empty string (`""`), and `<missing>` values indicate missing values. `fitcsvm` removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), `fitcsvm` ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal `BoxConstraint`.

• `fitcsvm` removes observations that have zero weight or prior probability.

• For two-class learning, if you specify the cost matrix $\mathcal{C}$ (see `Cost`), then the software updates the class prior probabilities p (see `Prior`) to pc by incorporating the penalties described in $\mathcal{C}$.

Specifically, `fitcsvm` completes these steps:

1. Compute $p_c^* = p' \mathcal{C}.$

2. Normalize pc* so that the updated prior probabilities sum to 1.

`$p_c = \frac{p_c^*}{\sum_{j=1}^{K} p_{c,j}^*}.$`

K is the number of classes.

3. Reset the cost matrix to the default

`$\mathcal{C} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$`

4. Remove observations from the training data corresponding to classes with zero prior probability.
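Steps 1 and 2 can be checked numerically; the following is an illustrative Python sketch with a hypothetical prior vector and cost matrix (not MathWorks code):

```python
import numpy as np

# Sketch of the prior update: p* = p' * C, then normalize to sum to 1.
p = np.array([0.4, 0.6])                 # original class priors (hypothetical)
C = np.array([[0.0, 2.0],
              [1.0, 0.0]])               # user-specified cost matrix (hypothetical)

p_star = p @ C                           # step 1: p* = p' * C
p_c = p_star / p_star.sum()              # step 2: normalize so priors sum to 1
print(p_c)
```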

• For two-class learning, `fitcsvm` normalizes all observation weights (see `Weights`) to sum to 1. The function then renormalizes the normalized weights to sum up to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation j in class k is

`${w}_{j}^{\ast} = \frac{w_j}{\sum_{\forall j \in \text{Class } k} w_j}\, p_{c,k}.$`

wj is the normalized weight for observation j; pc,k is the updated prior probability of class k (see previous bullet).

• For two-class learning, `fitcsvm` assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is

`$C_j = n C_0 w_j^{\ast}.$`

n is the training sample size, C0 is the initial box constraint (see the `'BoxConstraint'` name-value pair argument), and ${w}_{j}^{\ast }$ is the total weight of observation j (see previous bullet).
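A quick numeric sketch of this formula (illustrative Python with hypothetical total weights, not MathWorks code):

```python
import numpy as np

# Sketch: per-observation box constraint C_j = n * C0 * w_j*,
# where the total weights w_j* sum to 1 (hypothetical values).
n = 4                                       # training sample size
C0 = 1.0                                    # initial box constraint
w_total = np.array([0.1, 0.2, 0.3, 0.4])    # total weights, sum to 1
box = n * C0 * w_total
print(box)  # [0.4 0.8 1.2 1.6]
```

Because the total weights sum to 1, the box constraints sum to n*C0.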

• If you set `'Standardize',true` and the `'Cost'`, `'Prior'`, or `'Weights'` name-value pair argument, then `fitcsvm` standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, `fitcsvm` standardizes predictor j (xj) using

`$x_j^{\ast} = \frac{x_j - \mu_j^{\ast}}{\sigma_j^{\ast}}.$`

$\mu_j^{\ast} = \frac{1}{\sum_k w_k^{\ast}} \sum_k w_k^{\ast} x_{jk}.$

xjk is observation k (row) of predictor j (column).

$\left(\sigma_j^{\ast}\right)^2 = \frac{v_1}{v_1^2 - v_2} \sum_k w_k^{\ast} \left(x_{jk} - \mu_j^{\ast}\right)^2.$

$v_1 = \sum_j w_j^{\ast}.$

$v_2 = \sum_j \left(w_j^{\ast}\right)^2.$
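The weighted standardization above can be sketched numerically as follows (illustrative Python with hypothetical data; equal weights shown for clarity, not MathWorks code):

```python
import numpy as np

# Sketch: weighted mean mu* and the v1/v2-corrected weighted variance
# (sigma*)^2, then standardize one predictor column.
def weighted_standardize(x, w):
    x = np.asarray(x, float)
    w = np.asarray(w, float)
    v1 = w.sum()                                   # v1 = sum of weights
    v2 = (w ** 2).sum()                            # v2 = sum of squared weights
    mu = (w * x).sum() / v1                        # weighted mean mu*
    var = v1 / (v1 ** 2 - v2) * (w * (x - mu) ** 2).sum()
    return (x - mu) / np.sqrt(var)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.25, 0.25, 0.25, 0.25])
print(weighted_standardize(x, w))
```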

• Assume that `p` is the proportion of outliers that you expect in the training data, and that you set `'OutlierFraction',p`.

• For one-class learning, the software trains the bias term such that 100`p`% of the observations in the training data have negative scores.

• The software implements robust learning for two-class learning. In other words, the software attempts to remove 100`p`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.

• If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

• The `PredictorNames` property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `PredictorNames` is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.

• The `ExpandedPredictorNames` property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `ExpandedPredictorNames` is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.

• Similarly, the `Beta` property stores one beta coefficient for each predictor, including the dummy variables.

• The `SupportVectors` property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then `SupportVectors` is an m-by-5 matrix.

• The `X` property stores the training data as originally input and does not include the dummy variables. When the input is a table, `X` contains only the columns used as predictors.

• For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

• For a variable with k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is –1 for levels up to j, and +1 for levels j + 1 through k.

• The names of the dummy variables stored in the `ExpandedPredictorNames` property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.

• All solvers implement L1 soft-margin minimization.

• For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that

`$\sum_{j=1}^{n} \alpha_j = n\nu.$`

## References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. “Estimating the Support of a High-Dimensional Distribution.” Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.

[3] Christianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.