# fitrsvm

Fit a support vector machine regression model

## Syntax

``Mdl = fitrsvm(Tbl,ResponseVarName)``
``Mdl = fitrsvm(Tbl,formula)``
``Mdl = fitrsvm(Tbl,Y)``
``Mdl = fitrsvm(X,Y)``
``Mdl = fitrsvm(___,Name,Value)``

## Description

`fitrsvm` trains or cross-validates a support vector machine (SVM) regression model on a low- through moderate-dimensional predictor data set. `fitrsvm` supports mapping the predictor data using kernel functions, and it minimizes the objective function using SMO, ISDA, or L1 soft-margin minimization via quadratic programming.

To train a linear SVM regression model on a high-dimensional data set, that is, a data set that includes many predictor variables, use `fitrlinear` instead.

To train an SVM model for binary classification, see `fitcsvm` for low- through moderate-dimensional predictor data sets, or `fitclinear` for high-dimensional data sets.


`Mdl = fitrsvm(Tbl,ResponseVarName)` returns a full, trained support vector machine (SVM) regression model `Mdl` trained using the predictor values in the table `Tbl` and the response values in `Tbl.ResponseVarName`.

`Mdl = fitrsvm(Tbl,formula)` returns a full SVM regression model trained using the predictor values in the table `Tbl`. `formula` is an explanatory model of the response and a subset of predictor variables in `Tbl` used to fit `Mdl`.

`Mdl = fitrsvm(Tbl,Y)` returns a full, trained SVM regression model trained using the predictor values in the table `Tbl` and the response values in the vector `Y`.

`Mdl = fitrsvm(X,Y)` returns a full, trained SVM regression model trained using the predictor values in the matrix `X` and the response values in the vector `Y`.


`Mdl = fitrsvm(___,Name,Value)` returns an SVM regression model with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the kernel function or train a cross-validated model.

## Examples


### Train Linear SVM Regression Model

Train a support vector machine (SVM) regression model using sample data stored in matrices.

Load the `carsmall` data set.

```
load carsmall
rng 'default' % For reproducibility
```

Specify `Horsepower` and `Weight` as the predictor variables (`X`) and `MPG` as the response variable (`Y`).

```
X = [Horsepower,Weight];
Y = MPG;
```

Train a default SVM regression model.

```
Mdl = fitrsvm(X,Y)
```

```
Mdl = 
  RegressionSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
                    Alpha: [75x1 double]
                     Bias: 43.2943
         KernelParameters: [1x1 struct]
          NumObservations: 93
           BoxConstraints: [93x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [93x1 logical]
                   Solver: 'SMO'

  Properties, Methods
```

`Mdl` is a trained `RegressionSVM` model.

Check the model for convergence.

```
Mdl.ConvergenceInfo.Converged
```

```
ans = logical
   0
```

`0` indicates that the model did not converge.

Retrain the model using standardized data.

```
MdlStd = fitrsvm(X,Y,'Standardize',true)
```

```
MdlStd = 
  RegressionSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
                    Alpha: [77x1 double]
                     Bias: 22.9131
         KernelParameters: [1x1 struct]
                       Mu: [109.3441 2.9625e+03]
                    Sigma: [45.3545 805.9668]
          NumObservations: 93
           BoxConstraints: [93x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [93x1 logical]
                   Solver: 'SMO'

  Properties, Methods
```

Check the model for convergence.

```
MdlStd.ConvergenceInfo.Converged
```

```
ans = logical
   1
```

`1` indicates that the model converged.

Compute the resubstitution (in-sample) mean-squared error for the new model.

```
lStd = resubLoss(MdlStd)
```

```
lStd = 17.0256
```

### Train Gaussian SVM Regression Model Using Table Data

Train a support vector machine regression model using the abalone data from the UCI Machine Learning Repository.

Download the data and save it in your current folder with the name `'abalone.csv'`.

```
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data';
websave('abalone.csv',url);
```

Read the data into a table. Specify the variable names.

```
varnames = {'Sex'; 'Length'; 'Diameter'; 'Height'; 'Whole_weight';...
    'Shucked_weight'; 'Viscera_weight'; 'Shell_weight'; 'Rings'};
Tbl = readtable('abalone.csv','Filetype','text','ReadVariableNames',false);
Tbl.Properties.VariableNames = varnames;
```

The sample data contains 4177 observations. All the predictor variables are continuous except for `Sex`, which is a categorical variable with possible values `'M'` (for males), `'F'` (for females), and `'I'` (for infants). The goal is to predict the number of rings (stored in `Rings`) on the abalone and determine its age using physical measurements.

Train an SVM regression model, using a Gaussian kernel function with an automatic kernel scale. Standardize the data.

```
rng default % For reproducibility
Mdl = fitrsvm(Tbl,'Rings','KernelFunction','gaussian','KernelScale','auto',...
    'Standardize',true)
```

```
Mdl = 
  RegressionSVM
           PredictorNames: {1×8 cell}
             ResponseName: 'Rings'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
                    Alpha: [3635×1 double]
                     Bias: 10.8144
         KernelParameters: [1×1 struct]
                       Mu: [1×10 double]
                    Sigma: [1×10 double]
          NumObservations: 4177
           BoxConstraints: [4177×1 double]
          ConvergenceInfo: [1×1 struct]
          IsSupportVector: [4177×1 logical]
                   Solver: 'SMO'
```

The Command Window shows that `Mdl` is a trained `RegressionSVM` model and displays a property list.

Display the properties of `Mdl` using dot notation. For example, check whether the model converged and how many iterations it completed.

```
conv = Mdl.ConvergenceInfo.Converged
iter = Mdl.NumIterations
```

```
conv = logical
   1

iter = 2759
```

The returned results indicate that the model converged after 2759 iterations.

### Cross-Validate SVM Regression Models

Load the `carsmall` data set.

```
load carsmall
rng 'default' % For reproducibility
```

Specify `Horsepower` and `Weight` as the predictor variables (`X`) and `MPG` as the response variable (`Y`).

```
X = [Horsepower Weight];
Y = MPG;
```

Cross-validate two SVM regression models using 5-fold cross-validation. Standardize the predictors for both models. Train one model using the default linear kernel and the other using the Gaussian kernel.

```
MdlLin = fitrsvm(X,Y,'Standardize',true,'KFold',5)
```

```
MdlLin = 
  classreg.learning.partition.RegressionPartitionedSVM
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1'  'x2'}
           ResponseName: 'Y'
        NumObservations: 94
                  KFold: 5
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'

  Properties, Methods
```

```
MdlGau = fitrsvm(X,Y,'Standardize',true,'KFold',5,'KernelFunction','gaussian')
```

```
MdlGau = 
  classreg.learning.partition.RegressionPartitionedSVM
    CrossValidatedModel: 'SVM'
         PredictorNames: {'x1'  'x2'}
           ResponseName: 'Y'
        NumObservations: 94
                  KFold: 5
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'

  Properties, Methods
```

```
MdlLin.Trained
```

```
ans=5×1 cell array
    {1x1 classreg.learning.regr.CompactRegressionSVM}
    {1x1 classreg.learning.regr.CompactRegressionSVM}
    {1x1 classreg.learning.regr.CompactRegressionSVM}
    {1x1 classreg.learning.regr.CompactRegressionSVM}
    {1x1 classreg.learning.regr.CompactRegressionSVM}
```

`MdlLin` and `MdlGau` are `RegressionPartitionedSVM` cross-validated models. The `Trained` property of each model is a 5-by-1 cell array of `CompactRegressionSVM` models. Each compact model is trained on four of the five folds, with the remaining fold held out for validation.

Compare the generalization error of the models. In this case, the generalization error is the out-of-sample mean-squared error.

```
mseLin = kfoldLoss(MdlLin)
```

```
mseLin = 17.4417
```

```
mseGau = kfoldLoss(MdlGau)
```

```
mseGau = 16.7397
```

The SVM regression model using the Gaussian kernel has the lower cross-validated MSE, so it performs better than the model using the linear kernel.

Create a model suitable for making predictions by passing the entire data set to `fitrsvm`, and specify all name-value pair arguments that yielded the better-performing model. However, do not specify any cross-validation options.

```
MdlGau = fitrsvm(X,Y,'Standardize',true,'KernelFunction','gaussian');
```

To predict the MPG of a set of cars, pass `MdlGau` and a matrix containing the horsepower and weight measurements of the cars to `predict`. The columns of the matrix must be in the same order as the columns of `X`.

### Optimize SVM Regression Hyperparameters

This example shows how to optimize hyperparameters automatically using `fitrsvm`. The example uses the `carsmall` data.

Load the `carsmall` data set.

```
load carsmall
```

Specify `Horsepower` and `Weight` as the predictor variables (`X`) and `MPG` as the response variable (`Y`).

```
X = [Horsepower Weight];
Y = MPG;
```

Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.

For reproducibility, set the random seed and use the `'expected-improvement-plus'` acquisition function.

```
rng default
Mdl = fitrsvm(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))
```
```
|============================================================================================================|
| Iter | Eval   | Objective:  | Objective | BestSoFar  | BestSoFar | BoxConstraint | KernelScale | Epsilon   |
|      | result | log(1+loss) | runtime   | (observed) | (estim.)  |               |             |           |
|============================================================================================================|
|    1 | Best   |      6.1124 |    10.173 |     6.1124 |    6.1124 |       0.35664 |    0.043031 |   0.30396 |
|    2 | Best   |      2.9114 |  0.085431 |     2.9114 |     3.088 |         70.67 |      710.65 |    1.6369 |
|    3 | Accept |      4.1884 |  0.055579 |     2.9114 |     3.078 |        14.367 |   0.0059144 |    442.64 |
|    4 | Accept |       4.159 |  0.047438 |     2.9114 |    3.0457 |     0.0030879 |      715.31 |    2.6045 |
|    5 | Best   |      2.9044 |   0.17255 |     2.9044 |    2.9042 |        906.95 |      761.46 |    1.3274 |
|    6 | Best   |      2.8666 |   0.50424 |     2.8666 |    2.8668 |         997.3 |      317.41 |    3.7696 |
|    7 | Accept |      4.1881 |   0.04287 |     2.8666 |    2.8669 |        759.56 |      987.74 |    15.074 |
|    8 | Accept |      2.8992 |    2.4538 |     2.8666 |    2.8669 |        819.07 |      152.11 |    1.5192 |
|    9 | Accept |      2.8916 |   0.14372 |     2.8666 |    2.8672 |        921.52 |      627.48 |    2.3029 |
|   10 | Accept |      2.9001 |   0.28839 |     2.8666 |    2.8676 |        382.91 |      343.04 |    1.5448 |
|   11 | Accept |      3.6573 |     9.616 |     2.8666 |    2.8784 |         945.1 |       8.885 |    3.9207 |
|   12 | Accept |      2.9381 |   0.10258 |     2.8666 |     2.871 |        935.49 |      979.29 |    0.1384 |
|   13 | Accept |      2.9341 |  0.043204 |     2.8666 |    2.8719 |         1.992 |      999.49 |   0.21557 |
|   14 | Accept |      2.9227 |  0.044301 |     2.8666 |    2.8742 |         2.351 |      977.85 |  0.026124 |
|   15 | Accept |      2.9483 |   0.12695 |     2.8666 |    2.8751 |        826.92 |      713.57 | 0.0096305 |
|   16 | Accept |      2.9502 |    1.1304 |     2.8666 |    2.8813 |        345.64 |       129.6 |  0.027832 |
|   17 | Accept |      2.9329 |  0.081387 |     2.8666 |    2.8799 |        836.96 |      970.73 |  0.034398 |
|   18 | Accept |      2.9177 |   0.04481 |     2.8666 |    2.8771 |       0.10167 |      129.91 | 0.0092675 |
|   19 | Accept |        2.95 |    2.4018 |     2.8666 |    2.8749 |        199.85 |       68.93 | 0.0092982 |
|   20 | Accept |      4.1964 |  0.043585 |     2.8666 |    2.8685 |     0.0012054 |      940.94 | 0.0097673 |
|   21 | Accept |       2.905 |  0.051636 |     2.8666 |    2.8675 |        5.9475 |      199.82 |  0.013585 |
|   22 | Accept |      2.9329 |  0.078409 |     2.8666 |    2.8747 |       0.33221 |      21.509 | 0.0094248 |
|   23 | Accept |      2.9017 |  0.061248 |     2.8666 |    2.8689 |        13.341 |      554.39 |  0.069216 |
|   24 | Accept |      2.9067 |  0.046904 |     2.8666 |    2.8694 |       0.21467 |      73.415 |  0.028231 |
|   25 | Accept |      2.9046 |  0.070385 |     2.8666 |    2.8731 |       0.68546 |      61.287 | 0.0099165 |
|   26 | Accept |      2.9138 |  0.044185 |     2.8666 |    2.8676 |     0.0012185 |      8.8743 | 0.0093263 |
|   27 | Accept |      2.9193 |  0.045608 |     2.8666 |    2.8731 |     0.0099434 |      30.484 | 0.0093546 |
|   28 | Accept |      8.5384 |    9.9821 |     2.8666 |    2.8683 |        992.36 |      1.4043 | 0.0093129 |
|   29 | Accept |      3.2254 |  0.043063 |     2.8666 |    2.8682 |     0.0010092 |      16.917 |    7.3665 |
|   30 | Accept |      4.1884 |  0.042358 |     2.8666 |    2.8683 |        983.95 |      42.654 |    287.19 |
```

```
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 67.1005 seconds.
Total objective function evaluation time: 38.0678

Best observed feasible point:
    BoxConstraint    KernelScale    Epsilon
    _____________    ___________    _______

        997.3          317.41       3.7696

Observed objective function value = 2.8666
Estimated objective function value = 2.8683
Function evaluation time = 0.50424

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale    Epsilon
    _____________    ___________    _______

        997.3          317.41       3.7696

Estimated objective function value = 2.8683
Estimated function evaluation time = 0.44423
```

```
Mdl = 
  RegressionSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                    ResponseTransform: 'none'
                                Alpha: [35×1 double]
                                 Bias: 48.8155
                     KernelParameters: [1×1 struct]
                      NumObservations: 93
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                       BoxConstraints: [93×1 double]
                      ConvergenceInfo: [1×1 struct]
                      IsSupportVector: [93×1 logical]
                               Solver: 'SMO'

  Properties, Methods
```

The optimization searched over `BoxConstraint`, `KernelScale`, and `Epsilon`. The output is the regression model with the minimum estimated cross-validation loss.

## Input Arguments


### `Tbl`

Sample data used to train the model, specified as a table. Each row of `Tbl` corresponds to one observation, and each column corresponds to one predictor variable. Optionally, `Tbl` can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.

If `Tbl` contains the response variable, and you want to use all remaining variables in `Tbl` as predictors, then specify the response variable using `ResponseVarName`.

If `Tbl` contains the response variable, and you want to use only a subset of the remaining variables in `Tbl` as predictors, then specify a formula using `formula`.

If `Tbl` does not contain the response variable, then specify a response variable using `Y`. The length of the response variable and the number of rows of `Tbl` must be equal.

If a row of `Tbl` or an element of `Y` contains at least one `NaN`, then `fitrsvm` removes those rows and elements from both arguments when training the model.

To specify the names of the predictors in the order of their appearance in `Tbl`, use the `PredictorNames` name-value pair argument.

Data Types: `table`

### `ResponseVarName`

Response variable name, specified as the name of a variable in `Tbl`. The response variable must be a numeric vector.

You must specify `ResponseVarName` as a character vector or string scalar. For example, if `Tbl` stores the response variable `Y` as `Tbl.Y`, then specify it as `'Y'`. If you do not specify the response variable this way, the software treats all columns of `Tbl`, including `Y`, as predictors when training the model.

Data Types: `char` | `string`

### `formula`

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form `'Y~X1+X2+X3'`. In this form, `Y` represents the response variable, and `X1`, `X2`, and `X3` represent the predictor variables.

To specify a subset of variables in `Tbl` as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in `Tbl` that do not appear in `formula`.

The variable names in the formula must be both variable names in `Tbl` (`Tbl.Properties.VariableNames`) and valid MATLAB® identifiers.

You can verify the variable names in `Tbl` by using the `isvarname` function. The following code returns logical `1` (`true`) for each variable that has a valid variable name.

```
cellfun(@isvarname,Tbl.Properties.VariableNames)
```

If the variable names in `Tbl` are not valid, then convert them by using the `matlab.lang.makeValidName` function.

```
Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);
```

Data Types: `char` | `string`

### `Y`

Response data, specified as an n-by-1 numeric vector. The length of `Y` and the number of rows of `Tbl` or `X` must be equal.

If a row of `Tbl` or `X`, or an element of `Y`, contains at least one `NaN`, then `fitrsvm` removes those rows and elements from both arguments when training the model.
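As a small illustration of this rule, the sketch below applies it by hand to made-up values. It is not the code that `fitrsvm` runs internally; it only shows which observations are dropped when either the predictor row or the response contains `NaN`.

```matlab
% Made-up predictor matrix and response vector with some missing values
X = [1 2; NaN 3; 4 5; 6 7];
Y = [10; 20; NaN; 40];

% Keep an observation only if its predictor row and its response are both NaN-free
keep = ~any(isnan(X),2) & ~isnan(Y);
Xclean = X(keep,:);
Yclean = Y(keep);
```

Here the second observation is dropped because of its predictor row and the third because of its response.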

To specify the response variable name, use the `ResponseName` name-value pair argument.

Data Types: `single` | `double`

### `X`

Predictor data to which the SVM regression model is fit, specified as an n-by-p numeric matrix. n is the number of observations and p is the number of predictor variables.

The length of `Y` and the number of rows of `X` must be equal.

If a row of `X` or an element of `Y` contains at least one `NaN`, then `fitrsvm` removes those rows and elements from both arguments.

To specify the names of the predictors in the order of their appearance in `X`, use the `PredictorNames` name-value pair argument.

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'KernelFunction','gaussian','Standardize',true,'CrossVal','on'` trains a 10-fold cross-validated SVM regression model using a Gaussian kernel and standardized training data.

### Note

You cannot use any cross-validation name-value pair argument along with the `'OptimizeHyperparameters'` name-value pair argument. You can modify the cross-validation for `'OptimizeHyperparameters'` only by using the `'HyperparameterOptimizationOptions'` name-value pair argument.

#### Support Vector Machine Options


##### `BoxConstraint`

Box constraint for the alpha coefficients, specified as the comma-separated pair consisting of `'BoxConstraint'` and a positive scalar value.

The absolute value of the `Alpha` coefficients cannot exceed the value of `BoxConstraint`.

The default `BoxConstraint` value for the `'gaussian'` or `'rbf'` kernel function is `iqr(Y)/1.349`, where `iqr(Y)` is the interquartile range of response variable `Y`. For all other kernels, the default `BoxConstraint` value is 1.

Example: `'BoxConstraint',10`

Data Types: `single` | `double`

##### `KernelFunction`

Kernel function used to compute the Gram matrix, specified as the comma-separated pair consisting of `'KernelFunction'` and a value in this table.

| Value | Description | Formula |
|---|---|---|
| `'gaussian'` or `'rbf'` | Gaussian or Radial Basis Function (RBF) kernel | $G(x_j,x_k)=\exp\left(-\lVert x_j-x_k\rVert^{2}\right)$ |
| `'linear'` | Linear kernel | $G(x_j,x_k)=x_j^{\prime}x_k$ |
| `'polynomial'` | Polynomial kernel. Use `'PolynomialOrder',q` to specify a polynomial kernel of order `q`. | $G(x_j,x_k)=\left(1+x_j^{\prime}x_k\right)^{q}$ |
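To make the kernel formulas concrete, the following sketch computes each Gram matrix for two small, made-up matrices `U` and `V` using base MATLAB only. It ignores `KernelScale` and `KernelOffset` and illustrates the formulas; it is not the implementation inside `fitrsvm`.

```matlab
% Made-up data: rows are observations, columns are predictors
U = [1 0; 0 1; 1 1];          % m-by-p
V = [1 0; 2 1];               % n-by-p
q = 2;                        % polynomial order (the PolynomialOrder value)

% Squared Euclidean distance between every row of U and every row of V (m-by-n)
D2 = sum(U.^2,2) + sum(V.^2,2)' - 2*(U*V');

Ggauss = exp(-D2);            % Gaussian (RBF) kernel
Glin   = U*V';                % linear kernel
Gpoly  = (1 + U*V').^q;       % polynomial kernel of order q
```

Each result is an m-by-n matrix whose (j,k) entry compares row j of `U` with row k of `V`; for example, `Glin` here is `[1 2; 0 1; 1 3]`.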

You can set your own kernel function, for example, `kernel`, by setting `'KernelFunction','kernel'`. `kernel` must have the following form:

```
function G = kernel(U,V)
```
where:

• `U` is an m-by-p matrix.

• `V` is an n-by-p matrix.

• `G` is an m-by-n Gram matrix of the rows of `U` and `V`.

Also, `kernel.m` must be on the MATLAB path.

It is good practice to avoid using generic names for kernel functions. For example, call a sigmoid kernel function `'mysigmoid'` rather than `'sigmoid'`.
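For instance, a custom sigmoid kernel might look like the sketch below. The names `mysigmoid`, `slope`, and `intercept` and their values are illustrative assumptions, not part of the `fitrsvm` API. The anonymous function mirrors the body you would place in `mysigmoid.m`, so the snippet runs on its own and lets you confirm that the output is m-by-n.

```matlab
% Hypothetical custom kernel with the required form G = kernel(U,V).
% To use it with fitrsvm, save the following as mysigmoid.m on the MATLAB path:
%
%   function G = mysigmoid(U,V)
%   slope = 1;       % hypothetical value
%   intercept = -1;  % hypothetical value
%   G = tanh(slope*U*V' + intercept);
%   end
%
% The anonymous function below has the same body, so this snippet runs on its own.
slope = 1;
intercept = -1;
mysigmoid = @(U,V) tanh(slope*U*V' + intercept);

U = rand(5,3);          % m = 5 rows, p = 3 predictors
V = rand(4,3);          % n = 4 rows, same p
G = mysigmoid(U,V);     % m-by-n Gram matrix (5-by-4)
```

With `mysigmoid.m` on the path, you would then train with `Mdl = fitrsvm(X,Y,'KernelFunction','mysigmoid')`. Because `KernelScale` cannot be combined with a custom kernel, any scaling (here `slope`) has to be applied inside the function.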

Example: `'KernelFunction','gaussian'`

Data Types: `char` | `string`

##### `KernelScale`

Kernel scale parameter, specified as the comma-separated pair consisting of `'KernelScale'` and `'auto'` or a positive scalar. The software divides all elements of the predictor matrix `X` by the value of `KernelScale`. Then, the software applies the appropriate kernel norm to compute the Gram matrix.

• If you specify `'auto'`, then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed using `rng` before training.

• If you specify `KernelScale` and your own kernel function, for example, `'KernelFunction','kernel'`, then the software throws an error. You must apply scaling within `kernel`.
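The sketch below checks, on made-up data, the effect of a fixed `KernelScale` value `s` on the Gaussian kernel: dividing the predictors by `s` and then applying the kernel gives the same matrix as applying $\exp(-\lVert x_j-x_k\rVert^{2}/s^{2})$ to the unscaled data. It does not reproduce the `'auto'` heuristic.

```matlab
X = [0 0; 1 0; 0 2];         % made-up predictor rows
s = 2;                       % hypothetical KernelScale value

% Squared Euclidean distances between all rows of A and all rows of B
sqdist = @(A,B) sum(A.^2,2) + sum(B.^2,2)' - 2*(A*B');

Xs = X / s;                          % divide every element by the kernel scale
Gscaled = exp(-sqdist(Xs,Xs));       % Gaussian Gram matrix of the scaled data
Gdirect = exp(-sqdist(X,X) / s^2);   % same matrix, with s written explicitly
```

A larger `s` shrinks the distances, so the Gram matrix entries move toward 1 and the fitted function becomes smoother.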

Example: `'KernelScale','auto'`

Data Types: `double` | `single` | `char` | `string`

##### `PolynomialOrder`

Polynomial kernel function order, specified as the comma-separated pair consisting of `'PolynomialOrder'` and a positive integer.

If you set `'PolynomialOrder'` and `KernelFunction` is not `'polynomial'`, then the software throws an error.

Example: `'PolynomialOrder',2`

Data Types: `double` | `single`

##### `KernelOffset`

Kernel offset parameter, specified as the comma-separated pair consisting of `'KernelOffset'` and a nonnegative scalar.

The software adds `KernelOffset` to each element of the Gram matrix.

The defaults are:

• `0` if the solver is SMO (that is, you set `'Solver','SMO'`)

• `0.1` if the solver is ISDA (that is, you set `'Solver','ISDA'`)

Example: `'KernelOffset',0`

Data Types: `double` | `single`

##### `Epsilon`

Half the width of the epsilon-insensitive band, specified as the comma-separated pair consisting of `'Epsilon'` and a nonnegative scalar value.

The default `Epsilon` value is `iqr(Y)/13.49`, which is an estimate of a tenth of the standard deviation using the interquartile range of the response variable `Y`. If `iqr(Y)` is equal to zero, then the default `Epsilon` value is 0.1.
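The band enters the model through the epsilon-insensitive loss. The sketch below, with made-up responses and predictions, shows that residuals whose absolute value is at most `Epsilon` contribute zero loss, while larger residuals contribute only the amount by which they exceed `Epsilon`. It illustrates the standard SVM regression loss and is not code taken from `fitrsvm`.

```matlab
epsilon = 0.5;                       % hypothetical Epsilon value (half-width of the band)
y    = [3.0; 3.4; 2.0; 5.0];         % observed responses (made up)
yhat = [3.2; 3.0; 3.0; 4.0];         % model predictions (made up)

r = y - yhat;                        % residuals
loss = max(0, abs(r) - epsilon);     % zero for residuals inside [-epsilon, epsilon]
```

The first two residuals fall inside the band and cost nothing; the last two each exceed it by 0.5.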

Example: `'Epsilon',0.3`

Data Types: `single` | `double`

##### `Standardize`

Flag to standardize the predictor data, specified as the comma-separated pair consisting of `'Standardize'` and `true` (`1`) or `false` (`0`).

If you set `'Standardize',true`:

• The software centers and scales each column of the predictor data (`X`) by the weighted column mean and standard deviation, respectively (for details on weighted standardizing, see Algorithms). MATLAB does not standardize the data contained in the dummy variable columns generated for categorical predictors.

• The software trains the model using the standardized predictor matrix, but stores the unstandardized data in the model property `X`.
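A minimal sketch of the centering and scaling step, assuming equal observation weights so that plain `mean` and `std` apply. The weighted formulas in Algorithms may normalize differently, and the data below are made up. The resulting `mu` and `sigma` play the role of the `Mu` and `Sigma` properties shown in the examples.

```matlab
% Made-up continuous predictors (no dummy variables)
X = [100 2000; 150 3000; 200 2500; 120 3500];

mu = mean(X);                 % column means
sigma = std(X);               % column standard deviations
Xstd = (X - mu) ./ sigma;     % centered and scaled predictors used for training
```

After this step every column of `Xstd` has mean 0 and standard deviation 1, so predictors measured on very different scales (such as horsepower and weight) contribute comparably to the kernel.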

Example: `'Standardize',true`

Data Types: `logical`

##### `Solver`

Optimization routine, specified as the comma-separated pair consisting of `'Solver'` and a value in this table.

| Value | Description |
|---|---|
| `'ISDA'` | Iterative Single Data Algorithm (see [30]) |
| `'L1QP'` | Uses `quadprog` to implement L1 soft-margin minimization by quadratic programming. This option requires an Optimization Toolbox™ license. For more details, see Quadratic Programming Definition (Optimization Toolbox). |
| `'SMO'` | Sequential Minimal Optimization (see [17]) |

The defaults are:

• `'ISDA'` if you set `'OutlierFraction'` to a positive value

• `'SMO'` otherwise

Example: `'Solver','ISDA'`

##### `Alpha`

Initial estimates of alpha coefficients, specified as the comma-separated pair consisting of `'Alpha'` and a numeric vector. The length of `Alpha` must be equal to the number of rows of `X`.

• Each element of `Alpha` corresponds to an observation in `X`.

• `Alpha` cannot contain any `NaN`s.

• If you specify `Alpha` and any one of the cross-validation name-value pair arguments (`'CrossVal'`, `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'`), then the software returns an error.

If `Y` contains any missing values, then remove all rows of `Y`, `X`, and `Alpha` that correspond to the missing values. That is, enter:

```
idx = ~isnan(Y);
Y = Y(idx);
X = X(idx,:);
alpha = alpha(idx);
```
Then, pass `Y`, `X`, and `alpha` as the response, predictors, and initial alpha estimates, respectively.

The default is `zeros(size(Y,1),1)`, that is, an n-by-1 vector of zeros.

Example: `'Alpha',0.1*ones(size(X,1),1)`

Data Types: `single` | `double`

##### `CacheSize`

Cache size, specified as the comma-separated pair consisting of `'CacheSize'` and `'maximal'` or a positive scalar.

If `CacheSize` is `'maximal'`, then the software reserves enough memory to hold the entire n-by-n Gram matrix.

If `CacheSize` is a positive scalar, then the software reserves `CacheSize` megabytes of memory for training the model.
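To judge whether `'maximal'` is practical, you can estimate the size of the full Gram matrix. The sketch below assumes each entry is stored as an 8-byte double, which is an assumption rather than a documented detail, and uses binary megabytes (2^20 bytes).

```matlab
n = 4177;                     % e.g., the number of observations in the abalone data
bytes = 8 * n^2;              % n-by-n entries, assumed 8 bytes each
megabytes = bytes / 2^20      % about 133.1 binary megabytes
```

Because the size grows with the square of n, doubling the number of observations roughly quadruples the memory that `'maximal'` would reserve.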

Example: `'CacheSize','maximal'`

Data Types: `double` | `single` | `char` | `string`

##### `ClipAlphas`

Flag to clip alpha coefficients, specified as the comma-separated pair consisting of `'ClipAlphas'` and either `true` or `false`.

Suppose that the alpha coefficient for observation $j$ is $\alpha_j$ and the box constraint of observation $j$ is $C_j$, $j = 1,\ldots,n$, where $n$ is the training sample size.

| Value | Description |
|---|---|
| `true` | At each iteration, if $\alpha_j$ is near 0 or near $C_j$, then MATLAB sets $\alpha_j$ to 0 or to $C_j$, respectively. |
| `false` | MATLAB does not change the alpha coefficients during optimization. |
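The clipping rule in the table can be sketched as follows. The tolerance `tol` is a hypothetical value chosen for illustration, because the documentation does not say how close counts as "near"; the alpha values are made up.

```matlab
C = 1;                                 % box constraint for these observations
alpha = [1e-9; 0.3; 1 - 1e-9; 0.7];    % made-up alpha values in [0, C]
tol = 1e-6;                            % hypothetical "nearness" tolerance

clipped = alpha;
clipped(abs(clipped) < tol) = 0;       % near 0 -> set to 0
clipped(abs(clipped - C) < tol) = C;   % near C -> set to C
```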

MATLAB stores the final values of α in the `Alpha` property of the trained SVM model object.

`ClipAlphas` can affect SMO and ISDA convergence.

Example: `'ClipAlphas',false`

Data Types: `logical`

##### `NumPrint`

Number of iterations between optimization diagnostic message output, specified as the comma-separated pair consisting of `'NumPrint'` and a nonnegative integer.

If you specify `'Verbose',1` and `'NumPrint',numprint`, then the software displays all optimization diagnostic messages from SMO and ISDA every `numprint` iterations in the Command Window.

Example: `'NumPrint',500`

Data Types: `double` | `single`

##### `OutlierFraction`

Expected proportion of outliers in training data, specified as the comma-separated pair consisting of `'OutlierFraction'` and a numeric scalar in the interval [0,1). `fitrsvm` removes observations with large gradients, ensuring that it removes the fraction of observations specified by `OutlierFraction` by the time convergence is reached. This name-value pair argument is valid only when `'Solver'` is `'ISDA'`.

Example: `'OutlierFraction',0.1`

Data Types: `single` | `double`

##### `RemoveDuplicates`

Flag to replace duplicate observations with single observations in the training data, specified as the comma-separated pair consisting of `'RemoveDuplicates'` and `true` or `false`.

If `RemoveDuplicates` is `true`, then `fitrsvm` replaces duplicate observations in the training data with a single observation of the same value. The weight of the single observation is equal to the sum of the weights of the corresponding removed duplicates (see `Weights`).
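The merging step can be sketched with `unique` and `accumarray`. The sketch assumes that two observations are duplicates when both their predictor rows and their responses match, uses made-up data, and only illustrates the bookkeeping; it is not the code `fitrsvm` runs.

```matlab
% Made-up data: rows 1, 3, and 5 are the same observation
X = [1 2; 3 4; 1 2; 5 6; 1 2];
Y = [10; 20; 10; 30; 10];
w = [0.1; 0.2; 0.3; 0.2; 0.2];

[obs, ~, groupIdx] = unique([X Y], 'rows');   % one row per distinct observation
wMerged = accumarray(groupIdx(:), w);         % sum of weights for each distinct row
Xu = obs(:,1:end-1);                          % deduplicated predictors
Yu = obs(:,end);                              % deduplicated responses
```

The three copies of `[1 2]` with response 10 collapse into one observation with weight 0.6, and the total weight is unchanged.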

### Tip

If your data set contains many duplicate observations, then specifying `'RemoveDuplicates',true` can decrease convergence time considerably.

Data Types: `logical`

##### `Verbose`

Verbosity level, specified as the comma-separated pair consisting of `'Verbose'` and `0`, `1`, or `2`. The value of `Verbose` controls the amount of optimization information that the software displays in the Command Window and saves as a structure to `Mdl.ConvergenceInfo.History`.

This table summarizes the available verbosity level options.

| Value | Description |
|---|---|
| `0` | The software does not display or save convergence information. |
| `1` | The software displays diagnostic messages and saves convergence criteria every `numprint` iterations, where `numprint` is the value of the name-value pair argument `'NumPrint'`. |
| `2` | The software displays diagnostic messages and saves convergence criteria at every iteration. |

Example: `'Verbose',1`

Data Types: `double` | `single`

#### Other Regression Options


##### `CategoricalPredictors`

Categorical predictors list, specified as the comma-separated pair consisting of `'CategoricalPredictors'` and one of the values in this table.

| Value | Description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value corresponding to the column of the predictor data (`X` or `Tbl`) that contains a categorical variable. |
| Logical vector | A `true` entry means that the corresponding column of predictor data (`X` or `Tbl`) is a categorical variable. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`. |
| `'all'` | All predictors are categorical. |

By default, if the predictor data is in a table (`Tbl`), `fitrsvm` assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (`X`), `fitrsvm` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `'CategoricalPredictors'` name-value pair argument.

For the identified categorical predictors, `fitrsvm` creates dummy variables using two different schemes, depending on whether a categorical variable is unordered or ordered. For details, see Automatic Creation of Dummy Variables.

Example: `'CategoricalPredictors','all'`

Data Types: `single` | `double` | `logical` | `char` | `string` | `cell`

##### `PredictorNames`

Predictor variable names, specified as the comma-separated pair consisting of `'PredictorNames'` and a string array of unique names or cell array of unique character vectors. The functionality of `'PredictorNames'` depends on the way you supply the training data.

• If you supply `X` and `Y`, then you can use `'PredictorNames'` to give the predictor variables in `X` names. The order of the names in `PredictorNames` must correspond to the column order of `X`. That is, `PredictorNames{1}` is the name of `X(:,1)`, `PredictorNames{2}` is the name of `X(:,2)`, and so on. Also, `size(X,2)` and `numel(PredictorNames)` must be equal. By default, `PredictorNames` is `{'x1','x2',...}`.

• If you supply `Tbl`, then you can use `'PredictorNames'` to choose which predictor variables to use in training. That is, `fitrsvm` uses only the predictor variables in `PredictorNames` and the response variable in training. `PredictorNames` must be a subset of `Tbl.Properties.VariableNames` and cannot include the name of the response variable. By default, `PredictorNames` contains the names of all predictor variables. A good practice is to specify the predictors for training using either `'PredictorNames'` or `formula`, but not both.

Example: `'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}`

Data Types: `string` | `cell`

##### `ResponseName`

Response variable name, specified as the comma-separated pair consisting of `'ResponseName'` and a character vector or string scalar.

Example: `'ResponseName','response'`

Data Types: `char` | `string`

##### `ResponseTransform`

Response transformation, specified as the comma-separated pair consisting of `'ResponseTransform'` and either `'none'` or a function handle. The default is `'none'`, which means `@(y)y`, or no transformation. For a MATLAB function or a function you define, use its function handle. The function handle must accept a vector (the original response values) and return a vector of the same size (the transformed response values).

Example: Suppose you create a function handle that applies an exponential transformation to an input vector by using `myfunction = @(y)exp(y)`. Then, you can specify the response transformation as `'ResponseTransform',myfunction`.

Data Types: `char` | `string` | `function_handle`

##### `Weights`

Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a vector of numeric values. The size of `Weights` must equal the number of rows in `X`. `fitrsvm` normalizes the values of `Weights` to sum to 1.
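The normalization is simple arithmetic, sketched below with made-up weights. One consequence is that multiplying every weight by the same positive constant does not change the normalized weights, so only the relative sizes of the weights matter.

```matlab
w = [2; 1; 1; 4];                 % made-up raw weights, one per row of X
wNorm = w / sum(w);               % [0.25; 0.125; 0.125; 0.5]
wScaled = (3*w) / sum(3*w);       % rescaled raw weights give the same result
```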

Data Types: `single` | `double`

#### Cross-Validation Options


##### `CrossVal`

Cross-validation flag, specified as the comma-separated pair consisting of `'CrossVal'` and either `'on'` or `'off'`.

If you specify `'on'`, then the software implements 10-fold cross-validation.

To override this cross-validation setting, use one of these name-value pair arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only.

Alternatively, you can cross-validate the model later using the `crossval` method.

Example: `'CrossVal','on'`
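This is a short sketch on the `carsmall` sample data. It shows that `'CrossVal','on'` produces 10 fold models and that `kfoldLoss` summarizes their out-of-fold error (mean squared error for regression). The call to `rng` only makes the random fold assignment repeatable.

```matlab
load carsmall
X = [Horsepower,Weight];

rng('default')   % reproducible fold assignment
CVMdl = fitrsvm(X,MPG,'Standardize',true,'CrossVal','on');

numFolds = numel(CVMdl.Trained)   % one compact model per fold
mse = kfoldLoss(CVMdl)            % cross-validated mean squared error
```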

Cross-validation partition, specified as the comma-separated pair consisting of `'CVPartition'` and a `cvpartition` partition object created by `cvpartition`. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value pair arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using `cvp = cvpartition(500,'KFold',5)`. Then, you can specify the cross-validated model by using `'CVPartition',cvp`.
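A fixed partition object is useful when you compare models on identical folds. The sketch below assumes the `carsmall` sample data. It removes incomplete rows first so that the partition size equals the number of observations passed to `fitrsvm`. Comparing the Gaussian and linear kernels is only an illustration.

```matlab
load carsmall
X = [Horsepower,Weight];
ok = ~any(isnan([X,MPG]),2);   % keep complete rows only
X = X(ok,:);
Y = MPG(ok);

rng('default')                 % reproducible partition
cvp = cvpartition(numel(Y),'KFold',5);

% Both models are validated on exactly the same five folds
CVGauss  = fitrsvm(X,Y,'Standardize',true,'KernelFunction','gaussian','CVPartition',cvp);
CVLinear = fitrsvm(X,Y,'Standardize',true,'KernelFunction','linear','CVPartition',cvp);
losses = [kfoldLoss(CVGauss) kfoldLoss(CVLinear)]
```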

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of `'Holdout'` and a scalar value in the range (0,1). If you specify `'Holdout',p`, then the software completes these steps:

1. Randomly select and reserve `p*100`% of the data as validation data, and train the model using the rest of the data.

2. Store the compact, trained model in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value pair arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'Holdout',0.1`

Data Types: `double` | `single`
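This is a minimal holdout sketch on the `carsmall` sample data. With `'Holdout',0.1`, the cross-validated model holds a single compact model trained on about 90% of the data. `kfoldLoss` then reports the error on the reserved 10%.

```matlab
load carsmall
X = [Horsepower,Weight];

rng('default')   % reproducible choice of the held-out observations
CVMdl = fitrsvm(X,MPG,'Standardize',true,'Holdout',0.1);

numModels = numel(CVMdl.Trained)   % 1
holdoutMSE = kfoldLoss(CVMdl)      % mean squared error on the held-out data
```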

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of `'KFold'` and a positive integer value greater than 1. If you specify `'KFold',k`, then the software completes these steps:

1. Randomly partition the data into `k` sets.

2. For each set, reserve the set as validation data, and train the model using the other `k` – 1 sets.

3. Store the `k` compact, trained models in the cells of a `k`-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value pair arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'KFold',5`

Data Types: `single` | `double`
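The three steps listed above can be written out by hand to show what `'KFold',k` does. This is an illustrative loop on the `carsmall` sample data, not the internal implementation. In practice, passing `'KFold',k` to `fitrsvm` performs these steps for you.

```matlab
load carsmall
X = [Horsepower,Weight];
ok = ~any(isnan([X,MPG]),2);
X = X(ok,:);
Y = MPG(ok);

k = 5;
rng('default')
c = cvpartition(numel(Y),'KFold',k);   % step 1: random partition into k sets

trained = cell(k,1);
foldMSE = zeros(k,1);
for i = 1:k
    trIdx = training(c,i);             % the other k-1 sets
    teIdx = test(c,i);                 % the set reserved for validation
    mdl = fitrsvm(X(trIdx,:),Y(trIdx),'Standardize',true);   % step 2: train
    trained{i} = compact(mdl);         % step 3: store the compact model
    foldMSE(i) = mean((Y(teIdx) - predict(mdl,X(teIdx,:))).^2);
end
meanMSE = mean(foldMSE)
```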

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of `'Leaveout'` and `'on'` or `'off'`. If you specify `'Leaveout','on'`, then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the `NumObservations` property of the model), the software completes these steps:

1. Reserve the observation as validation data, and train the model using the other n – 1 observations.

2. Store the n compact, trained models in the cells of an n-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value pair arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

Example: `'Leaveout','on'`
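Here is a leave-one-out sketch on the `carsmall` sample data. Incomplete rows are removed first, so the expected number of fold models is simply the number of remaining observations. Leave-one-out trains n models, which is practical here only because the data set is small.

```matlab
load carsmall
X = [Horsepower,Weight];
ok = ~any(isnan([X,MPG]),2);
X = X(ok,:);
Y = MPG(ok);

CVMdl = fitrsvm(X,Y,'Standardize',true,'Leaveout','on');

numModels = numel(CVMdl.Trained)   % one compact model per observation
looMSE = kfoldLoss(CVMdl)
```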

#### Convergence Controls


Tolerance for gradient difference between upper and lower violators obtained by SMO or ISDA, specified as the comma-separated pair consisting of `'DeltaGradientTolerance'` and a nonnegative scalar.

Example: `'DeltaGradientTolerance',1e-4`

Data Types: `single` | `double`

Feasibility gap tolerance obtained by SMO or ISDA, specified as the comma-separated pair consisting of `'GapTolerance'` and a nonnegative scalar.

If `GapTolerance` is `0`, then `fitrsvm` does not use this parameter to check convergence.

Example: `'GapTolerance',1e-4`

Data Types: `single` | `double`

Maximum number of numerical optimization iterations, specified as the comma-separated pair consisting of `'IterationLimit'` and a positive integer.

The software returns a trained model regardless of whether the optimization routine successfully converges. `Mdl.ConvergenceInfo` contains convergence information.

Example: `'IterationLimit',1e8`

Data Types: `double` | `single`
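Because a model is returned even when the solver stops early, it is worth inspecting the convergence information after training. This sketch on the `carsmall` sample data uses a deliberately tiny, hypothetical limit of 5 iterations to force an early stop.

```matlab
load carsmall
X = [Horsepower,Weight];

% Deliberately small limit (hypothetical value) so the solver stops early
Mdl = fitrsvm(X,MPG,'Standardize',true,'IterationLimit',5);

Mdl.NumIterations                          % no more than the limit
Mdl.ConvergenceInfo.Converged              % check before trusting the model
Mdl.ConvergenceInfo.ReasonForConvergence
```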

Tolerance for Karush-Kuhn-Tucker (KKT) violation, specified as the comma-separated pair consisting of `'KKTTolerance'` and a nonnegative scalar value.

This name-value pair applies only if `'Solver'` is `'SMO'` or `'ISDA'`.

If `KKTTolerance` is `0`, then `fitrsvm` does not use this parameter to check convergence.

Example: `'KKTTolerance',1e-4`

Data Types: `single` | `double`
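This sketch sets the convergence tolerances for each solver on the `carsmall` sample data. The value `1e-4` is an arbitrary illustrative choice. `KKTTolerance` is paired with `'ISDA'` because it applies only to the SMO and ISDA solvers.

```matlab
load carsmall
X = [Horsepower,Weight];

% SMO with a tighter gradient-difference tolerance (illustrative value)
MdlSMO = fitrsvm(X,MPG,'Standardize',true,'Solver','SMO', ...
    'DeltaGradientTolerance',1e-4);

% ISDA with a tighter KKT tolerance and the feasibility-gap check enabled
MdlISDA = fitrsvm(X,MPG,'Standardize',true,'Solver','ISDA', ...
    'KKTTolerance',1e-4,'GapTolerance',1e-4);

iterations = [MdlSMO.NumIterations MdlISDA.NumIterations]
MdlISDA.ConvergenceInfo.ReasonForConvergence
```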

Number of iterations between reductions of the active set, specified as the comma-separated pair consisting of `'ShrinkagePeriod'` and a nonnegative integer.

If you set `'ShrinkagePeriod',0`, then the software does not shrink the active set.

Example: `'ShrinkagePeriod',1000`

Data Types: `double` | `single`
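The following sketch compares training with and without active-set shrinking on the `carsmall` sample data. Shrinking changes how the solver proceeds, not the optimization problem, so both fits are expected to select a similar number of support vectors. On a data set this small, the solver can finish before 1000 iterations, so any timing difference is likely negligible. The benefit described in the Tips appears on larger data sets.

```matlab
load carsmall
X = [Horsepower,Weight];

tic
MdlFull = fitrsvm(X,MPG,'Standardize',true,'ShrinkagePeriod',0);      % no shrinking
tFull = toc;

tic
MdlShrink = fitrsvm(X,MPG,'Standardize',true,'ShrinkagePeriod',1000); % shrink every 1000 iterations
tShrink = toc;

numSV = [sum(MdlFull.IsSupportVector) sum(MdlShrink.IsSupportVector)]
times = [tFull tShrink]
```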

#### Hyperparameter Optimization


Parameters to optimize, specified as the comma-separated pair consisting of `'OptimizeHyperparameters'` and one of the following:

• `'none'` — Do not optimize.

• `'auto'` — Use `{'BoxConstraint','KernelScale','Epsilon'}`.

• `'all'` — Optimize all eligible parameters.

• String array or cell array of eligible parameter names.

• Vector of `optimizableVariable` objects, typically the output of `hyperparameters`.

The optimization attempts to minimize the cross-validation loss (error) for `fitrsvm` by varying the parameters. To control the cross-validation type and other aspects of the optimization, use the `HyperparameterOptimizationOptions` name-value pair.

### Note

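`'OptimizeHyperparameters'` values override any values you set using other name-value pair arguments. For example, setting `'OptimizeHyperparameters'` to `'auto'` causes `fitrsvm` to optimize `BoxConstraint`, `KernelScale`, and `Epsilon`, and to ignore any values you specify for those three arguments.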

The eligible parameters for `fitrsvm` are:

• `BoxConstraint`: `fitrsvm` searches among positive values, by default log-scaled in the range `[1e-3,1e3]`.

• `KernelScale`: `fitrsvm` searches among positive values, by default log-scaled in the range `[1e-3,1e3]`.

• `Epsilon`: `fitrsvm` searches among positive values, by default log-scaled in the range `[1e-3,1e2]*iqr(Y)/1.349`.

• `KernelFunction`: `fitrsvm` searches among `'gaussian'`, `'linear'`, and `'polynomial'`.

• `PolynomialOrder`: `fitrsvm` searches among integers in the range `[2,4]`.

• `Standardize`: `fitrsvm` searches among `true` and `false`.

Set nondefault parameters by passing a vector of `optimizableVariable` objects that have nondefault values. For example,

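```matlab
load carsmall
% Default optimizable variables for regressing MPG on Horsepower and Weight
params = hyperparameters('fitrsvm',[Horsepower,Weight],MPG);
% Widen the search range of the first optimizable variable
params(1).Range = [1e-4,1e6];
```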

Pass `params` as the value of `OptimizeHyperparameters`.

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the `Verbose` field of the `'HyperparameterOptimizationOptions'` name-value pair argument. To control the plots, set the `ShowPlots` field of the `'HyperparameterOptimizationOptions'` name-value pair argument.

For an example, see Optimize SVM Regression.

Example: `'OptimizeHyperparameters','auto'`
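This is a minimal `'auto'` sketch on the `carsmall` sample data. It caps the search at 10 evaluations and turns off the plots and the iterative display to keep it quick. Those settings are illustrative, not recommendations. Bayesian optimization involves randomness, so `rng` is set for repeatability.

```matlab
load carsmall
X = [Horsepower,Weight];

rng('default')
Mdl = fitrsvm(X,MPG,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',10,'ShowPlots',false,'Verbose',0));

% Selected kernel scale and epsilon stored in the final model
Mdl.KernelParameters.Scale
Mdl.Epsilon
```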

Options for optimization, specified as the comma-separated pair consisting of `'HyperparameterOptimizationOptions'` and a structure. This argument modifies the effect of the `OptimizeHyperparameters` name-value pair argument. All fields in the structure are optional.

| Field Name | Values | Default |
| --- | --- | --- |
| `Optimizer` | `'bayesopt'`: Use Bayesian optimization. Internally, this setting calls `bayesopt`. `'gridsearch'`: Use grid search with `NumGridDivisions` values per dimension. `'randomsearch'`: Search at random among `MaxObjectiveEvaluations` points. `'gridsearch'` searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command `sortrows(Mdl.HyperparameterOptimizationResults)`. | `'bayesopt'` |
| `AcquisitionFunctionName` | `'expected-improvement-per-second-plus'`, `'expected-improvement'`, `'expected-improvement-plus'`, `'expected-improvement-per-second'`, `'lower-confidence-bound'`, or `'probability-of-improvement'`. Acquisition functions whose names include `per-second` do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include `plus` modify their behavior when they are overexploiting an area. For more details, see Acquisition Function Types. | `'expected-improvement-per-second-plus'` |
| `MaxObjectiveEvaluations` | Maximum number of objective function evaluations. | `30` for `'bayesopt'` or `'randomsearch'`, and the entire grid for `'gridsearch'` |
| `MaxTime` | Time limit, specified as a positive real. The time limit is in seconds, as measured by `tic` and `toc`. Run time can exceed `MaxTime` because `MaxTime` does not interrupt function evaluations. | `Inf` |
| `NumGridDivisions` | For `'gridsearch'`, the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. | `10` |
| `ShowPlots` | Logical value indicating whether to show plots. If `true`, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if `Optimizer` is `'bayesopt'`, then `ShowPlots` also plots a model of the objective function against the parameters. | `true` |
| `SaveIntermediateResults` | Logical value indicating whether to save results when `Optimizer` is `'bayesopt'`. If `true`, this field overwrites a workspace variable named `'BayesoptResults'` at each iteration. The variable is a `BayesianOptimization` object. | `false` |
| `Verbose` | Display to the command line: `0` (no iterative display), `1` (iterative display), or `2` (iterative display with extra information). For details, see the `bayesopt` `Verbose` name-value pair argument. | `1` |
| `UseParallel` | Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see Parallel Bayesian Optimization. | `false` |
| `Repartition` | Logical value indicating whether to repartition the cross-validation at every iteration. If `false`, the optimizer uses a single partition for the optimization. `true` usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, `true` requires at least twice as many function evaluations. | `false` |

Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is `'Kfold',5`.

| Field Name | Values |
| --- | --- |
| `CVPartition` | A `cvpartition` object, as created by `cvpartition`. |
| `Holdout` | A scalar in the range `(0,1)` representing the holdout fraction. |
| `Kfold` | An integer greater than 1. |

Example: `'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)`

Data Types: `struct`
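This sketch passes an options structure that selects grid search on the `carsmall` sample data. With `NumGridDivisions` set to 3 and the three `'auto'` hyperparameters, the grid has 3^3 = 27 points. Each point is scored by 5-fold cross-validation. The specific field values are illustrative. As noted for `'gridsearch'`, the results form a table, and `sortrows` returns them in grid order.

```matlab
load carsmall
X = [Horsepower,Weight];

rng('default')   % grid points are visited in random order
opts = struct('Optimizer','gridsearch', ...
    'NumGridDivisions',3, ...   % 3 values per hyperparameter
    'Kfold',5, ...              % objective based on 5-fold cross-validation
    'ShowPlots',false, ...
    'Verbose',0);
Mdl = fitrsvm(X,MPG,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);

% Results table in grid order
results = sortrows(Mdl.HyperparameterOptimizationResults);
```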

## Output Arguments


Trained SVM regression model, returned as a `RegressionSVM` model or `RegressionPartitionedSVM` cross-validated model.

If you set any of the name-value pair arguments `KFold`, `Holdout`, `Leaveout`, `CrossVal`, or `CVPartition`, then `Mdl` is a `RegressionPartitionedSVM` cross-validated model. Otherwise, `Mdl` is a `RegressionSVM` model.
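The sketch below, on the `carsmall` sample data, shows how the returned class depends on whether a cross-validation argument is set. It also shows cross-validating an already trained `RegressionSVM` model with `crossval`. The partitioned class name may appear package-qualified, so the check uses `contains`.

```matlab
load carsmall
X = [Horsepower,Weight];

Mdl = fitrsvm(X,MPG,'Standardize',true);              % no cross-validation argument
rng('default')
CVMdl = fitrsvm(X,MPG,'Standardize',true,'KFold',5);  % cross-validation argument set

class(Mdl)     % 'RegressionSVM'
class(CVMdl)   % a RegressionPartitionedSVM class

% A trained RegressionSVM model can also be cross-validated afterward
CVMdl2 = crossval(Mdl,'KFold',5);
```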

## Limitations

`fitrsvm` supports low- through moderate-dimensional data sets. For high-dimensional data sets, use `fitrlinear` instead.

## Tips

• Unless your data set is large, always try to standardize the predictors (see `Standardize`). Standardization makes predictors insensitive to the scales on which they are measured.

• It is good practice to cross-validate using the `KFold` name-value pair argument. The cross-validation results determine how well the SVM model generalizes.

• Sparsity in support vectors is a desirable property of an SVM model. To decrease the number of support vectors, set the `BoxConstraint` name-value pair argument to a large value. This action also increases the training time.

• For optimal training time, set `CacheSize` as high as the memory limit on your computer allows.

• If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument `'ShrinkagePeriod'`. It is good practice to use `'ShrinkagePeriod',1000`.

• Duplicate observations that are far from the regression line do not affect convergence. However, just a few duplicate observations that occur near the regression line can slow down convergence considerably. To speed up convergence, specify `'RemoveDuplicates',true` if:

• Your data set contains many duplicate observations.

• You suspect that a few duplicate observations can fall near the regression line.

However, to maintain the original data set during training, `fitrsvm` must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify `true` for data sets containing few duplicates, then `fitrsvm` consumes close to double the memory of the original data.

• After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
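Several of these tips can be combined in one call. This sketch on the `carsmall` sample data standardizes the predictors, uses the maximal kernel cache, shrinks the active set every 1000 iterations, and removes duplicates. It then estimates generalization error with 5-fold cross-validation. Whether each setting helps depends on your data. These are illustrative defaults, not tuned choices.

```matlab
load carsmall
X = [Horsepower,Weight];

Mdl = fitrsvm(X,MPG,'KernelFunction','gaussian', ...
    'Standardize',true, ...        % make predictors scale-insensitive
    'CacheSize','maximal', ...     % as much kernel cache as memory allows
    'ShrinkagePeriod',1000, ...    % shrink the active set periodically
    'RemoveDuplicates',true);      % may help if duplicates lie near the fit

rng('default')
CVMdl = crossval(Mdl,'KFold',5);   % estimate how well the model generalizes
cvMSE = kfoldLoss(CVMdl)
numSV = sum(Mdl.IsSupportVector)   % number of support vectors
```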

## Algorithms

• For the mathematical formulation of linear and nonlinear SVM regression problems and the solver algorithms, see Understanding Support Vector Machine Regression.

• `NaN`, `<undefined>`, empty character vector (`''`), empty string (`""`), and `<missing>` values indicate missing data values. `fitrsvm` removes entire rows of data corresponding to a missing response. When normalizing weights, `fitrsvm` ignores any weight corresponding to an observation with at least one missing predictor. Consequently, observation box constraints might not equal `BoxConstraint`.

• `fitrsvm` removes observations that have zero weight.

• If you set `'Standardize',true` and `'Weights'`, then `fitrsvm` standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, `fitrsvm` standardizes predictor j ($x_j$) using

$$x_j^{*} = \frac{x_j - \mu_j^{*}}{\sigma_j^{*}},$$

where:

• $x_{jk}$ is observation k (row) of predictor j (column).

• $\mu_j^{*} = \frac{1}{\sum_k w_k}\sum_k w_k x_{jk}$.

• $\left(\sigma_j^{*}\right)^2 = \frac{v_1}{v_1^2 - v_2}\sum_k w_k \left(x_{jk} - \mu_j^{*}\right)^2$.

• $v_1 = \sum_k w_k$.

• $v_2 = \sum_k w_k^2$.

• If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.

• The `PredictorNames` property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `PredictorNames` is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.

• The `ExpandedPredictorNames` property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then `ExpandedPredictorNames` is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.

• Similarly, the `Beta` property stores one beta coefficient for each predictor, including the dummy variables.

• The `SupportVectors` property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then `SupportVectors` is an m-by-5 matrix.

• The `X` property stores the training data as originally input. It does not include the dummy variables. When the input is a table, `X` contains only the columns used as predictors.

• For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.

• For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is -1 for levels up to j, and +1 for levels j + 1 through k.

• The names of the dummy variables stored in the `ExpandedPredictorNames` property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.

• All solvers implement L1 soft-margin minimization.

• Let `p` be the proportion of outliers that you expect in the training data. If you set `'OutlierFraction',p`, then the software implements robust learning. In other words, the software attempts to remove 100`p`% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
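The weighted standardization described in the list above applies when you set `'Standardize',true` together with `'Weights'`. You can check it numerically in plain MATLAB. This sketch does not call `fitrsvm`; it evaluates the weighted mean, weighted variance, and standardized predictors as written, on made-up data and weights. It uses implicit expansion (R2016b or later).

```matlab
% Hypothetical toy data: 4 observations (rows) of 2 predictors (columns)
X = [1 10; 2 20; 3 30; 10 40];
w = [1; 1; 1; 3];                  % observation weights

v1 = sum(w);                       % v1 = sum_k w_k
v2 = sum(w.^2);                    % v2 = sum_k w_k^2
mu = (w' * X) / v1;                % weighted mean of each predictor (1-by-p)
sigma2 = v1 / (v1^2 - v2) * (w' * (X - mu).^2);   % weighted variance
sigma = sqrt(sigma2);
Xstar = (X - mu) ./ sigma;         % standardized predictors

% For this toy data, mu is [6 30] and sigma2 is [24.5 200]
```

With equal weights, $v_1 = v_2 = n$, and the variance reduces to the usual $\frac{1}{n-1}\sum_k (x_{jk} - \mu_j)^2$, so the scheme agrees with ordinary standardization in that case.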

## References

[1] Clark, D., Z. Schreter, and A. Adams. “A Quantitative Comparison of Dystal and Backpropagation.” Submitted to the Australian Conference on Neural Networks, 1996.

[2] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working set selection using second order information for training support vector machines.” Journal of Machine Learning Research, Vol 6, 2005, pp. 1889–1918.

[3] Kecman, V., T.-M. Huang, and M. Vogt. “Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance.” In Support Vector Machines: Theory and Applications. Edited by Lipo Wang, 255–274. Berlin: Springer-Verlag, 2005.

[4] Lichman, M. UCI Machine Learning Repository, [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

[5] Nash, W.J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait, Sea Fisheries Division, Technical Report No. 48, 1994.

[6] Waugh, S. Extending and benchmarking Cascade-Correlation, Ph.D. thesis, Computer Science Department, University of Tasmania, 1995.