# fitrgam

Fit generalized additive model (GAM) for regression

## Syntax

## Description

`Mdl = fitrgam(Tbl,ResponseVarName)` returns a generalized additive model `Mdl` trained using the sample data contained in the table `Tbl`. The input argument `ResponseVarName` is the name of the variable in `Tbl` that contains the response values for regression.

`Mdl = fitrgam(___,Name,Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, `'Interactions',5` specifies to include five interaction terms in the model. You can also specify a list of interaction terms using the `'Interactions'` name-value argument.

## Examples

### Train Generalized Additive Model

Train a univariate GAM, which contains linear terms for predictors. Then, interpret the prediction for a specified data instance by using the `plotLocalEffects` function.

Load the data set `NYCHousing2015`.

`load NYCHousing2015`

The data set includes 10 variables with information on the sales of properties in New York City in 2015. This example uses these variables to analyze the sale prices (`SALEPRICE`).

Preprocess the data set. Remove outliers, convert the `datetime` array (`SALEDATE`) to the month numbers, and move the response variable (`SALEPRICE`) to the last column.

idx = isoutlier(NYCHousing2015.SALEPRICE); NYCHousing2015(idx,:) = []; NYCHousing2015.SALEDATE = month(NYCHousing2015.SALEDATE); NYCHousing2015 = movevars(NYCHousing2015,'SALEPRICE','After','SALEDATE');

Display the first three rows of the table.

head(NYCHousing2015,3)

`ans=`*3×10 table*

| BOROUGH | NEIGHBORHOOD | BUILDINGCLASSCATEGORY | RESIDENTIALUNITS | COMMERCIALUNITS | LANDSQUAREFEET | GROSSSQUAREFEET | YEARBUILT | SALEDATE | SALEPRICE |
|---|---|---|---|---|---|---|---|---|---|
| 2 | {'BATHGATE'} | {'01 ONE FAMILY DWELLINGS'} | 1 | 0 | 4750 | 2619 | 1899 | 8 | 0 |
| 2 | {'BATHGATE'} | {'01 ONE FAMILY DWELLINGS'} | 1 | 0 | 4750 | 2619 | 1899 | 8 | 0 |
| 2 | {'BATHGATE'} | {'01 ONE FAMILY DWELLINGS'} | 1 | 1 | 1287 | 2528 | 1899 | 12 | 0 |

Train a univariate GAM for the sale prices. Specify the variables for `BOROUGH`, `NEIGHBORHOOD`, `BUILDINGCLASSCATEGORY`, and `SALEDATE` as categorical predictors.

Mdl = fitrgam(NYCHousing2015,'SALEPRICE','CategoricalPredictors',[1 2 3 9])

Mdl = RegressionGAM PredictorNames: {1x9 cell} ResponseName: 'SALEPRICE' CategoricalPredictors: [1 2 3 9] ResponseTransform: 'none' Intercept: 3.7518e+05 IsStandardDeviationFit: 0 NumObservations: 83517 Properties, Methods

`Mdl` is a `RegressionGAM` model object. The model display shows a partial list of the model properties. To view the full list of properties, double-click the variable name `Mdl` in the Workspace. The Variables editor opens for `Mdl`. Alternatively, you can display the properties in the Command Window by using dot notation. For example, display the estimated intercept (constant) term of `Mdl`.

Mdl.Intercept

ans = 3.7518e+05

Predict the sale price for the first observation of the training data, and plot the local effects of the terms in `Mdl` on the prediction.

yFit = predict(Mdl,NYCHousing2015(1,:))

yFit = 4.4421e+05

plotLocalEffects(Mdl,NYCHousing2015(1,:))

The `predict` function predicts the sale price for the first observation as `4.4421e5`. The `plotLocalEffects` function creates a horizontal bar graph that shows the local effects of the terms in `Mdl` on the prediction. Each local effect value shows the contribution of the corresponding term to the predicted sale price.

### Train GAM with Interaction Terms

Train a generalized additive model that contains linear and interaction terms for predictors in three different ways:

- Specify the interaction terms using the `formula` input argument.
- Specify the `'Interactions'` name-value argument.
- Build a model with linear terms first and add interaction terms to the model by using the `addInteractions` function.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Create a table that contains the predictor variables (`Acceleration`, `Displacement`, `Horsepower`, and `Weight`) and the response variable (`MPG`).

tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

**Specify formula**

Train a GAM that contains the four linear terms (`Acceleration`, `Displacement`, `Horsepower`, and `Weight`) and two interaction terms (`Acceleration*Displacement` and `Displacement*Horsepower`). Specify the terms using a formula in the form `'Y ~ terms'`.

`Mdl1 = fitrgam(tbl,'MPG ~ Acceleration + Displacement + Horsepower + Weight + Acceleration:Displacement + Displacement:Horsepower');`

The function adds interaction terms to the model in the order of importance. You can use the `Interactions` property to check the interaction terms in the model and the order in which `fitrgam` adds them to the model. Display the `Interactions` property.

Mdl1.Interactions

`ans = `*2×2*
2 3
1 2

Each row of `Interactions` represents one interaction term and contains the column indexes of the predictor variables for the interaction term.
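To read the `Interactions` matrix by name rather than by index, you can map each row to predictor names. The following is a minimal sketch rather than part of the original example: it hardcodes the matrix displayed above and assumes the predictors are indexed in the order of the table columns. The variables `names`, `pairs`, and `labels` are hypothetical helpers. With a trained model, the same indexing should apply to `Mdl1.PredictorNames` and `Mdl1.Interactions`.

```matlab
% Predictor names in table-column order (assumed to match Mdl1.PredictorNames).
names = {'Acceleration','Displacement','Horsepower','Weight'};

% Interaction index matrix as displayed above (one row per interaction term).
interactions = [2 3; 1 2];

% Indexing a cell array with a matrix returns a cell array of the same shape.
pairs = names(interactions);

% Build one readable label per interaction term, in the order they were added.
labels = strcat(pairs(:,1),':',pairs(:,2))
```

The first label corresponds to the term added first, so the list reads in order of importance.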

**Specify 'Interactions'**

Pass the training data (`tbl`) and the name of the response variable in `tbl` to `fitrgam`, so that the function includes the linear terms for all the other variables as predictors. Specify the `'Interactions'` name-value argument using a logical matrix to include the two interaction terms, `x1*x2` and `x2*x3`.

Mdl2 = fitrgam(tbl,'MPG','Interactions',logical([1 1 0 0; 0 1 1 0])); Mdl2.Interactions

`ans = `*2×2*
2 3
1 2

You can also specify `'Interactions'` as the number of interaction terms or as `'all'` to include all available interaction terms. Among the specified interaction terms, `fitrgam` identifies those whose *p*-values are not greater than the `'MaxPValue'` value and adds them to the model. The default `'MaxPValue'` is 1 so that the function adds all specified interaction terms to the model.

Specify `'Interactions','all'` and set the `'MaxPValue'` name-value argument to 0.05.

Mdl3 = fitrgam(tbl,'MPG','Interactions','all','MaxPValue',0.05);

Warning: Model does not include interaction terms because all interaction terms have p-values greater than the 'MaxPValue' value, or the software was unable to improve the model fit.

Mdl3.Interactions

ans = 0x2 empty double matrix

`Mdl3` includes no interaction terms, which implies one of the following: all interaction terms have *p*-values greater than 0.05, or adding the interaction terms does not improve the model fit.

**Use addInteractions Function**

Train a univariate GAM that contains linear terms for predictors, and then add interaction terms to the trained model by using the `addInteractions` function. Specify the second input argument of `addInteractions` in the same way you specify the `'Interactions'` name-value argument of `fitrgam`. You can specify the list of interaction terms using a logical matrix, the number of interaction terms, or `'all'`.

Specify the number of interaction terms as 3 to add the three most important interaction terms to the trained model.

```
Mdl4 = fitrgam(tbl,'MPG');
UpdatedMdl4 = addInteractions(Mdl4,3);
UpdatedMdl4.Interactions
```

`ans = `*3×2*
2 3
1 2
3 4

`Mdl4` is a univariate GAM, and `UpdatedMdl4` is an updated GAM that contains all the terms in `Mdl4` and three additional interaction terms.

### Create Cross-Validated GAM Using `fitrgam`

Train a cross-validated GAM with 10 folds, which is the default cross-validation option, by using `fitrgam`. Then, use `kfoldPredict` to predict responses for validation-fold observations using a model trained on training-fold observations.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Create a table that contains the predictor variables (`Acceleration`, `Displacement`, `Horsepower`, and `Weight`) and the response variable (`MPG`).

tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

Create a cross-validated GAM by using the default cross-validation option. Specify the `'CrossVal'` name-value argument as `'on'`.

rng('default'); CVMdl = fitrgam(tbl,'MPG','CrossVal','on') % Set the random seed first for reproducibility

CVMdl = RegressionPartitionedGAM CrossValidatedModel: 'GAM' PredictorNames: {1x4 cell} ResponseName: 'MPG' NumObservations: 398 KFold: 10 Partition: [1x1 cvpartition] NumTrainedPerFold: [1x1 struct] ResponseTransform: 'none' IsStandardDeviationFit: 0 Properties, Methods

The `fitrgam` function creates a `RegressionPartitionedGAM` model object `CVMdl` with 10 folds. During cross-validation, the software completes these steps:

1. Randomly partition the data into 10 sets.
2. For each set, reserve the set as validation data, and train the model using the other 9 sets.
3. Store the 10 compact, trained models in a 10-by-1 cell vector in the `Trained` property of the cross-validated model object `RegressionPartitionedGAM`.
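The steps above can be sketched as an explicit loop. This is an illustration under stated assumptions, not the internal implementation of `fitrgam`: it draws its own partition with `cvpartition`, drops rows with missing responses beforehand, and averages the fold errors without weighting, so its result is not expected to match `kfoldLoss(CVMdl)` exactly. The variables `trained` and `foldMSE` are hypothetical names.

```matlab
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
tbl = tbl(~isnan(tbl.MPG),:);             % drop rows with missing responses

rng('default')                            % for reproducibility
K = 10;
c = cvpartition(height(tbl),'KFold',K);   % step 1: random partition into K sets

trained = cell(K,1);
foldMSE = zeros(K,1);
for k = 1:K
    trainIdx = training(c,k);             % the other K-1 sets
    testIdx  = test(c,k);                 % the reserved validation set
    mdl = fitrgam(tbl(trainIdx,:),'MPG'); % step 2: train on the training folds
    trained{k} = compact(mdl);            % step 3: keep the compact model
    yHat = predict(trained{k},tbl(testIdx,:));
    foldMSE(k) = mean((tbl.MPG(testIdx) - yHat).^2,'omitnan');
end

avgMSE = mean(foldMSE)                    % unweighted average over the folds
```

Writing the loop out makes it clear that every prediction for an observation comes from a model that never saw that observation during training.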

You can override the default cross-validation setting by using the `'CVPartition'`, `'Holdout'`, `'KFold'`, or `'Leaveout'` name-value argument.

Predict responses for the observations in `tbl` by using `kfoldPredict`. The function predicts responses for every observation using the model trained without that observation.

yHat = kfoldPredict(CVMdl);

`yHat` is a numeric vector. Display the first five predicted responses.

yHat(1:5)

`ans = `*5×1*
19.4848
15.7203
15.5742
15.3185
17.8223

Compute the regression loss (mean squared error).

L = kfoldLoss(CVMdl)

L = 17.7248

`kfoldLoss` returns the average mean squared error over 10 folds.

### Optimize GAM Using `OptimizeHyperparameters`

Optimize the hyperparameters of a GAM with respect to cross-validation by using the `OptimizeHyperparameters` name-value argument.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Specify `Acceleration`, `Displacement`, `Horsepower`, and `Weight` as the predictor variables (`X`) and `MPG` as the response variable (`Y`).

X = [Acceleration,Displacement,Horsepower,Weight]; Y = MPG;

Partition the data into training and test sets. Use approximately 80% of the observations to train a model, and 20% of the observations to test the performance of the trained model on new data. Use `cvpartition` to partition the data.

rng('default'); cvp = cvpartition(length(MPG),'Holdout',0.20); XTrain = X(training(cvp),:); YTrain = Y(training(cvp)); XTest = X(test(cvp),:); YTest = Y(test(cvp)); % Set the random seed first for reproducibility

Train a GAM for regression by passing the training data to the `fitrgam` function, and include the `OptimizeHyperparameters` argument. Specify `'OptimizeHyperparameters'` as `'auto'` so that `fitrgam` finds optimal values of `InitialLearnRateForPredictors`, `NumTreesPerPredictor`, `Interactions`, `InitialLearnRateForInteractions`, and `NumTreesPerInteraction`. For reproducibility, choose the `'expected-improvement-plus'` acquisition function. The default acquisition function depends on run time and, therefore, can give varying results.

rng('default'); Mdl = fitrgam(XTrain,YTrain,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('AcquisitionFunctionName','expected-improvement-plus'))

| Iter | Eval result | Objective: log(1+loss) | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Best | 2.874 | 4.6069 | 2.874 | 2.874 | 0.21533 | 500 | 1 | 0.35042 | 13 |
| 2 | Accept | 2.89 | 0.20809 | 2.874 | 2.8748 | 0.062841 | 14 | 1 | 0.014907 | 10 |
| 3 | Accept | 3.3298 | 1.796 | 2.874 | 2.8746 | 0.001387 | 222 | 0 | - | - |
| 4 | Best | 2.8562 | 5.8182 | 2.8562 | 2.8564 | 0.08216 | 434 | 4 | 0.14875 | 283 |
| 5 | Accept | 2.976 | 1.8052 | 2.8562 | 2.8564 | 0.99942 | 217 | 1 | 0.0017491 | 34 |
| 6 | Best | 2.8195 | 1.382 | 2.8195 | 2.8198 | 0.13778 | 152 | 6 | 0.012566 | 13 |
| 7 | Best | 2.7519 | 0.90985 | 2.7519 | 2.752 | 0.12531 | 42 | 4 | 0.27647 | 53 |
| 8 | Best | 2.7301 | 3.565 | 2.7301 | 2.7301 | 0.18671 | 10 | 3 | 0.0063418 | 487 |
| 9 | Best | 2.7196 | 0.46532 | 2.7196 | 2.7196 | 0.13792 | 10 | 5 | 0.1663 | 27 |
| 10 | Accept | 2.8281 | 2.9027 | 2.7196 | 2.7196 | 0.23324 | 10 | 4 | 0.75904 | 314 |
| 11 | Accept | 2.7864 | 0.25131 | 2.7196 | 2.7196 | 0.13035 | 10 | 1 | 0.30171 | 476 |
| 12 | Accept | 2.7993 | 0.61803 | 2.7196 | 2.7647 | 0.16476 | 10 | 6 | 0.015498 | 32 |
| 13 | Accept | 2.7847 | 4.5171 | 2.7196 | 2.7197 | 0.0090953 | 499 | 5 | 0.027878 | 40 |
| 14 | Accept | 3.5847 | 0.27508 | 2.7196 | 2.7592 | 0.0035123 | 11 | 3 | 0.011127 | 11 |
| 15 | Accept | 2.7237 | 4.9018 | 2.7196 | 2.759 | 0.015848 | 498 | 3 | 0.14359 | 238 |
| 16 | Accept | 2.779 | 1.569 | 2.7196 | 2.7588 | 0.012829 | 10 | 3 | 0.028814 | 217 |
| 17 | Accept | 2.7761 | 4.7776 | 2.7196 | 2.7272 | 0.023165 | 488 | 1 | 0.32642 | 302 |
| 18 | Accept | 2.8604 | 4.1417 | 2.7196 | 2.7677 | 0.013548 | 495 | 2 | 0.97963 | 141 |
| 19 | Accept | 3.5466 | 0.12735 | 2.7196 | 2.7196 | 0.019794 | 10 | 0 | - | - |
| 20 | Accept | 2.7513 | 7.3431 | 2.7196 | 2.7196 | 0.02408 | 62 | 6 | 0.023502 | 490 |
| 21 | Accept | 2.7243 | 0.92354 | 2.7196 | 2.7196 | 0.040761 | 11 | 3 | 0.10556 | 120 |
| 22 | Best | 2.6969 | 5.0161 | 2.6969 | 2.697 | 0.0032557 | 494 | 2 | 0.039381 | 487 |
| 23 | Accept | 2.8184 | 3.8034 | 2.6969 | 2.697 | 0.0072249 | 19 | 3 | 0.27653 | 494 |
| 24 | Accept | 2.7788 | 4.3989 | 2.6969 | 2.697 | 0.0064015 | 482 | 1 | 0.013479 | 479 |
| 25 | Accept | 2.7646 | 4.4343 | 2.6969 | 2.6971 | 0.0013222 | 473 | 2 | 0.17272 | 436 |
| 26 | Accept | 2.8368 | 0.28304 | 2.6969 | 2.6971 | 0.93418 | 11 | 5 | 0.16983 | 11 |
| 27 | Accept | 2.7724 | 1.7205 | 2.6969 | 2.6971 | 0.039216 | 11 | 2 | 0.037865 | 480 |
| 28 | Accept | 2.8795 | 0.87918 | 2.6969 | 2.6971 | 0.73103 | 11 | 1 | 0.014567 | 480 |
| 29 | Accept | 2.782 | 4.0221 | 2.6969 | 2.7267 | 0.0047632 | 493 | 1 | 0.069346 | 247 |
| 30 | Accept | 2.7734 | 0.98578 | 2.6969 | 2.7297 | 0.038679 | 103 | 1 | 0.052986 | 68 |

Optimization completed. MaxObjectiveEvaluations of 30 reached.

- Total function evaluations: 30
- Total elapsed time: 88.0979 seconds
- Total objective function evaluation time: 78.4482

Best observed feasible point:

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

- Observed objective function value = 2.6969
- Estimated objective function value = 2.7297
- Function evaluation time = 5.0161

Best estimated feasible point (according to models):

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

- Estimated objective function value = 2.7297
- Estimated function evaluation time = 5.009

Mdl = RegressionGAM ResponseName: 'Y' CategoricalPredictors: [] ResponseTransform: 'none' Intercept: 23.7405 Interactions: [2×2 double] IsStandardDeviationFit: 0 NumObservations: 318 HyperparameterOptimizationResults: [1×1 BayesianOptimization] Properties, Methods

`fitrgam` returns a `RegressionGAM` model object that uses the best estimated feasible point. The best estimated feasible point is the set of hyperparameters that minimizes the upper confidence bound of the cross-validation loss (mean squared error, MSE) based on the underlying Gaussian process model of the Bayesian optimization process.

The Bayesian optimization process internally maintains a Gaussian process model of the objective function. The objective function is `log`(1 + cross-validation MSE) for regression. For each iteration, the optimization process updates the Gaussian process model and uses the model to find a new set of hyperparameters. Each line of the iterative display shows the new set of hyperparameters and these column values:

- `Objective` — Objective function value computed at the new set of hyperparameters.
- `Objective runtime` — Objective function evaluation time.
- `Eval result` — Result report, specified as `Accept`, `Best`, or `Error`. `Accept` indicates that the objective function returns a finite value, and `Error` indicates that the objective function returns a value that is not a finite real scalar. `Best` indicates that the objective function returns a finite value that is lower than previously computed objective function values.
- `BestSoFar(observed)` — The minimum objective function value computed so far. This value is either the objective function value of the current iteration (if the `Eval result` value for the current iteration is `Best`) or the value of the previous `Best` iteration.
- `BestSoFar(estim.)` — At each iteration, the software estimates the upper confidence bounds of the objective function values, using the updated Gaussian process model, at all the sets of hyperparameters tried so far. Then the software chooses the point with the minimum upper confidence bound. The `BestSoFar(estim.)` value is the objective function value returned by the `predictObjective` function at the minimum point.
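Because the objective is reported on the `log`(1 + MSE) scale, it can help to convert an objective value back to a cross-validation MSE. The following sketch uses the best observed objective value from the iterative display (2.6969). The variable names are hypothetical, and `expm1` and `log1p` are the base MATLAB functions for `exp(x) - 1` and `log(1 + x)`.

```matlab
% Objective value of the best observed point in the iterative display.
objective = 2.6969;

% Invert log(1 + MSE) to recover the cross-validation MSE.
cvMSE = expm1(objective);      % same as exp(objective) - 1

% Round trip: applying log(1 + x) returns the objective value again.
roundTrip = log1p(cvMSE);
```

The recovered value is on the same scale as the losses that `resubLoss`, `loss`, and `kfoldLoss` return, which makes the comparison with those functions direct.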

The plot below the iterative display shows the `BestSoFar(observed)` and `BestSoFar(estim.)` values in blue and green, respectively.

The returned object `Mdl` uses the best estimated feasible point, that is, the set of hyperparameters that produces the `BestSoFar(estim.)` value in the final iteration based on the final Gaussian process model.

Obtain the best estimated feasible point from `Mdl` in the `HyperparameterOptimizationResults` property.

Mdl.HyperparameterOptimizationResults.XAtMinEstimatedObjective

`ans=`*1×5 table*

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

Alternatively, you can use the `bestPoint` function. By default, the `bestPoint` function uses the `'min-visited-upper-confidence-interval'` criterion.

[x,CriterionValue,iteration] = bestPoint(Mdl.HyperparameterOptimizationResults)

`x=`*1×5 table*

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

CriterionValue = 2.7908

iteration = 22

You can also extract the best observed feasible point (that is, the last `Best` point in the iterative display) from the `HyperparameterOptimizationResults` property or by specifying `Criterion` as `'min-observed'`.

Mdl.HyperparameterOptimizationResults.XAtMinObjective

`ans=`*1×5 table*

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

[x_observed,CriterionValue_observed,iteration_observed] = bestPoint(Mdl.HyperparameterOptimizationResults,'Criterion','min-observed')

`x_observed=`*1×5 table*

| InitialLearnRateForPredictors | NumTreesPerPredictor | Interactions | InitialLearnRateForInteractions | NumTreesPerInteraction |
|---|---|---|---|---|
| 0.0032557 | 494 | 2 | 0.039381 | 487 |

CriterionValue_observed = 2.6969

iteration_observed = 22

In this example, the two criteria choose the same set of hyperparameters (from the 22nd iteration) as the best point. The two criterion values differ because `CriterionValue` is the upper bound of the objective function value computed by the final Gaussian process model, and `CriterionValue_observed` is the actual objective function value computed using the selected hyperparameters. For more information, see the `Criterion` name-value argument of `bestPoint`.

Evaluate the performance of the regression model on the training set and test set by computing the mean squared errors (MSEs). Smaller MSE values indicate better performance.

LTraining = resubLoss(Mdl)

LTraining = 6.2224

LTest = loss(Mdl,XTest,YTest)

LTest = 18.5724

### Optimize Cross-Validated GAM Using `bayesopt`

Optimize the parameters of a GAM with respect to cross-validation by using the `bayesopt` function.

Alternatively, you can find optimal values of `fitrgam` name-value arguments by using the `OptimizeHyperparameters` name-value argument. For an example, see Optimize GAM Using OptimizeHyperparameters.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s.

`load carbig`

Specify `Acceleration`, `Displacement`, `Horsepower`, and `Weight` as the predictor variables (`X`) and `MPG` as the response variable (`Y`).

X = [Acceleration,Displacement,Horsepower,Weight]; Y = MPG;

You must remove the observations with missing response values to fix the cross-validation sets for the optimization process. Remove missing values from the response variable, and remove the corresponding observations in the predictor variables.

[Y,TF] = rmmissing(Y); X = X(~TF,:);

Set up a partition for cross-validation. This step fixes the cross-validation sets that the optimization uses at each step.

`c = cvpartition(length(Y),'KFold',5);`

Prepare `optimizableVariable` objects for the name-value arguments that you want to optimize using Bayesian optimization. This example finds optimal values for the `MaxNumSplitsPerPredictor` and `NumTreesPerPredictor` arguments of `fitrgam`.

maxNumSplits = optimizableVariable('maxNumSplits',[1,10],'Type','integer'); numTrees = optimizableVariable('numTrees',[1,500],'Type','integer');

Create an objective function that takes an input `z = [maxNumSplits,numTrees]` and returns the cross-validated loss value of `z`.

minfun = @(z)kfoldLoss(fitrgam(X,Y,'CVPartition',c,'MaxNumSplitsPerPredictor',z.maxNumSplits,'NumTreesPerPredictor',z.numTrees));

If you specify a cross-validation option, then the `fitrgam` function returns a cross-validated model object `RegressionPartitionedGAM`. The `kfoldLoss` function returns the regression loss (mean squared error) obtained by the cross-validated model. Therefore, the function handle `minfun` computes the cross-validation loss at the parameters in `z`.

Search for the best parameters `[maxNumSplits,numTrees]` using `bayesopt`. For reproducibility, choose the `'expected-improvement-plus'` acquisition function. The default acquisition function depends on run time and, therefore, can give varying results.

rng('default'); results = bayesopt(minfun,[maxNumSplits,numTrees],'Verbose',0,'IsObjectiveDeterministic',true,'AcquisitionFunctionName','expected-improvement-plus');

Obtain the best point from `results`.

zbest = bestPoint(results)

`zbest=`*1×2 table*

| maxNumSplits | numTrees |
|---|---|
| 1 | 8 |

Train an optimized GAM using the `zbest` values.

Mdl = fitrgam(X,Y,'MaxNumSplitsPerPredictor',zbest.maxNumSplits,'NumTreesPerPredictor',zbest.numTrees);

## Input Arguments

`Tbl` — Sample data

table

Sample data used to train the model, specified as a table. Each row of `Tbl` corresponds to one observation, and each column corresponds to one predictor variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

- Optionally, `Tbl` can contain a column for the response variable and a column for the observation weights. The response variable and the weight values must be numeric vectors.

  You must specify the response variable in `Tbl` by using `ResponseVarName` or `formula` and specify the observation weights in `Tbl` by using `'Weights'`.

  - Specify the response variable by using `ResponseVarName` — `fitrgam` uses the remaining variables as predictors. To use a subset of the remaining variables in `Tbl` as predictors, specify predictor variables by using `'PredictorNames'`.
  - Define a model specification by using `formula` — `fitrgam` uses a subset of the variables in `Tbl` as predictor variables and the response variable, as specified in `formula`.

- If `Tbl` does not contain the response variable, then specify a response variable by using `Y`. The length of the response variable `Y` and the number of rows in `Tbl` must be equal. To use a subset of the variables in `Tbl` as predictors, specify predictor variables by using `'PredictorNames'`.

`fitrgam` considers `NaN`, `''` (empty character vector), `""` (empty string), `<missing>`, and `<undefined>` values in `Tbl` to be missing values.

- `fitrgam` does not use observations with all missing values in the fit.
- `fitrgam` does not use observations with missing response values in the fit.
- `fitrgam` uses observations with some missing values for predictors to find splits on variables for which these observations have valid values.

**Data Types:** `table`

`ResponseVarName` — Response variable name

name of variable in `Tbl`

Response variable name, specified as a character vector or string scalar containing the name of the response variable in `Tbl`. For example, if the response variable `Y` is stored in `Tbl.Y`, then specify it as `'Y'`.

**Data Types:** `char` | `string`

`formula` — Model specification

character vector | string scalar

Model specification, specified as a character vector or string scalar in the form `'Y ~ terms'`. The `formula` argument specifies a response variable and linear and interaction terms for predictor variables. Use `formula` to specify a subset of variables in `Tbl` as predictors for training the model. If you specify a formula, then the software does not use any variables in `Tbl` that do not appear in `formula`.

For example, specify `'Y~x1+x2+x3+x1:x2'`. In this form, `Y` represents the response variable, and `x1`, `x2`, and `x3` represent the linear terms for the predictor variables. `x1:x2` represents the interaction term for `x1` and `x2`.

The variable names in the formula must be both variable names in `Tbl` (`Tbl.Properties.VariableNames`) and valid MATLAB® identifiers. You can verify the variable names in `Tbl` by using the `isvarname` function. If the variable names are not valid, then you can convert them by using the `matlab.lang.makeValidName` function.
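As a brief illustration of this check, the following sketch validates and converts a few names. The names in `names` are hypothetical examples and do not come from any data set on this page.

```matlab
% Hypothetical variable names, some of which are not valid MATLAB identifiers.
names = {'MPG','sale price','2015Sales'};

% Check each name; isvarname takes one name at a time.
isValid = cellfun(@isvarname,names);

% Convert all names to valid identifiers; valid names pass through unchanged.
validNames = matlab.lang.makeValidName(names);
```

For a table, the same conversion can be applied to `Tbl.Properties.VariableNames` before you write the formula, so that the names in the formula match the converted variable names.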

Alternatively, you can specify a response variable and linear terms for predictors using `formula`, and specify interaction terms for predictors using `'Interactions'`.

`fitrgam` builds a set of interaction trees using only the terms whose *p*-values are not greater than the `'MaxPValue'` value.

**Example:** `'Y~x1+x2+x3+x1:x2'`

**Data Types:** `char` | `string`

`Y` — Response data

numeric column vector

`X` — Predictor data

numeric matrix

Predictor data, specified as a numeric matrix. Each row of `X` corresponds to one observation, and each column corresponds to one predictor variable.

`fitrgam` considers `NaN` values in `X` as missing values. The function does not use observations with all missing values in the fit. `fitrgam` uses observations with some missing values for `X` to find splits on variables for which these observations have valid values.

**Data Types:** `single` | `double`

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

**Example:** `'Interactions','all','MaxPValue',0.05` specifies to include all available interaction terms whose *p*-values are not greater than 0.05.

**GAM Options**

`FitStandardDeviation` — Flag to fit model for standard deviation

`false` or `0` (default) | `true` or `1`

Flag to fit a model for the standard deviation of the response variable, specified as logical `0` (`false`) or `1` (`true`).

If you specify `'FitStandardDeviation'` as `true`, then `fitrgam` trains an additional model for the standard deviation of the response variable, and sets the `IsStandardDeviationFit` property of the output GAM object `Mdl` to `true`.

To compute the standard deviation values for given observations, use `predict`, `resubPredict`, or `kfoldPredict`. These functions also return the prediction intervals of the response variable.

A recommended practice is to use optimal hyperparameters when you fit the standard deviation model, to improve the accuracy of the standard deviation estimates. Specify `OptimizeHyperparameters` as `'all-univariate'` (for a univariate GAM) or `'all'` (for a bivariate GAM) together with `'FitStandardDeviation',true`.
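The following is a minimal sketch of this workflow on the `carbig` data, assuming that `predict` returns the standard deviations and prediction intervals as its second and third outputs when the model is fit with `'FitStandardDeviation',true`. For brevity, the sketch skips the recommended hyperparameter optimization, so treat the resulting estimates as illustrative only.

```matlab
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

% Fit the response model and an additional model for the standard deviation.
Mdl = fitrgam(tbl,'MPG','FitStandardDeviation',true);

% Predict responses, standard deviations, and prediction intervals
% for the first five observations.
[yFit,ySD,yInt] = predict(Mdl,tbl(1:5,:));
```

Each row of `yInt` holds the lower and upper limits of the prediction interval for the corresponding observation.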

**Example:** `'FitStandardDeviation',true`

**Data Types:** `logical`

`InitialLearnRateForInteractions` — Learning rate of gradient boosting for interaction terms

`1` (default) | numeric scalar in (0,1]

Learning rate of the gradient boosting for interaction terms, specified as a numeric scalar in the interval (0,1]. `fitrgam` uses this rate throughout the training for interaction terms.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

**Example:** `'InitialLearnRateForInteractions',0.1`

**Data Types:** `single` | `double`

`InitialLearnRateForPredictors` — Learning rate of gradient boosting for linear terms

`1` (default) | numeric scalar in (0,1]

Learning rate of the gradient boosting for linear terms, specified as a numeric scalar in the interval (0,1]. `fitrgam` uses this rate throughout the training for linear terms.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

**Example:** `'InitialLearnRateForPredictors',0.1`

**Data Types:** `single` | `double`

`Interactions` — Number or list of interaction terms

`0` (default) | nonnegative integer scalar | logical matrix | `'all'`

Number or list of interaction terms to include in the candidate set *S*, specified as a nonnegative integer scalar, a logical matrix, or `'all'`.

- Number of interaction terms, specified as a nonnegative integer — *S* includes the specified number of important interaction terms, selected based on the *p*-values of the terms.
- List of interaction terms, specified as a logical matrix — *S* includes the terms specified by a `t`-by-`p` logical matrix, where `t` is the number of interaction terms, and `p` is the number of predictors used to train the model. For example, `logical([1 1 0; 0 1 1])` represents two pairs of interaction terms: a pair of the first and second predictors, and a pair of the second and third predictors.

  If `fitrgam` uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. That is, the column indexes of the logical matrix do not count the response and observation weight variables. The indexes also do not count any variables not used by the function.
- `'all'` — *S* includes all possible pairs of interaction terms, which is `p*(p - 1)/2` number of terms in total.

Among the interaction terms in *S*, the `fitrgam` function identifies those whose *p*-values are not greater than the `'MaxPValue'` value and uses them to build a set of interaction trees. Use the default value (`'MaxPValue',1`) to build interaction trees using all terms in *S*.

**Example: **`'Interactions','all'`

**Data Types: **`single`

| `double`

| `logical`

| `char`

| `string`
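As a rough illustration of the logical-matrix form, the following sketch trains a GAM on three predictors and passes two candidate interaction terms. The `carsmall` sample data set, the choice of predictor columns, and the variable name `S` are assumptions made for this example only.

```matlab
% Sketch: specify candidate interaction terms as a t-by-p logical matrix.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight, Displacement];     % p = 3 predictors

% Row 1 pairs predictors 1 and 2; row 2 pairs predictors 2 and 3.
S = logical([1 1 0; 0 1 1]);

Mdl = fitrgam(X, MPG, 'Interactions', S);

% Each row of Mdl.Interactions lists the predictor indices of one
% interaction term in the trained model.
disp(Mdl.Interactions)
```

Because `'MaxPValue'` is left at its default of 1 here, no candidate term is filtered out by its *p*-value.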

`MaxNumSplitsPerInteraction`

— Maximum number of decision splits per interaction tree

4 (default) | positive integer scalar

Maximum number of decision splits (or branch nodes) for each interaction tree (boosted tree for an interaction term), specified as a positive integer scalar.

**Example: **`'MaxNumSplitsPerInteraction',5`

**Data Types: **`single`

| `double`

`MaxNumSplitsPerPredictor`

— Maximum number of decision splits per predictor tree

1 (default) | positive integer scalar

Maximum number of decision splits (or branch nodes) for each predictor tree (boosted tree for
a linear term), specified as a positive integer
scalar. By default,
`fitrgam`

uses a tree stump
for a predictor tree.

**Example: **`'MaxNumSplitsPerPredictor',5`

**Data Types: **`single`

| `double`

`MaxPValue`

— Maximum *p*-value for detecting interaction terms

1 (default) | numeric scalar in [0,1]

Maximum *p*-value for detecting interaction terms, specified as a numeric
scalar in the interval [0,1].

`fitrgam`

first finds the candidate set *S* of
interaction terms from `formula`

or
`'Interactions'`

. Then the function identifies the interaction
terms whose *p*-values are not greater than the
`'MaxPValue'`

value and uses them to build a set of interaction
trees.

The default value (`'MaxPValue',1`

) builds interaction trees for all
interaction terms in the candidate set *S*.

For more details about detecting interaction terms, see Interaction Term Detection.

**Example: **`'MaxPValue',0.05`

**Data Types: **`single`

| `double`

`NumBins`

— Number of bins for numeric predictors

`256`

(default) | positive integer scalar | `[]`

(empty)

Number of bins for numeric predictors, specified as a positive integer scalar or
`[]`

(empty).

- If you specify the `'NumBins'` value as a positive integer scalar (`numBins`), then `fitrgam` bins every numeric predictor into at most `numBins` equiprobable bins, and then grows trees on the bin indices instead of the original data.
  - The number of bins can be less than `numBins` if a predictor has fewer than `numBins` unique values.
  - `fitrgam` does not bin categorical predictors.
- If the `'NumBins'` value is empty (`[]`), then `fitrgam` does not bin any predictors.

When you use a large training data set, this binning option speeds up training but might cause
a decrease in accuracy. You can first use the default value of
`'NumBins'`

, and then change the value depending on the accuracy
and training speed.

The trained model `Mdl`

stores the bin edges in the
`BinEdges`

property.

**Example: **`'NumBins',50`

**Data Types: **`single`

| `double`
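The idea of equiprobable binning can be sketched outside of `fitrgam`. The following example only approximates the concept and is not the function's internal implementation: it assumes quantile-based bin edges and uses made-up data, and the names `x`, `edges`, and `binIdx` are hypothetical.

```matlab
% Sketch: bin a numeric predictor into at most numBins equiprobable bins.
x = [1 2 2 3 5 8 13 21 34 55]';            % made-up predictor values
numBins = 4;

% Interior edges at the 1/numBins, 2/numBins, ... quantiles.
% unique() drops repeated edges, so the bin count can fall below numBins.
edges = unique(quantile(x, (1:numBins-1)/numBins));

% Map each value to a bin index between 1 and numel(edges)+1.
binIdx = discretize(x, [-Inf; edges(:); Inf]);
disp([x binIdx])
```

For a trained model, the edges that `fitrgam` actually used are stored in the `BinEdges` property.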

`NumTreesPerInteraction`

— Number of trees per interaction term

100 (default) | positive integer scalar

Number of trees per interaction term, specified as a positive integer scalar.

The `'NumTreesPerInteraction'`

value is equivalent to the number of
gradient boosting iterations for the interaction terms for predictors. For each
iteration, `fitrgam`

adds a set of interaction trees to the
model, one tree for each interaction term. To learn about the gradient boosting
algorithm, see Gradient Boosting Algorithm.

You can determine whether the fitted model has the specified number of trees by
viewing the diagnostic message displayed when `'Verbose'`

is 1 or 2,
or by checking the `ReasonForTermination`

property value of the model
`Mdl`

.

**Example: **`'NumTreesPerInteraction',500`

**Data Types: **`single`

| `double`

`NumTreesPerPredictor`

— Number of trees per linear term

300 (default) | positive integer scalar

Number of trees per linear term, specified as a positive integer scalar.

The `'NumTreesPerPredictor'`

value is equivalent to the number of
gradient boosting iterations for the linear terms for predictors. For each iteration,
`fitrgam`

adds a set of predictor trees to the model, one
tree for each predictor. To learn about the gradient boosting algorithm, see Gradient Boosting Algorithm.

You can determine whether the fitted model has the specified number of trees by
viewing the diagnostic message displayed when `'Verbose'`

is 1 or 2,
or by checking the `ReasonForTermination`

property value of the model
`Mdl`

.

**Example: **`'NumTreesPerPredictor',500`

**Data Types: **`single`

| `double`
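A minimal sketch of how the tree count and learning rate for linear terms might be set together, and how to check why boosting stopped. The `carsmall` data set and the specific values are assumptions chosen for illustration, not recommended settings.

```matlab
% Sketch: request more, smaller boosting steps for the linear terms.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight];

Mdl = fitrgam(X, MPG, ...
    'NumTreesPerPredictor', 500, ...
    'InitialLearnRateForPredictors', 0.1);

% Inspect why boosting stopped, for example whether training reached
% the requested number of trees.
disp(Mdl.ReasonForTermination)
```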

**Other Regression Options**

`CategoricalPredictors`

— Categorical predictors list

vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `'all'`

Categorical predictors list, specified as one of the values in this table.

Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If `fitrgam` uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. |
Logical vector | A `true` entry means that the corresponding predictor is categorical. The length of the vector is `p`. |
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`. |
`"all"` | All predictors are categorical. |

By default, if the predictor data is in a table
(`Tbl`

), `fitrgam`

assumes that a variable is
categorical if it is a logical vector, unordered categorical vector, character array, string
array, or cell array of character vectors. If the predictor data is a matrix
(`X`

), `fitrgam`

assumes that all predictors are
continuous. To identify any other predictors as categorical predictors, specify them by using
the `CategoricalPredictors`

name-value argument.

**Example: **`'CategoricalPredictors','all'`

**Data Types: **`single`

| `double`

| `logical`

| `char`

| `string`

| `cell`

`NumPrint`

— Number of iterations between diagnostic message printouts

`10`

(default) | nonnegative integer scalar

Number of iterations between diagnostic message printouts, specified as a nonnegative integer
scalar. This argument is valid only when you specify `'Verbose'`

as 1.

If you specify `'Verbose',1`

and `'NumPrint',numPrint`

, then
the software displays diagnostic messages every `numPrint`

iterations in the Command Window.

**Example: **`'NumPrint',500`

**Data Types: **`single`

| `double`

`PredictorNames`

— Predictor variable names

string array of unique names | cell array of unique character vectors

Predictor variable names, specified as a string array of unique names or cell array of unique
character vectors. The functionality of `PredictorNames`

depends on the
way you supply the training data.

- If you supply `X` and `Y`, then you can use `PredictorNames` to assign names to the predictor variables in `X`.
  - The order of the names in `PredictorNames` must correspond to the column order of `X`. That is, `PredictorNames{1}` is the name of `X(:,1)`, `PredictorNames{2}` is the name of `X(:,2)`, and so on. Also, `size(X,2)` and `numel(PredictorNames)` must be equal.
  - By default, `PredictorNames` is `{'x1','x2',...}`.
- If you supply `Tbl`, then you can use `PredictorNames` to choose which predictor variables to use in training. That is, `fitrgam` uses only the predictor variables in `PredictorNames` and the response variable during training.
  - `PredictorNames` must be a subset of `Tbl.Properties.VariableNames` and cannot include the name of the response variable.
  - By default, `PredictorNames` contains the names of all predictor variables.
  - A good practice is to specify the predictors for training using either `PredictorNames` or `formula`, but not both.

**Example: **`"PredictorNames",["SepalLength","SepalWidth","PetalLength","PetalWidth"]`

**Data Types: **`string`

| `cell`

`ResponseName`

— Response variable name

`"Y"`

(default) | character vector | string scalar

Response variable name, specified as a character vector or string scalar.

- If you supply `Y`, then you can use `ResponseName` to specify a name for the response variable.
- If you supply `ResponseVarName` or `formula`, then you cannot use `ResponseName`.

**Example: **`"ResponseName","response"`

**Data Types: **`char`

| `string`

`ResponseTransform`

— Response transformation

`'none'`

(default) | function handle

Response transformation, specified as either `'none'`

or a function
handle. The default is `'none'`

, which means `@(y)y`

,
or no transformation. For a MATLAB function or a function you define, use its function handle for the
response transformation. The function handle must accept a vector (the original response
values) and return a vector of the same size (the transformed response values).

**Example: **Suppose you create a function handle that applies an exponential
transformation to an input vector by using `myfunction = @(y)exp(y)`

.
Then, you can specify the response transformation as
`'ResponseTransform',myfunction`

.

**Data Types: **`char`

| `string`

| `function_handle`
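One possible use of a response transformation is to train on a log-scaled response and map predictions back to the original scale. The sketch below assumes that `predict` applies `ResponseTransform` to its output, as is usual for regression models in this toolbox; the `carsmall` data set and the log/exp pairing are illustrative choices rather than anything prescribed by this page.

```matlab
% Sketch: fit on log(MPG) and return predictions on the original scale.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight];

% @exp undoes the log applied to the training response (assumption:
% predict passes its raw output through ResponseTransform).
Mdl = fitrgam(X, log(MPG), 'ResponseTransform', @exp);

yhat = predict(Mdl, X(1:5,:));
disp(yhat)
```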

`Verbose`

— Verbosity level

`0`

(default) | `1`

| `2`

Verbosity level, specified as `0`

, `1`

, or
`2`

. The `Verbose`

value controls the amount of
information that the software displays in the Command Window.

This table summarizes the available verbosity level options.

Value | Description |
---|---|
`0` | The software displays no information. |
`1` | The software displays diagnostic messages every `numPrint` iterations, where `numPrint` is the `'NumPrint'` value. |
`2` | The software displays diagnostic messages at every iteration. |

Each line of the diagnostic messages shows information about one boosting iteration and includes the following columns:

- `Type`: Type of trained trees, `1D` (predictor trees, or boosted trees for linear terms for predictors) or `2D` (interaction trees, or boosted trees for interaction terms for predictors)
- `NumTrees`: Number of trees per linear term or interaction term that `fitrgam` has added to the model so far
- `Deviance`: Deviance of the model
- `RelTol`: Relative change of model predictions: $$\left(\widehat{y}_{k}-\widehat{y}_{k-1}\right)^{\prime}\left(\widehat{y}_{k}-\widehat{y}_{k-1}\right)/\widehat{y}_{k}^{\prime}\widehat{y}_{k}$$, where $$\widehat{y}_{k}$$ is a column vector of model predictions at iteration *k*
- `LearnRate`: Learning rate used for the current iteration

**Example: **`'Verbose',1`

**Data Types: **`single`

| `double`

`Weights`

— Observation weights

`ones(size(X,1),1)`

(default) | vector of scalar values | name of variable in `Tbl`

Observation weights, specified as a vector of scalar values or the name of a variable in `Tbl`. The software weights the observations in each row of `X` or `Tbl` with the corresponding value in `Weights`. The size of `Weights` must equal the number of rows in `X` or `Tbl`.

If you specify the input data as a table `Tbl`, then `Weights` can be the name of a variable in `Tbl` that contains a numeric vector. In this case, you must specify `Weights` as a character vector or string scalar. For example, if the weights vector `W` is stored as `Tbl.W`, then specify it as `'W'`.

`fitrgam` normalizes the values of `Weights` to sum to 1.

**Data Types: **`single`

| `double`

| `char`

| `string`
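The table form of `Weights` can be sketched as follows. The `carsmall` data set, the variable name `W`, and the weighting rule (doubling the weight of high-mileage cars) are assumptions made purely for illustration.

```matlab
% Sketch: store observation weights in the table and pass them by name.
load carsmall                               % sample data (assumed for illustration)
Tbl = table(Horsepower, Weight, MPG);

% Hypothetical weighting rule: emphasize observations with MPG above 30.
Tbl.W = ones(height(Tbl), 1);
Tbl.W(Tbl.MPG > 30) = 2;

Mdl = fitrgam(Tbl, 'MPG', 'Weights', 'W');

% The weight variable is not treated as a predictor.
disp(Mdl.PredictorNames)
```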

**Note**

You cannot use any cross-validation name-value argument together with the
`'OptimizeHyperparameters'`

name-value argument. You can modify the
cross-validation for `'OptimizeHyperparameters'`

only by using the
`'HyperparameterOptimizationOptions'`

name-value argument.

**Cross-Validation Options**

`CrossVal`

— Flag to train cross-validated model

`'off'`

(default) | `'on'`

Flag to train a cross-validated model, specified as `'on'`

or `'off'`

.

If you specify `'on'`

, then the software trains a
cross-validated model with 10 folds.

You can override this cross-validation setting using the
`'CVPartition'`

, `'Holdout'`

,
`'KFold'`

, or `'Leaveout'`

name-value argument. You can use only one cross-validation name-value
argument at a time to create a cross-validated model.

Alternatively, cross-validate after creating a model by passing
`Mdl`

to `crossval`

.

**Example: **`'Crossval','on'`

`CVPartition`

— Cross-validation partition

`[]`

(default) | `cvpartition`

partition object

Cross-validation partition, specified as a `cvpartition`

partition object
created by `cvpartition`

. The partition object
specifies the type of cross-validation and the indexing for the training and validation
sets.

To create a cross-validated model, you can specify only one of these four name-value
arguments: `CVPartition`

, `Holdout`

,
`KFold`

, or `Leaveout`

.

**Example: **Suppose you create a random partition for 5-fold cross-validation on 500
observations by using `cvp = cvpartition(500,'KFold',5)`

. Then, you can
specify the cross-validated model by using
`'CVPartition',cvp`

.

`Holdout`

— Fraction of data for holdout validation

scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as a scalar value in the range
(0,1). If you specify `'Holdout',p`

, then the software completes these
steps:

1. Randomly select and reserve `p*100`% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value
arguments: `CVPartition`

, `Holdout`

,
`KFold`

, or `Leaveout`

.

**Example: **`'Holdout',0.1`

**Data Types: **`double`

| `single`

`KFold`

— Number of folds

`10`

(default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as a positive integer value
greater than 1. If you specify `'KFold',k`

, then the software completes
these steps:

1. Randomly partition the data into `k` sets.
2. For each set, reserve the set as validation data, and train the model using the other `k` – 1 sets.
3. Store the `k` compact, trained models in a `k`-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value
arguments: `CVPartition`

, `Holdout`

,
`KFold`

, or `Leaveout`

.

**Example: **`'KFold',5`

**Data Types: **`single`

| `double`
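A short sketch of k-fold cross-validation at training time. The `carsmall` data set and the choice of five folds are assumptions for illustration; `kfoldLoss` is used here to summarize the held-out error.

```matlab
% Sketch: train a 5-fold cross-validated GAM and estimate its loss.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight];

rng(0)                                      % make the random partition repeatable
CVMdl = fitrgam(X, MPG, 'KFold', 5);

% One compact model per fold is stored in the Trained property.
numFolds = numel(CVMdl.Trained);

% Cross-validation loss computed from the held-out folds.
L = kfoldLoss(CVMdl);
fprintf('%d folds, cross-validation loss = %.3f\n', numFolds, L);
```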

`Leaveout`

— Leave-one-out cross-validation flag

`'off'`

(default) | `'on'`

Leave-one-out cross-validation flag, specified as `'on'`

or
`'off'`

. If you specify `'Leaveout','on'`

, then
for each of the *n* observations (where *n* is the
number of observations, excluding missing observations, specified in the
`NumObservations`

property of the model), the software completes
these steps:

1. Reserve the one observation as validation data, and train the model using the other *n* – 1 observations.
2. Store the *n* compact, trained models in an *n*-by-1 cell vector in the `Trained` property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.

**Example: **`'Leaveout','on'`

**Hyperparameter Optimization Options**

`OptimizeHyperparameters`

— Parameters to optimize

`'none'`

(default) | `'auto'`

| `'auto-univariate'`

| `'auto-bivariate'`

| `'all'`

| `'all-univariate'`

| `'all-bivariate'`

| string array or cell array of eligible parameter names | vector of `optimizableVariable`

objects

Parameters to optimize, specified as one of these values:

- `'none'`: Do not optimize.
- `'auto'`: Optimize `InitialLearnRateForPredictors`, `NumTreesPerPredictor`, `Interactions`, `InitialLearnRateForInteractions`, and `NumTreesPerInteraction`.
- `'auto-univariate'`: Optimize `InitialLearnRateForPredictors` and `NumTreesPerPredictor`.
- `'auto-bivariate'`: Optimize `Interactions`, `InitialLearnRateForInteractions`, and `NumTreesPerInteraction`.
- `'all'`: Optimize all eligible parameters.
- `'all-univariate'`: Optimize all eligible univariate parameters.
- `'all-bivariate'`: Optimize all eligible bivariate parameters.
- String array or cell array of eligible parameter names.
- Vector of `optimizableVariable` objects, typically the output of `hyperparameters`.

The eligible parameters for `fitrgam`

are:

- Univariate hyperparameters
  - `InitialLearnRateForPredictors`: `fitrgam` searches among real values, log-scaled in the range `[1e-3,1]`.
  - `MaxNumSplitsPerPredictor`: `fitrgam` searches among integers in the range `[1,maxNumSplits]`, where `maxNumSplits` is `min(30,max(2,NumObservations-1))`. `NumObservations` is the number of observations, excluding missing observations, stored in the `NumObservations` property of the returned model `Mdl`.
  - `NumTreesPerPredictor`: `fitrgam` searches among integers, log-scaled in the range `[10,500]`.
- Bivariate hyperparameters
  - `Interactions`: `fitrgam` searches among integers, log-scaled in the range `[0,MaxNumInteractions]`, where `MaxNumInteractions` is `NumPredictors*(NumPredictors - 1)/2`, and `NumPredictors` is the number of predictors used to train the model.
  - `InitialLearnRateForInteractions`: `fitrgam` searches among real values, log-scaled in the range `[1e-3,1]`.
  - `MaxNumSplitsPerInteraction`: `fitrgam` searches among integers in the range `[1,maxNumSplits]`.
  - `NumTreesPerInteraction`: `fitrgam` searches among integers, log-scaled in the range `[10,500]`.

Use `'auto'`

or `'all'`

to find optimal
hyperparameter values for both univariate and bivariate parameters. Alternatively, you
can find optimal values for univariate parameters using
`'auto-univariate'`

or `'all-univariate'`

and then
find optimal values for bivariate parameters using `'auto-bivariate'`

or `'all-bivariate'`

. For examples, see Optimize GAM Using OptimizeHyperparameters and Train Generalized Additive Model for Regression.

The optimization attempts to minimize the cross-validation loss (error) for
`fitrgam`

by varying the parameters. To control the
cross-validation type and other aspects of the optimization, use the
`HyperparameterOptimizationOptions`

name-value argument.

**Note**

The values of `'OptimizeHyperparameters'`

override any values you specify
using other name-value arguments. For example, setting
`'OptimizeHyperparameters'`

to `'auto'`

causes
`fitrgam`

to optimize hyperparameters corresponding to the
`'auto'`

option and to ignore any specified values for the
hyperparameters.

Set nondefault parameters by passing a vector of
`optimizableVariable`

objects that have nondefault values. For
example:

load carsmall; params = hyperparameters('fitrgam',[Horsepower,Weight],MPG); params(1).Range = [1e-4,1e6];

Pass `params`

as the value of
`OptimizeHyperparameters`

.
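Before editing the vector returned by `hyperparameters`, it can help to see which variables it contains. The sketch below lists the names and then marks a single variable for optimization; the `carsmall` data and the decision to optimize only `NumTreesPerPredictor` are illustrative assumptions.

```matlab
% Sketch: inspect and adjust the optimizableVariable objects for fitrgam.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight];

params = hyperparameters('fitrgam', X, MPG);

% List the names of the eligible hyperparameters.
names = {params.Name};
disp(names)

% Mark only NumTreesPerPredictor for optimization (illustrative choice).
for k = 1:numel(params)
    params(k).Optimize = strcmp(params(k).Name, 'NumTreesPerPredictor');
end
```

The adjusted `params` vector can then be supplied as the `OptimizeHyperparameters` value.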

By default, the iterative display appears at the command line,
and plots appear according to the number of hyperparameters in the optimization. For the
optimization and plots, the objective function is log(1 + cross-validation loss). To control the iterative display, set the `Verbose`

field of
the `'HyperparameterOptimizationOptions'`

name-value argument. To control the
plots, set the `ShowPlots`

field of the
`'HyperparameterOptimizationOptions'`

name-value argument.

**Example: **`'OptimizeHyperparameters','auto'`

`HyperparameterOptimizationOptions`

— Options for optimization

structure

Options for optimization, specified as a structure. This argument modifies the effect of the
`OptimizeHyperparameters`

name-value argument. All fields in the
structure are optional.

Field Name | Values | Default |
---|---|---|
`Optimizer` | `'bayesopt'`: Use Bayesian optimization. Internally, this setting calls `bayesopt`. `'gridsearch'`: Use grid search with `NumGridDivisions` values per dimension. `'randomsearch'`: Search at random among `MaxObjectiveEvaluations` points. | `'bayesopt'` |
`AcquisitionFunctionName` | One of `'expected-improvement-per-second-plus'`, `'expected-improvement'`, `'expected-improvement-plus'`, `'expected-improvement-per-second'`, `'lower-confidence-bound'`, or `'probability-of-improvement'`. Acquisition functions whose names include `per-second` do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include `plus` modify their behavior when they are overexploiting an area. | `'expected-improvement-per-second-plus'` |
`MaxObjectiveEvaluations` | Maximum number of objective function evaluations. | `30` for `'bayesopt'` and `'randomsearch'`, and the entire grid for `'gridsearch'` |
`MaxTime` | Time limit, specified as a positive real scalar. The time limit is in seconds, as measured by `tic` and `toc`. | `Inf` |
`NumGridDivisions` | For `'gridsearch'`, the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. | `10` |
`ShowPlots` | Logical value indicating whether to show plots. If `true`, this field plots the best observed objective function value against the iteration number. If you use Bayesian optimization (`Optimizer` is `'bayesopt'`), then this field also plots the best estimated objective function value. The best observed objective function values and best estimated objective function values correspond to the values in the `BestSoFar (observed)` and `BestSoFar (estim.)` columns of the iterative display, respectively. You can find these values in the properties `ObjectiveMinimumTrace` and `EstimatedObjectiveMinimumTrace` of `Mdl.HyperparameterOptimizationResults`. If the problem includes one or two optimization parameters for Bayesian optimization, then `ShowPlots` also plots a model of the objective function against the parameters. | `true` |
`SaveIntermediateResults` | Logical value indicating whether to save results when `Optimizer` is `'bayesopt'`. If `true`, this field overwrites a workspace variable named `'BayesoptResults'` at each iteration. The variable is a `BayesianOptimization` object. | `false` |
`Verbose` | Display at the command line: `0` (no iterative display), `1` (iterative display), or `2` (iterative display with extra information). For details, see the `bayesopt` `Verbose` name-value argument. | `1` |
`UseParallel` | Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see Parallel Bayesian Optimization. | `false` |
`Repartition` | Logical value indicating whether to repartition the cross-validation at every iteration. If this field is `false`, the optimizer uses a single partition for optimization. The setting `true` usually gives the most robust results because it takes partitioning noise into account. However, for good results, `true` requires at least twice as many function evaluations. | `false` |
`CVPartition` | A `cvpartition` object, as created by `cvpartition` | `'Kfold',5` if you do not specify a cross-validation field |
`Holdout` | A scalar in the range `(0,1)` representing the holdout fraction | `'Kfold',5` if you do not specify a cross-validation field |
`Kfold` | An integer greater than 1 | `'Kfold',5` if you do not specify a cross-validation field |

Use no more than one of the `CVPartition`, `Holdout`, and `Kfold` fields.

**Example: **`'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)`

**Data Types: **`struct`
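The two optimization arguments are typically used together. The following sketch runs a small Bayesian search over the univariate hyperparameters with the display and plots turned off; the `carsmall` data, the evaluation budget of 10, and the option values are assumptions picked to keep the example quick, not tuned recommendations.

```matlab
% Sketch: optimize univariate hyperparameters with custom options.
load carsmall                               % sample data (assumed for illustration)
X = [Horsepower, Weight];

rng(0)                                      % repeatable partition and search
opts = struct( ...
    'MaxObjectiveEvaluations', 10, ...      % small budget for a quick run
    'Kfold', 5, ...                         % 5-fold cross-validation loss
    'ShowPlots', false, ...
    'Verbose', 0);

Mdl = fitrgam(X, MPG, ...
    'OptimizeHyperparameters', 'auto-univariate', ...
    'HyperparameterOptimizationOptions', opts);

% Results of the search, stored with the trained model.
results = Mdl.HyperparameterOptimizationResults;
disp(class(results))
```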

## Output Arguments

`Mdl`

— Trained generalized additive model

`RegressionGAM`

model object | `RegressionPartitionedGAM`

cross-validated model object

Trained generalized additive model, returned as one of the model objects in this table.

Model Object | Cross-Validation Options to Train Model Object | Ways to Predict Responses Using Model Object |
---|---|---|
`RegressionGAM` | None | Use `predict` to predict responses for new observations, and use `resubPredict` to predict responses for training observations. |
`RegressionPartitionedGAM` | Specify the name-value argument `KFold`, `Holdout`, `Leaveout`, `CrossVal`, or `CVPartition` | Use `kfoldPredict` to predict responses for observations that `fitrgam` holds out during training. `kfoldPredict` predicts a response for every observation by using the model trained without that observation. |

To reference properties of `Mdl`

, use dot notation. For example,
enter `Mdl.Interactions`

in the Command Window to display the
interaction terms in `Mdl`

.

## More About

### Generalized Additive Model (GAM) for Regression

A generalized additive model (GAM) is an interpretable model that explains a response variable using a sum of univariate and bivariate shape functions of predictors.

`fitrgam`

uses a boosted tree as a shape function for each predictor and, optionally, each pair of predictors; therefore, the function can capture a nonlinear relation between a predictor and the response variable. Because contributions of individual shape functions to the prediction (response value) are well separated, the model is easy to interpret.

The standard GAM uses a univariate shape function for each predictor.

$$\begin{array}{l}y\sim N\left(\mu ,{\sigma}^{2}\right)\\ g(\mu )=\mu =c+{f}_{1}({x}_{1})+{f}_{2}({x}_{2})+\cdots +{f}_{p}({x}_{p}),\end{array}$$

where *y* is a response variable that follows the normal distribution with mean *μ* and standard deviation *σ*. *g*(*μ*) is an identity link function, and *c* is an intercept (constant) term. $${f}_{i}({x}_{i})$$ is a univariate shape function for the *i*th predictor, which is a boosted tree for a linear term for the predictor (predictor tree).

You can include interactions between predictors in a model by adding bivariate shape functions of important interaction terms to the model.

$$\mu =c+{f}_{1}({x}_{1})+{f}_{2}({x}_{2})+\cdots +{f}_{p}({x}_{p})+\sum _{i,j\in \{1,2,\cdots ,p\}}{f}_{ij}({x}_{i}{x}_{j}),$$

where $${f}_{ij}({x}_{i}{x}_{j})$$ is a bivariate shape function for the *i*th and *j*th predictors, which is a boosted tree for an interaction term for the predictors (interaction tree).

`fitrgam`

finds important interaction terms based on the *p*-values of *F*-tests. For details, see Interaction Term Detection.

If you specify the `'FitStandardDeviation'` name-value argument of `fitrgam` as `false` (default), then `fitrgam` trains a model for the mean *μ*. If you specify `'FitStandardDeviation'` as `true`, then `fitrgam` trains an additional model for the standard deviation *σ* and sets the `IsStandardDeviationFit` property of the GAM object to `true`.

### Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to the saturated model.

The deviance of a fitted model is twice the difference between the loglikelihoods of the saturated model and the fitted model:

$$-2\left(\log L-\log {L}_{s}\right),$$

where $$L$$ and $${L}_{s}$$ are the likelihoods of the fitted model and the saturated model, respectively. The saturated model is the model with the maximum number of parameters that you can estimate.

`fitrgam`

uses the deviance to measure the goodness of model fit
and finds a learning rate that reduces the deviance at each iteration. Specify
`'Verbose'`

as 1 or 2 to display the deviance and learning rate in
the Command Window.
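For the normal response model used by the GAM, the link between deviance and the residual sum of squares can be made explicit. The short derivation below assumes *n* independent observations $$y_i$$ with fitted means $$\mu_i$$ and a common, known variance $$\sigma^2$$; these symbols are introduced here for illustration only. In the saturated model each fitted mean equals its observation, so the squared-error term vanishes.

$$\begin{array}{l}\log L=-\frac{n}{2}\log \left(2\pi {\sigma}^{2}\right)-\frac{1}{2{\sigma}^{2}}\sum _{i=1}^{n}{\left({y}_{i}-{\mu}_{i}\right)}^{2}\\ \log {L}_{s}=-\frac{n}{2}\log \left(2\pi {\sigma}^{2}\right)\quad ({\mu}_{i}={y}_{i})\\ -2\left(\log L-\log {L}_{s}\right)=\frac{1}{{\sigma}^{2}}\sum _{i=1}^{n}{\left({y}_{i}-{\mu}_{i}\right)}^{2}\end{array}$$

Under these assumptions, the deviance is the residual sum of squares scaled by $$1/\sigma^2$$.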

## Algorithms

### Gradient Boosting Algorithm

`fitrgam`

fits a generalized additive model using a gradient
boosting algorithm (Least-Squares Boosting).

`fitrgam`

first builds sets of predictor trees (boosted trees for
linear terms for predictors) and then builds sets of interaction trees (boosted trees for
interaction terms for predictors). The boosting algorithm iterates for at most
`'NumTreesPerPredictor'`

times for predictor trees, and then iterates
for at most `'NumTreesPerInteraction'`

times for interaction
trees.

For each boosting iteration, `fitrgam`

builds a set of predictor
trees with the learning rate `'InitialLearnRateForPredictors'`

, or builds
a set of interaction trees with the learning rate
`'InitialLearnRateForInteractions'`

.

When building a set of trees, the function trains one tree at a time. It fits a tree to the residual that is the difference between the response and the aggregated prediction from all trees grown previously. To control the boosting learning speed, the function shrinks the tree by the learning rate and then adds the tree to the model and updates the residual.

Updated model = current model + (learning rate)·(new tree)

Updated residual = current residual – (learning rate)·(response explained by new tree)

- If adding the set of trees improves the model fit (that is, reduces the deviance of the fit by a value larger than the tolerance), then `fitrgam` moves to the next iteration.
- If adding the set of trees does not improve the model fit when `fitrgam` trains linear terms, then the function stops boosting iterations for linear terms and starts boosting iterations for interaction terms. If the model fit is not improved when the function trains interaction terms, then the function terminates the model fitting.

You can determine why training stopped by checking the `ReasonForTermination` property of the trained model.
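The update rule above can be illustrated with a hand-written least-squares boosting loop for a single predictor. This is a simplified sketch, not the `fitrgam` implementation: it uses synthetic data, grows tree stumps with `fitrtree`, keeps the learning rate fixed, and omits the deviance-based stopping check. All variable names are hypothetical.

```matlab
% Sketch: least-squares boosting of tree stumps for one predictor.
rng(1)                                      % repeatable synthetic data
x = linspace(0, 1, 200)';
y = sin(2*pi*x) + 0.1*randn(200, 1);

learnRate = 0.1;
numTrees  = 100;

c   = mean(y);                              % intercept term
f   = zeros(size(y));                       % aggregated shape function f(x)
res = y - c;                                % current residual

for k = 1:numTrees
    % Fit a stump (one decision split) to the current residual.
    tree = fitrtree(x, res, 'MaxNumSplits', 1);
    step = learnRate * predict(tree, x);    % shrink the new tree
    f    = f + step;                        % updated model
    res  = res - step;                      % updated residual
end

mseStart = mean((y - c).^2);                % error of the intercept-only model
mseEnd   = mean(res.^2);                    % error after boosting
fprintf('MSE before: %.4f, after: %.4f\n', mseStart, mseEnd);
```

A smaller `learnRate` makes each step more cautious, which is why lower rates generally need more trees to reach the same fit.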

### Interaction Term Detection

For each pairwise interaction term $${x}_{i}{x}_{j}$$ (specified by `formula` or `'Interactions'`), the software performs an *F*-test to examine whether the term is statistically significant.

To speed up the process, `fitrgam` bins numeric predictors into at most 8 equiprobable bins. The number of bins can be less than 8 if a predictor has fewer than 8 unique values. The *F*-test examines the null hypothesis that the bins created by $${x}_{i}$$ and $${x}_{j}$$ have equal responses versus the alternative that at least one bin has a different response value from the others. A small *p*-value indicates that differences are significant, which implies that the corresponding interaction term is significant and, therefore, including the term can improve the model fit.

`fitrgam`

builds a set of interaction trees using the terms whose
*p*-values are not greater than the `'MaxPValue'`

value. You can use the default `'MaxPValue'`

value `1`

to build interaction trees using all terms specified by `formula`

or
`'Interactions'`

.

`fitrgam`

adds interaction terms to the model in the order of
importance based on the *p*-values. Use the
`Interactions`

property of the returned model to check the order of
the interaction terms added to the model.
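One way to read the *F*-test described in this section is as a one-way analysis of variance over the joint bins of two predictors. The sketch below follows that reading and is an assumption about the idea only; it does not claim to reproduce the internal computation or the exact *p*-values of `fitrgam`. The `carsmall` data, the quantile-based edges, and all variable names are illustrative.

```matlab
% Sketch: F-test on the response across joint bins of two predictors.
load carsmall                               % sample data (assumed for illustration)
ok = ~isnan(Horsepower) & ~isnan(Weight) & ~isnan(MPG);
xi = Horsepower(ok);  xj = Weight(ok);  y = MPG(ok);

nb = 8;                                     % at most 8 equiprobable bins
ei = unique(quantile(xi, (1:nb-1)/nb));     % interior edges for xi
ej = unique(quantile(xj, (1:nb-1)/nb));     % interior edges for xj
bi = discretize(xi, [-Inf; ei(:); Inf]);
bj = discretize(xj, [-Inf; ej(:); Inf]);

% Each distinct (bi, bj) pair defines one joint bin.
cellId = findgroups(bi, bj);

% One-way ANOVA: null hypothesis of equal mean response in all joint bins.
p = anova1(y, cellId, 'off');
fprintf('p-value = %.3g\n', p);
```

Under this reading, a term would be kept when `p` does not exceed the `'MaxPValue'` value.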

## References

[1] Lou, Yin, Rich Caruana, and Johannes Gehrke. "Intelligible Models for Classification and Regression." *Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’12).* Beijing, China: ACM Press, 2012, pp. 150–158.

[2] Lou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. "Accurate Intelligible Models with Pairwise Interactions." *Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’13).* Chicago, Illinois, USA: ACM Press, 2013, pp. 623–631.

## Extended Capabilities

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To perform parallel hyperparameter optimization, use the `'HyperparameterOptimizationOptions',struct('UseParallel',true)` name-value argument in the call to the `fitrgam` function.

For more information on parallel hyperparameter optimization, see Parallel Bayesian Optimization.

For general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

## Version History

**Introduced in R2021a**
