# templateGAM

## Description

`t = templateGAM` returns a generalized additive model (GAM) learner template suitable for training a classification or regression model.

`t = templateGAM(Name=Value)` returns a template with additional options specified by one or more name-value arguments. For example, you can specify the number of trees per linear term or the number of trees per interaction term.

If you specify the type of model by using the `Type` name-value argument, then the display of `t` in the Command Window shows all options as empty (`[]`), except those that you specify using name-value arguments. If you do not specify the type of model, then the display suppresses the empty options. During training, the software uses default values for empty options.

## Examples

**Create GAM Template for Classification**

Create a template for a GAM classifier.

`t = templateGAM(Type="classification")`

    t = 
    Fit template for classification GAM.
                               NumPrint: []
                              MaxPValue: []
          InitialLearnRateForPredictors: []
        InitialLearnRateForInteractions: []
                   NumTreesPerPredictor: []
                 NumTreesPerInteraction: []
               MaxNumSplitsPerPredictor: []
             MaxNumSplitsPerInteraction: []
                         VerbosityLevel: []
                           Interactions: []
                                Version: 1
                                 Method: 'GAM'
                                   Type: 'classification'

`t` is a template object for a GAM learner. All properties of the template object are empty except `Method` and `Type`. When you pass `t` to a training function, the software sets the empty properties to their respective default values.

## Input Arguments

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `t=templateGAM(Type="regression")` creates a GAM learner template for regression.

**GAM Classification and Regression Options**

`InitialLearnRateForInteractions` - Initial learning rate of gradient boosting for interaction terms

`1` (default) | numeric scalar in (0,1]

Initial learning rate of gradient boosting for interaction terms, specified as a numeric scalar in the interval (0,1].

For each boosting iteration for interaction trees, the software starts fitting with the initial learning rate. The function halves the learning rate until it finds a rate that improves the model fit.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

**Example:** `InitialLearnRateForInteractions=0.1`

**Data Types:** `single` | `double`

`InitialLearnRateForPredictors` - Initial learning rate of gradient boosting for linear terms

`1` (default) | numeric scalar in (0,1]

Initial learning rate of gradient boosting for linear terms, specified as a numeric scalar in the interval (0,1].

For each boosting iteration for predictor trees, the software starts fitting with the initial learning rate. The function halves the learning rate until it finds a rate that improves the model fit.

Training a model using a small learning rate requires more learning iterations, but often achieves better accuracy.

For more details about gradient boosting, see Gradient Boosting Algorithm.

**Example:** `InitialLearnRateForPredictors=0.1`

**Data Types:** `single` | `double`

`Interactions` - Number or list of the interaction terms

`0` (default) | nonnegative integer scalar | logical matrix | `"all"`

Number or list of interaction terms to include in the candidate set *S*, specified as a nonnegative integer scalar, a logical matrix, or `"all"`.

- Number of interaction terms, specified as a nonnegative integer: *S* includes the specified number of important interaction terms, selected based on the *p*-values of the terms.
- List of interaction terms, specified as a logical matrix: *S* includes the terms specified by a `t`-by-`p` logical matrix, where `t` is the number of interaction terms, and `p` is the number of predictors used to train the model. For example, `logical([1 1 0; 0 1 1])` represents two pairs of interaction terms: a pair of the first and second predictors, and a pair of the second and third predictors.

  If the software uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. That is, the column indexes of the logical matrix do not count the response and observation weight variables. The indexes also do not count any variables not used by the function.
- `"all"`: *S* includes all possible pairs of interaction terms, which is `p*(p - 1)/2` terms in total.

Among the interaction terms in *S*, the software identifies those whose *p*-values are not greater than the `MaxPValue` value and uses them to build a set of interaction trees. Use the default value (`MaxPValue=1`) to build interaction trees using all terms in *S*.
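The logical-matrix form can be built programmatically. The sketch below is a minimal illustration, assuming four predictors. The variable names (`pairIdx`, `pairs`, `subset`) are hypothetical and not part of the function's interface.

```matlab
% Illustrative sketch; p and the variable names are assumptions.
p = 4;                          % number of predictors used for training

% Enumerate every unordered pair (i,j) with i < j.
pairIdx = nchoosek(1:p, 2);     % numPairs-by-2 matrix of predictor indices
numPairs = size(pairIdx, 1);    % equals p*(p - 1)/2

% Build the numPairs-by-p logical matrix: row k marks the two predictors
% that form interaction term k.
pairs = false(numPairs, p);
rowIdx = repmat((1:numPairs)', 1, 2);
pairs(sub2ind(size(pairs), rowIdx, pairIdx)) = true;

% Keep only the pairs (1,2) and (2,3).
subset = pairs([1 4], :);

% Pass the candidate set to the template.
t = templateGAM(Interactions=subset);
```

Passing `pairs` itself would list the same candidate terms that `Interactions="all"` describes for four predictors.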

**Example:** `Interactions="all"`

**Data Types:** `single` | `double` | `logical` | `char` | `string`

`MaxNumSplitsPerInteraction` - Maximum number of decision splits per interaction tree

`4` (default) | positive integer scalar

Maximum number of decision splits (or branch nodes) per interaction tree (boosted tree for an interaction term), specified as a positive integer scalar.

**Example:** `MaxNumSplitsPerInteraction=5`

**Data Types:** `single` | `double`

`MaxNumSplitsPerPredictor` - Maximum number of decision splits per predictor tree

`1` (default) | positive integer scalar

Maximum number of decision splits (or branch nodes) per predictor tree (boosted tree for a linear term), specified as a positive integer scalar. By default, the software uses a tree stump for a predictor tree.

**Example:** `MaxNumSplitsPerPredictor=5`

**Data Types:** `single` | `double`

`MaxPValue` - Maximum *p*-value for detecting interaction terms

`1` (default) | numeric scalar in [0,1]

Maximum *p*-value for detecting interaction terms, specified as a numeric scalar in the interval [0,1].

The software first finds the candidate set *S* of interaction terms from `Interactions`. Then the function identifies the interaction terms whose *p*-values are not greater than the `MaxPValue` value and uses them to build a set of interaction trees.

The default value (`MaxPValue=1`) builds interaction trees for all interaction terms in the candidate set *S*.

For more details about detecting interaction terms, see Interaction Term Detection.
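The filtering step described above can be sketched in a few lines. The *p*-values here are made-up numbers, and the variable names are hypothetical; the sketch only shows the "not greater than `MaxPValue`" rule and an ordering by significance.

```matlab
% Illustrative sketch; the p-values are made-up numbers, not software output.
pValues = [0.001; 0.20; 0.03; 0.65];   % one hypothetical p-value per term in S
maxPValue = 0.05;                      % plays the role of MaxPValue

keep = find(pValues <= maxPValue);     % terms whose p-values do not exceed the bound
[~, order] = sort(pValues(keep));      % most significant first
selected = keep(order);                % indices into S, ordered by importance

% With the default bound of 1, every candidate term passes the filter.
keepAll = find(pValues <= 1);
```

In this example, `selected` contains the first and third candidate terms.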

**Example:** `MaxPValue=0.05`

**Data Types:** `single` | `double`

`NumTreesPerInteraction` - Number of trees per interaction term

`100` (default) | positive integer scalar

Number of trees per interaction term, specified as a positive integer scalar.

The `NumTreesPerInteraction` value is equivalent to the number of gradient boosting iterations for the interaction terms for predictors. For each iteration, the software adds a set of interaction trees to the model, one tree for each interaction term. To learn about the gradient boosting algorithm, see Gradient Boosting Algorithm.

**Example:** `NumTreesPerInteraction=500`

**Data Types:** `single` | `double`

`NumTreesPerPredictor` - Number of trees per linear term

`300` (default) | positive integer scalar

Number of trees per linear term, specified as a positive integer scalar.

The `NumTreesPerPredictor` value is equivalent to the number of gradient boosting iterations for the linear terms for predictors. For each iteration, the software adds a set of predictor trees to the model, one tree for each predictor. To learn about the gradient boosting algorithm, see Gradient Boosting Algorithm.

**Example:** `NumTreesPerPredictor=500`

**Data Types:** `single` | `double`

`Type` - GAM model type

`"classification"` | `"regression"`

GAM model type, specified as `"classification"` or `"regression"`.

| Value | Description |
|---|---|
| `"classification"` | Create a classification GAM learner template. If you do not specify `Type` as `"classification"`, the fitting function `testckfold` sets this value when you pass `t` to the function. |
| `"regression"` | Create a regression GAM learner template. If you do not specify `Type` as `"regression"`, the fitting function `directforecaster` sets this value when you pass `t` to the function. |

**Example:** `Type="classification"`

**Data Types:** `char` | `string`

**Other Classification and Regression Options**

`NumPrint` - Number of iterations between diagnostic message printouts

`10` (default) | nonnegative integer scalar

Number of iterations between diagnostic message printouts, specified as a nonnegative integer scalar. This argument is valid only when you specify `Verbose` as 1.

If you specify `Verbose=1` and `NumPrint=numPrint`, then the software displays diagnostic messages every `numPrint` iterations in the Command Window.

**Example:** `NumPrint=500`

**Data Types:** `single` | `double`

`Verbose` - Verbosity level

`0` (default) | `1` | `2`

Verbosity level, specified as `0`, `1`, or `2`. The `Verbose` value controls the amount of diagnostic information that the software displays in the Command Window.

| Value | Description |
|---|---|
| `0` | The software displays no information. |
| `1` | The software displays diagnostic messages every `numPrint` iterations, where `numPrint` is the `NumPrint` value. |
| `2` | The software displays diagnostic messages at every iteration. |

Each line of the diagnostic messages describes one boosting iteration and includes the following columns:

- `Type`: Type of trained trees, `1D` (predictor trees, or boosted trees for linear terms for predictors) or `2D` (interaction trees, or boosted trees for interaction terms for predictors)
- `NumTrees`: Number of trees per linear term or interaction term added to the model so far
- `Deviance`: Deviance of the model
- `RelTol`: Relative change of model predictions: $$\left(\widehat{y}_{k}-\widehat{y}_{k-1}\right)^{\prime}\left(\widehat{y}_{k}-\widehat{y}_{k-1}\right)/\widehat{y}_{k}^{\prime}\,\widehat{y}_{k}$$, where $$\widehat{y}_{k}$$ is a column vector of model predictions at iteration *k*
- `LearnRate`: Learning rate used for the current iteration
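To make the `RelTol` formula concrete, the following sketch evaluates it for two small prediction vectors. The numbers and the variable names are made up for illustration.

```matlab
% Illustrative sketch; the prediction vectors are made-up values.
yPrev = [0.9; 2.1; 2.9];    % model predictions at iteration k-1
yCurr = [1.0; 2.0; 3.0];    % model predictions at iteration k

d = yCurr - yPrev;                       % change in predictions
relTol = (d' * d) / (yCurr' * yCurr);    % squared change relative to squared norm
```

Here the squared change is 0.03 and the squared norm of the current predictions is 14, so `relTol` is roughly 0.0021.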

**Example:** `Verbose=1`

**Data Types:** `single` | `double`

## Output Arguments

`t` - GAM learner template

template object

GAM learner template suitable for training GAM classification or regression models, returned as a template object. During training, the software uses default values for empty options.

## More About

### Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to the saturated model.

The deviance of a fitted model is twice the difference between the log-likelihoods of the model and the saturated model:

$$-2\left(\log L - \log L_{s}\right),$$

where $$L$$ and $$L_{s}$$ are the likelihoods of the fitted model and the saturated model, respectively. The saturated model is the model with the maximum number of parameters that you can estimate.

The software uses the deviance to measure the goodness of fit for the model, and finds a learning rate that reduces the deviance at each iteration. Specify `Verbose` as 1 or 2 to display the deviance and learning rate in the Command Window.
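As a worked instance of the definition, the sketch below computes the deviance for a binary response under a Bernoulli likelihood. The data and fitted probabilities are made up, and the choice of likelihood is an assumption for illustration rather than a statement about the software's internals.

```matlab
% Illustrative sketch; responses and fitted probabilities are made-up values.
y    = [1; 0; 1; 1; 0];              % observed binary responses
pHat = [0.8; 0.3; 0.6; 0.9; 0.2];    % fitted probabilities of class 1

% Log-likelihood of the fitted model under a Bernoulli distribution.
logL = sum(y .* log(pHat) + (1 - y) .* log(1 - pHat));

% For 0/1 responses the saturated model fits each observation exactly,
% so its likelihood is 1 and its log-likelihood is 0.
logLs = 0;

deviance = -2 * (logL - logLs);
```

For a Gaussian response with fixed variance, the same definition reduces to a scaled residual sum of squares, which is why the deviance is described as a generalization of it.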

## Algorithms

### Gradient Boosting Algorithm

The software fits a generalized additive model (GAM) using a gradient boosting algorithm (Adaptive Logistic Regression).

The software first builds sets of predictor trees (boosted trees for linear terms for predictors) and then builds sets of interaction trees (boosted trees for interaction terms for predictors). The boosting algorithm iterates for at most `NumTreesPerPredictor` times for predictor trees, and then iterates for at most `NumTreesPerInteraction` times for interaction trees.

For each boosting iteration, the software builds a set of predictor trees with the initial learning rate `InitialLearnRateForPredictors`, or builds a set of interaction trees with the initial learning rate `InitialLearnRateForInteractions`.

When building a set of trees, the function trains one tree at a time. The function fits a tree to the residual that is the difference between the response and the aggregated prediction from all trees grown previously. To control the boosting learning speed, the function shrinks the tree by the learning rate, and then adds the tree to the model and updates the residual.

Updated model = current model + (learning rate)·(new tree)

Updated residual = current residual – (learning rate)·(response explained by new tree)

If adding the set of trees improves the model fit (that is, reduces the deviance of the fit by a value larger than a tolerance), then the software moves to the next iteration.

Otherwise, the software halves the learning rate and uses it to update the model and residual. The function continues to halve the learning rate until it finds a rate that improves the model fit.

If the function cannot find such a learning rate when training predictor trees, then it stops boosting iterations for linear terms and starts boosting iterations for interaction terms.

If the function cannot find such a learning rate when training interaction trees, then it terminates the model fitting.

You can determine why training stopped by checking the `ReasonForTermination` property of the trained model.
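The update and halving rules above can be sketched for a single predictor with tree stumps. This is a simplified least-squares illustration, not the toolbox implementation: the tolerance `tol`, the floor `minLearnRate`, and the toy data are all assumptions, and the deviance is taken to be the residual sum of squares.

```matlab
% Illustrative sketch of boosting with learning-rate halving; not the toolbox code.
rng(0)                                  % reproducible noise
n = 200;
x = linspace(0, 1, n)';                 % single predictor, already sorted
y = sin(2*pi*x) + 0.1*randn(n, 1);      % toy regression response

initialLearnRate = 1;    % plays the role of InitialLearnRateForPredictors
numTrees = 50;           % plays the role of NumTreesPerPredictor
tol = 1e-8;              % hypothetical minimum deviance improvement
minLearnRate = 1e-4;     % hypothetical floor that ends the halving loop

pred = repmat(mean(y), n, 1);           % intercept-only starting model
deviance = sum((y - pred).^2);          % Gaussian case: residual sum of squares
initialDeviance = deviance;
c = (1:n-1)';                           % candidate split positions

for k = 1:numTrees
    residual = y - pred;

    % Fit a stump to the residual: pick the split that maximizes the
    % reduction in the residual sum of squares.
    leftSum = cumsum(residual(1:n-1));
    gain = leftSum.^2 ./ c + (sum(residual) - leftSum).^2 ./ (n - c);
    [~, cut] = max(gain);
    stump = zeros(n, 1);
    stump(1:cut) = mean(residual(1:cut));
    stump(cut+1:n) = mean(residual(cut+1:n));

    % Try the initial learning rate first, halving it until the fit improves.
    rate = initialLearnRate;
    improved = false;
    while rate >= minLearnRate
        candidate = pred + rate * stump;        % shrink the tree, then add it
        newDeviance = sum((y - candidate).^2);
        if deviance - newDeviance > tol
            improved = true;
            break
        end
        rate = rate / 2;
    end

    if ~improved
        break                            % no learning rate improves the fit
    end
    pred = candidate;                    % updated model
    deviance = newDeviance;
end
```

On this toy least-squares problem the full-rate step usually already reduces the deviance, so the halving branch rarely runs; it matters more for losses where a full step can overshoot.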

### Interaction Term Detection

For each pairwise interaction term $$x_{i}x_{j}$$ (specified by the `Interactions` name-value argument), the software performs an *F*-test to examine whether the term is statistically significant.

To speed up the process, the software bins numeric predictors into at most 8 equiprobable bins. The number of bins can be less than 8 if a predictor has fewer than 8 unique values. The *F*-test examines the null hypothesis that the bins created by $$x_{i}$$ and $$x_{j}$$ have equal responses versus the alternative that at least one bin has a different response value from the others. A small *p*-value indicates that differences are significant, which implies that the corresponding interaction term is significant and, therefore, including the term can improve the model fit.
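One way to read the binned test is as a one-way ANOVA across the cells formed by the two binned predictors. The sketch below takes that reading literally on simulated data. It is an assumption-laden illustration, not the toolbox implementation: it applies the test to the raw response, and the data, bin construction, and variable names are all hypothetical.

```matlab
% Illustrative sketch of a binned F-test; not the toolbox implementation.
rng(0)
n = 500;
xi = rand(n, 1);
xj = rand(n, 1);
y = 4*(xi - 0.5).*(xj - 0.5) + 0.05*randn(n, 1);  % response with a strong interaction

numBins = 8;
probs = (1:numBins-1) / numBins;

% Equiprobable bin edges from the empirical quantiles of each predictor.
qi = quantile(xi, probs);
qj = quantile(xj, probs);
edgesI = unique([min(xi); qi(:); max(xi)]);
edgesJ = unique([min(xj); qj(:); max(xj)]);

binI = discretize(xi, edgesI);
binJ = discretize(xj, edgesJ);

% Each (binI, binJ) combination is one group in a one-way ANOVA.
groupId = (binI - 1) * numBins + binJ;
pValue = anova1(y, groupId, 'off');     % 'off' suppresses the ANOVA table and plot
```

Because the simulated response depends on the product of the two predictors, the resulting *p*-value is small, and the term would pass any reasonable `MaxPValue` threshold.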

The software builds a set of interaction trees using the terms whose *p*-values are not greater than the `MaxPValue` value. You can use the default `MaxPValue` value `1` to build interaction trees using all terms specified by `Interactions`.

The software adds interaction terms to the model in the order of importance based on the *p*-values. Use the `Interactions` property of the returned model to check the order of the interaction terms added to the model.

## Version History

**Introduced in R2023b**
