Generalized linear regression model class

An object comprising training data, model description, diagnostic
information, and fitted coefficients for a generalized linear regression.
Predict model responses with the `predict` or `feval` methods.

`mdl = fitglm(tbl)` creates a generalized linear model of a table or
dataset array `tbl`. `mdl = fitglm(X,y)` creates a generalized linear
model of the responses `y` to a data matrix `X`. For details, see `fitglm`.

`mdl = stepwiseglm(tbl)` creates a generalized linear model of a table or
dataset array `tbl`, with unimportant predictors excluded.
`mdl = stepwiseglm(X,y)` creates a generalized linear model of the
responses `y` to a data matrix `X`, with unimportant predictors excluded.
For details, see `stepwiseglm`.

Method | Description |
---|---|
`addTerms` | Add terms to generalized linear model |
`coefCI` | Confidence intervals of coefficient estimates of generalized linear model |
`coefTest` | Linear hypothesis test on generalized linear regression model coefficients |
`devianceTest` | Analysis of deviance |
`disp` | Display generalized linear regression model |
`feval` | Evaluate generalized linear regression model prediction |
`fit` | Create generalized linear regression model |
`plotDiagnostics` | Plot diagnostics of generalized linear regression model |
`plotResiduals` | Plot residuals of generalized linear regression model |
`plotSlice` | Plot of slices through fitted generalized linear regression surface |
`predict` | Predict response of generalized linear regression model |
`random` | Simulate responses for generalized linear regression model |
`removeTerms` | Remove terms from generalized linear model |
`step` | Improve generalized linear regression model by adding or removing terms |
`stepwise` | Create generalized linear regression model by stepwise regression |

The default link function for a generalized linear model is
the *canonical link function*.

**Canonical Link Functions for Generalized Linear Models**

Distribution | Link Function Name | Link Function | Mean (Inverse) Function |
---|---|---|---|
`'normal'` | `'identity'` | f(μ) = μ | μ = Xb |
`'binomial'` | `'logit'` | f(μ) = log(μ/(1 – μ)) | μ = exp(Xb) / (1 + exp(Xb)) |
`'poisson'` | `'log'` | f(μ) = log(μ) | μ = exp(Xb) |
`'gamma'` | `-1` | f(μ) = 1/μ | μ = 1/(Xb) |
`'inverse gaussian'` | `-2` | f(μ) = 1/μ^{2} | μ = (Xb)^{–1/2} |
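The link/mean pairs in the table are mutual inverses, which can be checked numerically. A minimal sketch in NumPy (illustrative only, not MATLAB code; the sample values of the linear predictor Xb are arbitrary, chosen positive where the `-1` and `-2` power links require it):

```python
import numpy as np

# Each row: (name, canonical link f(mu), mean (inverse) function mu(eta),
# sample values of the linear predictor eta = X*b)
pairs = [
    ("identity", lambda mu: mu,                lambda e: e,
     np.array([-1.0, 0.0, 2.0])),
    ("logit",    lambda mu: np.log(mu/(1-mu)), lambda e: np.exp(e)/(1+np.exp(e)),
     np.array([-2.0, 0.0, 2.0])),
    ("log",      lambda mu: np.log(mu),        lambda e: np.exp(e),
     np.array([-1.0, 0.0, 1.0])),
    ("-1",       lambda mu: 1.0/mu,            lambda e: 1.0/e,
     np.array([0.5, 1.0, 4.0])),    # eta must be positive
    ("-2",       lambda mu: 1.0/mu**2,         lambda e: e**-0.5,
     np.array([0.5, 1.0, 4.0])),    # eta must be positive
]

for name, f, mean_fn, eta in pairs:
    mu = mean_fn(eta)
    # Applying the link to the mean recovers the linear predictor
    assert np.allclose(f(mu), eta), name
```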

The *hat matrix* *H* is
defined in terms of the data matrix *X* and a diagonal
weight matrix *W*:

*H* = *X*(*X*^{T}*WX*)^{–1}*X*^{T}*W*.

*W* has diagonal elements *w _{i}*:

$${w}_{i}=\frac{{g}^{\prime}\left({\mu}_{i}\right)}{\sqrt{V\left({\mu}_{i}\right)}},$$

where

- *g* is the link function mapping *y*_{i} to *x*_{i}*b*.
- $${g}^{\prime}$$ is the derivative of the link function *g*.
- *V* is the variance function.
- *μ*_{i} is the *i*th mean.

The diagonal elements *H _{ii}* satisfy

$$\begin{array}{l}0\le {h}_{ii}\le 1\\ {\displaystyle \sum _{i=1}^{n}{h}_{ii}}=p,\end{array}$$

where *n* is the number of observations (rows
of *X*), and *p* is the number of
coefficients in the regression model.

The *leverage* of observation *i* is
the value of the *i*th diagonal term, *h*_{ii},
of the hat matrix *H*. Because the sum of the leverage
values is *p* (the number of coefficients in the
regression model), an observation *i* can be considered
to be an outlier if its leverage substantially exceeds *p*/*n*,
where *n* is the number of observations.
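The leverage properties above can be sketched in NumPy. This is an illustration of the formulas, not MATLAB output: the design matrix and fitted means for a Poisson model with log link (so g′(μ) = 1/μ and V(μ) = μ) are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Illustrative fitted means for a Poisson model with log link
mu = np.exp(X @ np.array([0.2, 0.5, -0.3]))

# Diagonal weights w_i = g'(mu_i) / sqrt(V(mu_i)), with g'(mu) = 1/mu, V(mu) = mu
w = (1.0 / mu) / np.sqrt(mu)
W = np.diag(w)

# Hat matrix H = X (X' W X)^{-1} X' W
H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
h = np.diag(H)                      # leverages h_ii

assert np.all(h >= 0) and np.all(h <= 1)
assert np.isclose(h.sum(), p)       # sum of leverages equals p
high_leverage = h > 2 * p / n       # flag observations well above p/n
```

Note that the trace identity tr(*H*) = *p* holds for any positive weights, since tr(*X*(*X*^{T}*WX*)^{–1}*X*^{T}*W*) = tr((*X*^{T}*WX*)^{–1}*X*^{T}*WX*).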

The Cook's distance *D*_{i} of observation *i* is
$${D}_{i}={w}_{i}\frac{{e}_{i}^{2}}{p\widehat{\phi}}\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}},$$

where

- $$\widehat{\phi}$$ is the dispersion parameter (estimated or theoretical).
- *e*_{i} is the linear predictor residual, $$g\left({y}_{i}\right)-{x}_{i}\widehat{\beta}$$, where *g* is the link function.
- *y*_{i} is the observed response.
- *x*_{i} is the observation.
- $$\widehat{\beta}$$ is the estimated coefficient vector.
- *p* is the number of coefficients in the regression model.
- *h*_{ii} is the *i*th diagonal element of the hat matrix *H*.
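A direct transcription of the Cook's distance formula, continuing the illustrative Poisson/log-link setup (the coefficient vector is assumed, not fitted, and the dispersion parameter φ is the theoretical Poisson value 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_hat = np.array([0.2, 0.5, -0.3])      # illustrative coefficients
mu = np.exp(X @ beta_hat)                   # Poisson model, log link
y = rng.poisson(mu).astype(float) + 0.5     # shift avoids log(0) below

# w_i = g'(mu_i)/sqrt(V(mu_i)); hat matrix as defined earlier
w = (1.0 / mu) / np.sqrt(mu)
W = np.diag(w)
H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
h = np.diag(H)

e = np.log(y) - X @ beta_hat               # e_i = g(y_i) - x_i * beta_hat
phi = 1.0                                   # theoretical Poisson dispersion

# D_i = w_i * e_i^2 / (p * phi) * h_ii / (1 - h_ii)^2
D = w * e**2 / (p * phi) * h / (1.0 - h)**2

assert np.all(D >= 0) and np.all(np.isfinite(D))
```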

Deviance of a model M_{1} is twice the difference
between the loglikelihood of that model and the saturated model, M_{S}.
The saturated model is the model with the maximum number of parameters
that can be estimated. For example, if there are *n* observations *y*_{i}, *i* =
1, 2, ..., *n*, with potentially different values
for *X*_{i}^{T}β,
then you can define a saturated model with *n* parameters.
Let L(*b*,*y*) denote the maximum
value of the likelihood function for a model. Then the deviance of
model M_{1} is

$$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$$

where *b*_{1} are the estimated
parameters for model M_{1} and *b*_{S} are
the estimated parameters for the saturated model. The deviance has
a chi-square distribution with *n* – *p* degrees
of freedom, where *n* is the number of parameters
in the saturated model and *p* is the number of parameters
in model M_{1}.

If M_{1} and M_{2} are
two different generalized linear models, then the fit of the models
can be assessed by comparing the deviances *D*_{1} and *D*_{2} of
these models. The difference of the deviances is

$$\begin{array}{l}D={D}_{2}-{D}_{1}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)+2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{1},y\right)\right).\end{array}$$

Asymptotically, this difference has a chi-square distribution
with degrees of freedom *v* equal to the number of
parameters that are estimated in one model but fixed (typically at
0) in the other. That is, it is equal to the difference in the number
of parameters estimated in M_{1} and M_{2}.
You can get the *p*-value for this test using `1 - chi2cdf(D,V)`,
where *D* = *D*_{2} – *D*_{1}.
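The same test in Python, mirroring MATLAB's `1 - chi2cdf(D,V)` with SciPy. The two deviances and the degree-of-freedom count are made-up numbers for illustration, not results from a real fit:

```python
from scipy.stats import chi2

# Illustrative deviances for two nested models; M2 fixes v parameters
# that M1 estimates, so D2 >= D1.
D1, D2 = 52.3, 60.1
v = 2                          # difference in number of estimated parameters

D = D2 - D1                    # asymptotically chi-square with v d.o.f.
p_value = 1 - chi2.cdf(D, v)   # Python analogue of 1 - chi2cdf(D,V)

assert 0 <= p_value <= 1
# A small p-value suggests the extra terms in M1 significantly improve the fit
```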

Value. To learn how value classes affect
copy operations, see Copying Objects in
the MATLAB^{®} documentation.
