Generalized linear regression model class
An object comprising training data, model description, diagnostic
information, and fitted coefficients for a generalized linear regression.
Predict model responses with the predict or feval methods.
mdl = fitglm(tbl) or mdl = fitglm(X,y) creates a generalized linear model of a table or dataset array tbl, or of the responses y to a data matrix X. For details, see fitglm.
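As a quick illustration, the following sketch fits a Poisson model with fitglm. The data and variable names here are made up for illustration, not taken from this documentation.

```matlab
% Hypothetical example: fit a Poisson regression to simulated count data.
x = (1:100)';                                   % made-up predictor
y = poissrnd(exp(1 + 0.02*x));                  % simulated counts
mdl = fitglm(x, y, 'linear', 'Distribution', 'poisson');
disp(mdl.Coefficients)                          % estimates, SEs, t-stats, p-values
yhat = predict(mdl, x);                         % predicted mean responses
```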
mdl = stepwiseglm(tbl) or mdl = stepwiseglm(X,y) creates a generalized linear model of a table or dataset array tbl, or of the responses y to a data matrix X, with unimportant predictors excluded. For details, see stepwiseglm.
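A comparable sketch for stepwiseglm, again on made-up data, where only the first of three candidate predictors actually drives the response:

```matlab
% Hypothetical example: stepwise selection starting from a constant model.
X = randn(200, 3);                              % three candidate predictors
y = poissrnd(exp(0.5 + X(:,1)));                % only X(:,1) matters
mdl = stepwiseglm(X, y, 'constant', ...
    'Distribution', 'poisson', 'Upper', 'linear');
mdl.Formula                                     % shows the retained terms
```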

Covariance matrix of coefficient estimates.  

Cell array of strings containing a label for each coefficient.  

Coefficient values stored as a table.
To obtain any of these columns as a vector, index into the property
using dot notation. For example, to obtain the vector of coefficient estimates, use beta = mdl.Coefficients.Estimate.

Deviance of the fit. It is useful for comparing two models when one is a special case of the other. The difference between the deviances of the two models has a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters between the two models. For more information on deviance, see Deviance.

Degrees of freedom for error (residuals), equal to the number of observations minus the number of estimated coefficients.  

Table with diagnostics helpful in finding outliers and influential observations. The table contains the following fields.
All of these quantities are computed on the scale of the linear predictor. So, for example, in the equation that defines the hat matrix, the fitted values are Yfit = glm.Fitted.LinearPredictor, and the responses are Y = glm.Fitted.LinearPredictor + glm.Residuals.LinearPredictor.

Scale factor of the variance of the response. For example, the variance function for the binomial distribution
is p(1–p)/n,
where p is the probability parameter and n is
the sample size parameter.

Logical value indicating whether the fitting function estimated the Dispersion scale factor when computing standard errors.

Structure with the following fields relating to the generalized linear model distribution:

Table of predicted (fitted) values based on the training data, with one row for each observation and the following columns.
To obtain any of the columns as a vector, index into the property
using dot notation. For example, to obtain the fitted response values, use f = mdl.Fitted.Response.

Object containing information about the model.  

Structure with fields relating to the link function. The link is a function f that links the distribution parameter μ to the fitted linear combination Xb of the predictors: f(μ) = Xb. The structure has the following fields.
 

Log likelihood of the model distribution at the response values, with mean fitted from the model, and other parameters estimated as part of the model fit.  

Model criterion values for comparing models, such as AIC and BIC. To obtain any of these values as a scalar, index into the property
using dot notation. For example, to obtain the AIC value, use aic = mdl.ModelCriterion.AIC.

Number of coefficients in the model, a positive integer.  

Number of estimated coefficients in the model, a positive integer.  

Number of observations the fitting function used in fitting.
This is the number of observations supplied in the original table,
dataset, or matrix, minus any excluded rows (set with the 'Exclude' name-value pair) or rows with missing values.

Number of predictor variables used to fit the model.

Number of variables in the data.  

Table with the same number of rows as the input data (tbl or X), containing information about each observation.

Cell array of strings containing the names of the observations used in the fit.
 

Vector with the same length as the number of rows in the data,
passed with the 'Offset' name-value pair of the fitting function. The fitting function uses Offset as an additional predictor with its coefficient fixed at 1; that is, the formula for fitting is μ ~ Offset + (terms involving real predictors). For example, consider a Poisson regression model. Suppose the
number of counts is known for theoretical reasons to be proportional
to a predictor A. By using the log link function and by specifying log(A) as an offset, you can force the model to satisfy this theoretical constraint.
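The constraint described above can be sketched in code as follows; A, x, and y are invented for illustration:

```matlab
% Hypothetical example: Poisson counts assumed proportional to exposure A,
% so log(A) enters as an offset with its coefficient fixed at 1.
A = 10 + 90*rand(150, 1);                       % made-up exposure values
x = randn(150, 1);
y = poissrnd(A .* exp(0.3*x));                  % counts proportional to A
mdl = fitglm(x, y, 'linear', ...
    'Distribution', 'poisson', 'Offset', log(A));
```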

Cell array of strings, the names of the predictors used in fitting the model.  

Table containing residuals, with one row for each observation and these variables.
To obtain any of these columns as a vector, index into the property
using dot notation. For example, to obtain the raw residual vector, use r = mdl.Residuals.Raw. Rows not used in the fit because of missing values (in ObservationInfo.Missing) or because of excluded values (in ObservationInfo.Excluded) contain NaN values.
 

Name of the response variable, a string.

Proportion of total sum of squares explained by the model. The
ordinary R-squared value relates to the SSR and SST properties: Rsquared = SSR/SST.
To obtain any of these values as a scalar, index into the property
using dot notation. For example, to obtain the adjusted R-squared value, use r2 = mdl.Rsquared.Adjusted.

Sum of squared errors (residuals). The Pythagorean theorem implies SST = SSE + SSR.

Regression sum of squares, the sum of squared deviations of the fitted values from their mean. The Pythagorean theorem implies SST = SSE + SSR.

Total sum of squares, the sum of squared deviations of the observed response values from their mean. The Pythagorean theorem implies SST = SSE + SSR.

Structure that is empty unless stepwiseglm constructed the model.

Table containing metadata about the variables used in the fit.

Cell array of strings containing names of the variables in the fit.
 

Table containing the data, both observations and responses,
that the fitting function used to construct the fit. If the fit is
based on a table or dataset array, this property contains all of the data from that table or dataset array.
addTerms  Add terms to generalized linear model 
coefCI  Confidence intervals of coefficient estimates of generalized linear model 
coefTest  Linear hypothesis test on generalized linear regression model coefficients 
devianceTest  Analysis of deviance 
disp  Display generalized linear regression model 
feval  Evaluate generalized linear regression model prediction 
fit  Create generalized linear regression model 
plotDiagnostics  Plot diagnostics of generalized linear regression model 
plotResiduals  Plot residuals of generalized linear regression model 
plotSlice  Plot of slices through fitted generalized linear regression surface 
predict  Predict response of generalized linear regression model 
random  Simulate responses for generalized linear regression model 
removeTerms  Remove terms from generalized linear model 
step  Improve generalized linear regression model by adding or removing terms 
stepwise  Create generalized linear regression model by stepwise regression 
The default link function for a generalized linear model is the canonical link function.
Canonical Link Functions for Generalized Linear Models
Distribution  Link Function Name  Link Function  Mean (Inverse) Function 

'normal'  'identity'  f(μ) = μ  μ = Xb 
'binomial'  'logit'  f(μ) = log(μ/(1–μ))  μ = exp(Xb) / (1 + exp(Xb)) 
'poisson'  'log'  f(μ) = log(μ)  μ = exp(Xb) 
'gamma'  –1  f(μ) = 1/μ  μ = 1/(Xb) 
'inverse gaussian'  –2  f(μ) = 1/μ^{2}  μ = (Xb)^{–1/2} 
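When the canonical link is not wanted, a different link can be requested explicitly with the 'Link' name-value pair. A hedged sketch on made-up binary data:

```matlab
% Hypothetical example: a binomial model with a probit link instead of the
% canonical logit link.
x = randn(300, 1);
y = binornd(1, normcdf(0.5 + x));               % simulated binary responses
mdl = fitglm(x, y, 'linear', ...
    'Distribution', 'binomial', 'Link', 'probit');
mdl.Link                                        % structure describing the probit link
```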
The hat matrix H is defined in terms of the data matrix X and a diagonal weight matrix W:
H = X(X^{T}WX)^{–1}X^{T}W^{T}.
W has diagonal elements w_{i}:
$${w}_{i}=\frac{{g}^{\prime}\left({\mu}_{i}\right)}{\sqrt{V\left({\mu}_{i}\right)}},$$
where
g is the link function mapping y_{i} to x_{i}b.
$${g}^{\prime}$$ is the derivative of the link function g.
V is the variance function.
μ_{i} is the ith mean.
The diagonal elements H_{ii} satisfy
$$\begin{array}{l}0\le {h}_{ii}\le 1\\ {\displaystyle \sum _{i=1}^{n}{h}_{ii}}=p,\end{array}$$
where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.
The leverage of observation i is the value of the ith diagonal term, h_{ii}, of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered to be an outlier if its leverage substantially exceeds p/n, where n is the number of observations.
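The rule of thumb above can be applied directly to a fitted model; mdl here stands for any fitted generalized linear model object, and the factor of 2 in the cutoff is a common but arbitrary choice:

```matlab
% Hypothetical sketch: flag observations whose leverage well exceeds p/n.
h = mdl.Diagnostics.Leverage;                   % diagonal of the hat matrix
p = mdl.NumCoefficients;
n = mdl.NumObservations;
highLeverage = find(h > 2*p/n);                 % candidate outliers by leverage
```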
The Cook's distance D_{i} of observation i is
$${D}_{i}={w}_{i}\frac{{e}_{i}^{2}}{p\widehat{\phi}}\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}},$$
where
$$\widehat{\phi}$$ is the dispersion parameter (estimated or theoretical).
e_{i} is the linear predictor residual, $$g\left({y}_{i}\right)-{x}_{i}\widehat{\beta}$$, where
g is the link function.
y_{i} is the observed response.
x_{i} is the observation.
$$\widehat{\beta}$$ is the estimated coefficient vector.
p is the number of coefficients in the regression model.
h_{ii} is the ith diagonal element of the hat matrix H.
Deviance of a model M_{1} is twice the difference between the log likelihood of the saturated model, M_{S}, and the log likelihood of model M_{1}. The saturated model is the model with the maximum number of parameters that can be estimated. For example, if there are n observations y_{i}, i = 1, 2, ..., n, with potentially different values for X_{i}^{T}β, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model. Then the deviance of model M_{1} is
$$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$$
where b_{1} are the estimated parameters for model M_{1} and b_{S} are the estimated parameters for the saturated model. The deviance has a chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in model M_{1}.
If M_{1} and M_{2} are two different generalized linear models, then the fit of the models can be assessed by comparing the deviances D_{1} and D_{2} of these models. The difference of the deviances is
$$\begin{array}{l}D={D}_{2}-{D}_{1}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)+2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{1},y\right)\right).\end{array}$$
Asymptotically, this difference has a chi-square distribution
with degrees of freedom v equal to the number of
parameters that are estimated in one model but fixed (typically at
0) in the other. That is, it is equal to the difference in the number
of parameters estimated in M_{1} and M_{2}.
You can get the p-value for this test using 1 - chi2cdf(D,v), where D = D_{2} – D_{1}.
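Putting the pieces together, here is a hedged sketch of the nested-model comparison; mdlSmall and mdlBig stand for two hypothetical fitted models on the same data, with mdlSmall nested in mdlBig:

```matlab
% Deviance difference and its chi-square p-value.
D = mdlSmall.Deviance - mdlBig.Deviance;        % larger model has smaller deviance
v = mdlBig.NumEstimatedCoefficients - mdlSmall.NumEstimatedCoefficients;
p = 1 - chi2cdf(D, v);                          % small p favors the larger model
```

The devianceTest method performs a related comparison of a fitted model against the constant model.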
Copy semantics: Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB^{®} documentation.