Generalized linear regression model class
An object comprising training data, model description, diagnostic information, and fitted coefficients for a generalized linear regression. Predict model responses with the predict or feval methods.
mdl = fitglm(tbl) or mdl = fitglm(X,y) creates a generalized linear model of a table or dataset array tbl, or of the responses y to a data matrix X. For details, see fitglm.
mdl = stepwiseglm(tbl) or mdl = stepwiseglm(X,y) creates a generalized linear model of a table or dataset array tbl, or of the responses y to a data matrix X, with unimportant predictors excluded. For details, see stepwiseglm.
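The two calling forms of fitglm can be sketched as follows. This is a minimal illustration on synthetic logistic data, not a recommended workflow; the variable names (x1, x2, mdlX, mdlT, pNew) and the true coefficients used to simulate the data are assumptions made for the example.

```matlab
% Minimal sketch: synthetic 0/1 data (all names and values are illustrative).
rng(0);                                   % reproducible random numbers
n  = 200;
x1 = randn(n,1);
x2 = randn(n,1);
p  = 1 ./ (1 + exp(-(0.5 + 1.2*x1 - 0.8*x2)));   % assumed true probabilities
y  = double(rand(n,1) < p);               % simulated binary responses

% Matrix form: responses y against the data matrix X.
X    = [x1 x2];
mdlX = fitglm(X, y, 'Distribution', 'binomial');

% Table form: by default the last variable in the table is the response.
tbl  = table(x1, x2, y);
mdlT = fitglm(tbl, 'Distribution', 'binomial');

% Predict the success probability at one new point.
pNew = predict(mdlX, [0.5 -0.2]);
```

Both calls fit the same model, so the estimated coefficients should agree; only the variable names stored in the model differ.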
CoefficientCovariance |
Covariance matrix of coefficient estimates. |
CoefficientNames |
Cell array of strings containing a label for each coefficient. |
Coefficients |
Coefficient values stored as a table. Coefficients has one row for each coefficient and these columns:
To obtain any of these columns as a vector, index into the property using dot notation. For example, in mdl the estimated coefficient vector is
beta = mdl.Coefficients.Estimate
Use coefTest to perform other tests on the coefficients. |
Deviance |
Deviance of the fit. It is useful for comparing two models when one is a special case of the other. The difference between the deviance of the two models has a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters between the two models. For more information on deviance, see Deviance. |
DFE |
Degrees of freedom for error (residuals), equal to the number of observations minus the number of estimated coefficients. |
Diagnostics |
Table with diagnostics helpful in finding outliers and influential observations. The table contains the following fields:
All of these quantities are computed on the scale of the linear predictor. So, for example, in the equation that defines the hat matrix,
Yfit = glm.Fitted.LinearPredictor
Y = glm.Fitted.LinearPredictor + glm.Residuals.LinearPredictor |
Dispersion |
Scale factor of the variance of the response. Dispersion multiplies the variance function for the distribution. For example, the variance function for the binomial distribution is p(1–p)/n, where p is the probability parameter and n is the sample size parameter. If Dispersion is near 1, the variance of the data appears to agree with the theoretical variance of the binomial distribution. If Dispersion is larger than 1, the data are "overdispersed" relative to the binomial distribution. |
DispersionEstimated |
Logical value indicating whether fitglm used the Dispersion property to compute standard errors for the coefficients in Coefficients.SE. If DispersionEstimated is false, fitglm used the theoretical value of the variance. |
Distribution |
Structure with the following fields relating to the generalized distribution: |
Fitted |
Table of predicted (fitted) values based on the training data, a table with one row for each observation and the following columns.
To obtain any of the columns as a vector, index into the property using dot notation. For example, in the model mdl, the vector f of fitted values on the response scale is
f = mdl.Fitted.Response
Use predict to compute predictions for other predictor values, or to compute confidence bounds on Fitted. |
Formula |
Object containing information about the model. |
Link |
Structure with fields relating to the link function. The link is a function f that links the distribution parameter μ to the fitted linear combination Xb of the predictors: f(μ) = Xb. The structure has the following fields. |
LogLikelihood |
Log likelihood of the model distribution at the response values, with mean fitted from the model, and other parameters estimated as part of the model fit. |
ModelCriterion |
AIC and other information criteria for comparing models. A structure with fields:
To obtain any of these values as a scalar, index into the property using dot notation. For example, in a model mdl, the AIC value aic is:
aic = mdl.ModelCriterion.AIC |
NumCoefficients |
Number of coefficients in the model, a positive integer. NumCoefficients includes coefficients that are set to zero when the model terms are rank deficient. |
NumEstimatedCoefficients |
Number of estimated coefficients in the model, a positive integer. NumEstimatedCoefficients does not include coefficients that are set to zero when the model terms are rank deficient. NumEstimatedCoefficients is the degrees of freedom for regression. |
NumObservations |
Number of observations the fitting function used in fitting. This is the number of observations supplied in the original table, dataset, or matrix, minus any excluded rows (set with the Excluded name-value pair) or rows with missing values. |
NumPredictors |
Number of variables fitglm used as predictors for fitting. |
NumVariables |
Number of variables in the data. NumVariables is the number of variables in the original table or dataset, or the total number of columns in the predictor matrix and response vector when the fit is based on those arrays. It includes variables, if any, that are not used as predictors or as the response. |
ObservationInfo |
Table with the same number of rows as the input data (tbl or X). |
ObservationNames |
Cell array of strings containing the names of the observations used in the fit. |
Offset |
Vector with the same length as the number of rows in the data, passed from fitglm or stepwiseglm in the Offset name-value pair. The fitting function used Offset as a predictor variable, but with the coefficient set to exactly 1. In other words, with link function f, the formula for fitting was f(μ) ~ Offset + (terms involving real predictors) with the Offset predictor having coefficient 1. For example, consider a Poisson regression model. Suppose the number of counts is known for theoretical reasons to be proportional to a predictor A. By using the log link function and by specifying log(A) as an offset, you can force the model to satisfy this theoretical constraint. |
PredictorNames |
Cell array of strings, the names of the predictors used in fitting the model. |
Residuals |
Table containing residuals, with one row for each observation and these variables.
To obtain any of these columns as a vector, index into the property using dot notation. For example, in a model mdl, the ordinary raw residual vector r is:
r = mdl.Residuals.Raw
Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values. Rows not used in the fit because of excluded values (in ObservationInfo.Excluded) contain NaN values, with the following exceptions: |
ResponseName |
String naming the response variable. |
Rsquared |
Proportion of total sum of squares explained by the model. The ordinary R-squared value relates to the SSR and SST properties: Rsquared = SSR/SST = 1 - SSE/SST. For a linear or nonlinear model, Rsquared is a structure with two fields:
For a generalized linear model, Rsquared is a structure with five fields:
To obtain any of these values as a scalar, index into the property using dot notation. For example, the adjusted R-squared value in mdl is
r2 = mdl.Rsquared.Adjusted |
SSE |
Sum of squared errors (residuals). The Pythagorean theorem implies SST = SSE + SSR. |
SSR |
Regression sum of squares, the sum of squared deviations of the fitted values from their mean. The Pythagorean theorem implies SST = SSE + SSR. |
SST |
Total sum of squares, the sum of squared deviations of y from mean(y). The Pythagorean theorem implies SST = SSE + SSR. |
Steps |
Structure that is empty unless stepwiseglm constructed the model.
The History table has one row for each step including the initial fit, and the following variables (columns). |
VariableInfo |
Table containing metadata about Variables. There is one row for each variable, and the following columns. |
VariableNames |
Cell array of strings containing names of the variables in the fit. |
Variables |
Table containing the data, both observations and responses, that the fitting function used to construct the fit. If the fit is based on a table or dataset array, Variables contains all of the data from that table or dataset array. Otherwise, Variables is a table created from the input data matrix X and response vector y. |
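The dot-notation access described for the properties above, together with the Offset name-value pair, can be sketched on a small Poisson model. The data are simulated, and the exposure variable A, the true coefficients, and the other variable names are assumptions for illustration only.

```matlab
% Minimal sketch: Poisson counts proportional to a hypothetical exposure A.
rng(1);
n = 150;
A = 1 + 9*rand(n,1);                      % assumed exposure values
x = randn(n,1);
y = poissrnd(A .* exp(0.3 + 0.5*x));      % simulated counts

% log(A) enters the linear predictor with its coefficient fixed at 1.
mdl = fitglm(x, y, 'Distribution', 'poisson', 'Offset', log(A));

% Table-valued properties: pull out single columns with dot notation.
beta = mdl.Coefficients.Estimate;         % estimated coefficient vector
se   = mdl.Coefficients.SE;               % standard errors
f    = mdl.Fitted.Response;               % fitted values on the response scale
r    = mdl.Residuals.Raw;                 % raw residuals

% Structure-valued property: pull out a scalar field.
aic  = mdl.ModelCriterion.AIC;

% DFE is the number of observations minus the number of estimated coefficients.
dfeCheck = mdl.NumObservations - mdl.NumEstimatedCoefficients;
```

Here the raw residuals should equal y - f, and dfeCheck should match mdl.DFE.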
addTerms | Add terms to generalized linear model |
coefCI | Confidence intervals of coefficient estimates of generalized linear model |
coefTest | Linear hypothesis test on generalized linear regression model coefficients |
devianceTest | Analysis of deviance |
disp | Display generalized linear regression model |
feval | Evaluate generalized linear regression model prediction |
fit | Create generalized linear regression model |
plotDiagnostics | Plot diagnostics of generalized linear regression model |
plotResiduals | Plot residuals of generalized linear regression model |
plotSlice | Plot of slices through fitted generalized linear regression surface |
predict | Predict response of generalized linear regression model |
random | Simulate responses for generalized linear regression model |
removeTerms | Remove terms from generalized linear model |
step | Improve generalized linear regression model by adding or removing terms |
stepwise | Create generalized linear regression model by stepwise regression |
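A few of the methods listed above can be tried together on one fitted model. The following is a minimal sketch on simulated Poisson data; the data-generating coefficients and the points in Xnew are assumptions chosen only for the example.

```matlab
% Minimal sketch: fit a Poisson model, then call several methods on it.
rng(2);
n = 120;
X = randn(n,2);
y = poissrnd(exp(0.2 + 0.6*X(:,1) - 0.4*X(:,2)));   % simulated counts
mdl = fitglm(X, y, 'Distribution', 'poisson');

% predict: responses and confidence bounds at new predictor values.
Xnew = [0 0; 1 -1];
[ypred, yci] = predict(mdl, Xnew);

% coefCI: confidence intervals for the coefficients, one row per coefficient.
ci = coefCI(mdl);

% coefTest: p-value for the test that all coefficients except the
% intercept are zero.
pAll = coefTest(mdl);

% devianceTest: analysis of deviance against a constant model.
devTbl = devianceTest(mdl);
```

Each row of yci brackets the corresponding entry of ypred.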
The default link function for a generalized linear model is the canonical link function.
Canonical Link Functions for Generalized Linear Models
Distribution | Link Function Name | Link Function | Mean (Inverse) Function |
---|---|---|---|
'normal' | 'identity' | f(μ) = μ | μ = Xb |
'binomial' | 'logit' | f(μ) = log(μ/(1–μ)) | μ = exp(Xb) / (1 + exp(Xb)) |
'poisson' | 'log' | f(μ) = log(μ) | μ = exp(Xb) |
'gamma' | -1 | f(μ) = 1/μ | μ = 1/(Xb) |
'inverse gaussian' | -2 | f(μ) = 1/μ^{2} | μ = (Xb)^{–1/2} |
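As a quick check of the table, each link and its mean (inverse) function can be written as a pair of anonymous functions and composed. This sketch uses only base MATLAB; the cell arrays f and finv and the test values in mu are illustrative names, not part of the toolbox.

```matlab
% Canonical links f(mu) and their inverses, in the same order as the table:
% identity, logit, log, -1 (reciprocal), -2.
f    = {@(mu) mu, ...
        @(mu) log(mu ./ (1 - mu)), ...
        @(mu) log(mu), ...
        @(mu) 1 ./ mu, ...
        @(mu) 1 ./ mu.^2};
finv = {@(eta) eta, ...
        @(eta) exp(eta) ./ (1 + exp(eta)), ...
        @(eta) exp(eta), ...
        @(eta) 1 ./ eta, ...
        @(eta) eta.^(-1/2)};

mu = [0.1 0.25 0.5 0.9];          % means valid for every link in the table
maxErr = zeros(1, numel(f));
for k = 1:numel(f)
    eta = f{k}(mu);               % linear predictor Xb implied by mu
    maxErr(k) = max(abs(finv{k}(eta) - mu));   % round-trip error
end
```

Every entry of maxErr should be at the level of floating-point round-off, confirming that each mean function inverts its link.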
The hat matrix H is defined in terms of the data matrix X and a diagonal weight matrix W:
$$H=X{\left({X}^{T}WX\right)}^{-1}{X}^{T}{W}^{T}.$$
W has diagonal elements w_{i}:
$${w}_{i}=\frac{{g}^{\prime}\left({\mu}_{i}\right)}{\sqrt{V\left({\mu}_{i}\right)}},$$
where
g is the link function mapping y_{i} to x_{i}b.
$${g}^{\prime}$$ is the derivative of the link function g.
V is the variance function.
μ_{i} is the ith mean.
The diagonal elements h_{ii} of H satisfy
$$\begin{array}{l}0\le {h}_{ii}\le 1\\ {\displaystyle \sum _{i=1}^{n}{h}_{ii}}=p,\end{array}$$
where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.
The leverage of observation i is the value of the ith diagonal term, h_{ii}, of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered to be an outlier if its leverage substantially exceeds p/n, where n is the number of observations.
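For a model with the normal distribution and identity link, the weights reduce to w_{i} = 1, so the hat matrix can be formed directly from the design matrix. The sketch below does this on simulated data and compares the result with the Leverage column of the Diagnostics table. The cutoff of 2p/n is a common rule of thumb assumed here for illustration; the text above only says "substantially exceeds p/n".

```matlab
% Minimal sketch: leverage for a normal/identity model (so W = I).
rng(3);
n = 40;
X = randn(n,2);
y = 1 + X*[2; -1] + 0.5*randn(n,1);       % simulated responses
mdl = fitglm(X, y);                       % default: normal distribution

Xd = [ones(n,1) X];                       % design matrix with intercept column
H  = Xd * ((Xd' * Xd) \ Xd');             % hat matrix with unit weights
h  = diag(H);                             % leverage values h_ii

p      = mdl.NumEstimatedCoefficients;    % here p = 3
hModel = mdl.Diagnostics.Leverage;        % leverage stored in the model

cutoff  = 2*p/n;                          % assumed rule-of-thumb threshold
flagged = find(h > cutoff);               % candidate high-leverage rows
```

The values in h lie between 0 and 1 and sum to p, as stated above.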
The Cook's distance D_{i} of observation i is
$${D}_{i}={w}_{i}\frac{{e}_{i}^{2}}{p\widehat{\phi}}\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}},$$
where
$$\widehat{\phi}$$ is the dispersion parameter (estimated or theoretical).
e_{i} is the linear predictor residual, $$g\left({y}_{i}\right)-{x}_{i}\widehat{\beta}$$, where
g is the link function.
y_{i} is the observed response.
x_{i} is the observation.
$$\widehat{\beta}$$ is the estimated coefficient vector.
p is the number of coefficients in the regression model.
h_{ii} is the ith diagonal element of the Hat Matrix H.
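The Cook's distance formula can be evaluated term by term from the model properties. The sketch below assumes a normal distribution with identity link, so that w_{i} = 1 and the linear predictor residual is simply the observed response minus the fitted value; the variable names are illustrative.

```matlab
% Minimal sketch: Cook's distance for a normal/identity model (w_i = 1).
rng(5);
n = 40;
X = randn(n,2);
y = 1 + X*[2; -1] + 0.5*randn(n,1);       % simulated responses
mdl = fitglm(X, y);                       % default: normal distribution

h   = mdl.Diagnostics.Leverage;           % h_ii
e   = mdl.Residuals.LinearPredictor;      % e_i on the linear predictor scale
phi = mdl.Dispersion;                     % dispersion parameter
p   = mdl.NumEstimatedCoefficients;       % number of coefficients

% D_i = w_i * e_i^2 / (p*phi) * h_ii / (1 - h_ii)^2, with w_i = 1 assumed.
D = (e.^2 ./ (p*phi)) .* (h ./ (1 - h).^2);

% Values stored by the model, for side-by-side comparison with D.
Dmodel = mdl.Diagnostics.CooksDistance;
```

Under the stated assumption, D and Dmodel are expected to be close; for other distributions and links the weights w_{i} are no longer 1 and must be included.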
Deviance of a model M_{1} is twice the difference between the loglikelihood of that model and the saturated model, M_{S}. The saturated model is the model with the maximum number of parameters that can be estimated. For example, if there are n observations y_{i}, i = 1, 2, ..., n, with potentially different values for X_{i}^{T}β, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model. Then the deviance of model M_{1} is
$$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$$
where b_{1} are the estimated parameters for model M_{1} and b_{S} are the estimated parameters for the saturated model. The deviance has a chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in model M_{1}.
If M_{1} and M_{2} are two different generalized linear models, then the fit of the models can be assessed by comparing the deviances D_{1} and D_{2} of these models. The difference of the deviances is
$$\begin{array}{l}D={D}_{2}-{D}_{1}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)+2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{1},y\right)\right).\end{array}$$
Asymptotically, this difference has a chi-square distribution with degrees of freedom v equal to the number of parameters that are estimated in one model but fixed (typically at 0) in the other. That is, it is equal to the difference in the number of parameters estimated in M_{1} and M_{2}. You can get the p-value for this test using 1 - chi2cdf(D,v), where D = D_{2} – D_{1}.
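The comparison of two nested models can be sketched as follows. The data are simulated, M_{2} is obtained from M_{1} by dropping the second predictor (fixing its coefficient at 0), and the variable names are assumptions for the example.

```matlab
% Minimal sketch: deviance difference test for two nested Poisson models.
rng(4);
n = 200;
X = randn(n,2);
y = poissrnd(exp(0.3 + 0.5*X(:,1) + 0.4*X(:,2)));    % simulated counts

mdl1 = fitglm(X,      y, 'Distribution', 'poisson'); % M1: both predictors
mdl2 = fitglm(X(:,1), y, 'Distribution', 'poisson'); % M2: second one dropped

D = mdl2.Deviance - mdl1.Deviance;                   % D = D2 - D1
v = mdl1.NumEstimatedCoefficients - mdl2.NumEstimatedCoefficients;

pValue = 1 - chi2cdf(D, v);                          % asymptotic p-value
```

A small pValue suggests that the extra parameter in M_{1} improves the fit beyond what chance alone would explain.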
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.