(Not Recommended) Create generalized linear regression model
GeneralizedLinearModel.fit is not recommended. Use fitglm instead.
mdl = GeneralizedLinearModel.fit(tbl)
mdl = GeneralizedLinearModel.fit(X,y)
mdl = GeneralizedLinearModel.fit(...,modelspec)
mdl = GeneralizedLinearModel.fit(...,Name,Value)
mdl = GeneralizedLinearModel.fit(...,modelspec,Name,Value)
mdl = GeneralizedLinearModel.fit(...,Name,Value) or
mdl = GeneralizedLinearModel.fit(...,modelspec,Name,Value)
creates a generalized linear model with additional options specified by one or more
Name,Value pair arguments.
modelspec: Model specification
'linear' (default) | character vector or string scalar naming the model | t-by-(p + 1) terms matrix | character vector or string scalar formula in the form 'Y ~ terms'
Model specification, specified as one of the following:
Character vector or string scalar specifying the type of model.
|Value|Model Type|
|'constant'|Model contains only a constant (intercept) term.|
|'linear'|Model contains an intercept and linear term for each predictor.|
|'interactions'|Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms).|
|'purequadratic'|Model contains an intercept term and linear and squared terms for each predictor.|
|'quadratic'|Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors.|
|'polyijk'|Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on.|
t-by-(p + 1) terms matrix specifying the terms to include in the model, where t is the number of terms, p is the number of predictor variables, and the additional column accounts for the response variable.
Character vector or string scalar representing a formula in the form 'Y ~ terms'. The terms are in Wilkinson Notation.
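As a rough illustration of the three forms of modelspec, the following sketch fits the same two-predictor interactions model three ways using fitglm. The simulated data and the names X, y, T, mdlName, mdlTerms, and mdlFormula are assumptions made for this illustration; they do not come from this page.

```matlab
rng(0)                                  % reproducible simulated data (assumption)
X = randn(50,2);                        % two predictors
y = 1 + X(:,1) - 2*X(:,2) + 0.5*X(:,1).*X(:,2) + randn(50,1);

% 1. Character vector naming the model
mdlName = fitglm(X,y,'interactions');

% 2. Terms matrix: one row per term, last column is the response
T = [0 0 0; 1 0 0; 0 1 0; 1 1 0];
mdlTerms = fitglm(X,y,T);

% 3. Formula (matrix input uses the default names x1, x2, and y)
mdlFormula = fitglm(X,y,'y ~ x1*x2');

% Each fit should list the same four terms
disp(mdlName.CoefficientNames)
disp(mdlTerms.CoefficientNames)
disp(mdlFormula.CoefficientNames)
```

The formula form is usually the easiest to read, while the terms matrix is convenient when the set of terms is generated programmatically.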
Specify optional comma-separated pairs of Name,Value arguments. Name is
the argument name and
Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
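The following sketch shows name-value pairs supplied in two different orders. It assumes simulated Poisson counts; the variable names x, y, mdlA, and mdlB are made up for the illustration, and the example uses fitglm with the 'Distribution' and 'Link' arguments.

```matlab
rng(1)                                  % reproducible simulated counts (assumption)
x = rand(100,1);
y = poissrnd(exp(0.5 + x));             % Poisson response generated with a log link

% The two name-value pairs can appear in either order
mdlA = fitglm(x,y,'linear','Distribution','poisson','Link','log');
mdlB = fitglm(x,y,'linear','Link','log','Distribution','poisson');

% Both calls should describe the same model
disp([mdlA.Coefficients.Estimate mdlB.Coefficients.Estimate])
```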
Fit a logistic regression model of probability of smoking as a function of age, weight, and sex, using a two-way interactions model.
Load the hospital dataset array.
load hospital
ds = hospital; % just to use the ds name
Specify the model using a formula that allows up to two-way interactions.
modelspec = 'Smoker ~ Age*Weight*Sex - Age:Weight:Sex';
Create the generalized linear model.
mdl = fitglm(ds,modelspec,'Distribution','binomial')
mdl = 
Generalized linear regression model:
    logit(Smoker) ~ 1 + Sex*Age + Sex*Weight + Age*Weight
    Distribution = Binomial

Estimated Coefficients:
                        Estimate        SE          tStat       pValue 
                       ___________    _________    ________    _______
    (Intercept)            -6.0492       19.749     -0.3063    0.75938
    Sex_Male               -2.2859       12.424    -0.18399    0.85402
    Age                    0.11691      0.50977     0.22934    0.81861
    Weight                0.031109      0.15208     0.20455    0.83792
    Sex_Male:Age          0.020734      0.20681     0.10025    0.92014
    Sex_Male:Weight        0.01216     0.053168     0.22871     0.8191
    Age:Weight         -0.00071959    0.0038964    -0.18468    0.85348

100 observations, 93 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 5.07, p-value = 0.535
The large p-value indicates the model might not differ statistically from a constant.
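To see how the reported p-value relates to the chi-square statistic, the following sketch recomputes it by hand. It assumes the test has 6 degrees of freedom (the 7 estimated coefficients minus the single coefficient of the constant model); the names chi2stat, df, and pValue are introduced only for this illustration.

```matlab
chi2stat = 5.07;                     % statistic copied from the model display
df = 7 - 1;                          % assumed: 7 coefficients minus the constant model's 1
pValue = 1 - chi2cdf(chi2stat,df);   % upper-tail probability, roughly 0.535
disp(pValue)
```

Calling devianceTest(mdl) on the fitted model should report the same comparison against the constant model.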
A terms matrix
T is a
t-by-(p + 1) matrix specifying terms in a model,
where t is the number of terms, p is the number of
predictor variables, and +1 accounts for the response variable. The value of
T(i,j) is the exponent of variable
j in term i.
For example, suppose that an input includes three predictor variables
A, B, and C and the response variable
Y in the order A, B, C, and Y. Each row of
T represents one term:
[0 0 0 0]: Constant term or intercept
[0 1 0 0]: B; equivalently, A^0 * B^1 * C^0
[1 0 1 0]: A*C
[2 0 0 0]: A^2
[0 1 2 0]: B*(C^2)
The 0 at the end of each term represents the response variable. In
general, a column vector of zeros in a terms matrix represents the position of the response
variable. If you have the predictor and response variables in a matrix and column vector,
then you must include
0 for the response variable in the last column of each row.
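The exponent interpretation of a terms matrix can be checked numerically. The following sketch evaluates each term of a five-row terms matrix for one observation. The predictor values in x are hypothetical, and the expression relies on implicit expansion, so it assumes MATLAB R2016b or later.

```matlab
% Terms matrix: columns are A, B, C, and the response Y
T = [0 0 0 0
     0 1 0 0
     1 0 1 0
     2 0 0 0
     0 1 2 0];

x = [2 3 5];                 % hypothetical values of A, B, and C for one observation
p = numel(x);                % number of predictors

% Raise each predictor to the exponent in its column, then multiply across columns
termValues = prod(x .^ T(:,1:p), 2);
disp(termValues)
```

For these values, the five terms evaluate to 1, 3, 10, 4, and 75, which correspond to the intercept, B, A*C, A^2, and B*(C^2).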
A formula for model specification is a character vector or string scalar of
the form 'Y ~ terms'.
Y is the response name.
terms represents the predictor terms in a model using
Wilkinson notation.
For example:
'Y ~ A + B + C' specifies a three-variable
linear model with intercept.
'Y ~ A + B + C - 1' specifies a
three-variable linear model without intercept. Note that
formulas include a constant (intercept) term by default. To
exclude a constant term from the model, you must include
-1 in the formula.
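The effect of removing the intercept from a formula can be sketched as follows. The table tbl contains simulated data with variables named A, B, C, and Y to match the formulas; the data and the names withIntercept and withoutIntercept are assumptions for this illustration.

```matlab
rng(2)                                          % reproducible simulated data (assumption)
tbl = array2table(randn(40,4), 'VariableNames', {'A','B','C','Y'});

withIntercept    = fitglm(tbl,'Y ~ A + B + C');       % intercept included by default
withoutIntercept = fitglm(tbl,'Y ~ A + B + C - 1');   % intercept removed explicitly

disp(withIntercept.CoefficientNames)
disp(withoutIntercept.CoefficientNames)
```

The first model should have four coefficients (the intercept plus one per predictor) and the second should have three.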
Wilkinson notation describes the terms present in a model. The notation relates to the terms present in a model, not to the multipliers (coefficients) of those terms.
Wilkinson notation uses these symbols:
+ means include the next variable.
- means do not include the next variable.
: defines an interaction, which is a product of terms.
* defines an interaction and all lower-order terms.
^ raises the predictor to a power, exactly as in
* repeated, so
^ includes lower-order
terms as well.
() groups terms.
This table shows typical examples of Wilkinson notation.
|Wilkinson Notation|Term in Standard Notation|
|1|Constant (intercept) term|
|-A|Do not include A|
Statistics and Machine
Learning Toolbox™ notation always includes a constant term unless you explicitly remove the term
using -1.
For more details, see Wilkinson Notation.
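A short sketch of how the *, :, and ^ symbols expand into terms, using fitglm on simulated data. The table tbl, its variable names A, B, and Y, and the model names are assumptions introduced for this illustration.

```matlab
rng(3)                                     % reproducible simulated data (assumption)
tbl = array2table(randn(40,3), 'VariableNames', {'A','B','Y'});

mdlStar  = fitglm(tbl,'Y ~ A*B');          % interaction plus all lower-order terms
mdlColon = fitglm(tbl,'Y ~ A + B + A:B');  % the same terms written out one by one
mdlPower = fitglm(tbl,'Y ~ A^2');          % A^2 plus the lower-order term A

% The first two formulas should produce identical term lists
disp(isequal(mdlStar.CoefficientNames, mdlColon.CoefficientNames))
disp(mdlPower.CoefficientNames)
```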
The generalized linear model
mdl is a standard linear model unless you
specify otherwise with the
Distribution name-value pair.
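As a minimal check of the default behavior, the following sketch fits a model without a Distribution argument and inspects the Distribution and Link properties of the result. The simulated data and variable names are assumptions for this illustration.

```matlab
rng(4)                                             % reproducible simulated data (assumption)
x = randn(30,1);
y = 2 + 3*x + randn(30,1);

mdlDefault = fitglm(x,y);                          % no Distribution specified
mdlNormal  = fitglm(x,y,'Distribution','normal');  % distribution stated explicitly

disp(mdlDefault.Distribution.Name)                 % distribution used by default
disp(mdlDefault.Link.Name)                         % link function used by default
disp(max(abs(mdlDefault.Coefficients.Estimate - mdlNormal.Coefficients.Estimate)))
```

The two fits should agree, because the explicit 'normal' setting matches the default.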
For other methods such as
devianceTest, or properties of the
GeneralizedLinearModel object, see GeneralizedLinearModel.
You can also construct a generalized linear model using fitglm.