Generalized linear mixed-effects model class
A GeneralizedLinearMixedModel object represents a regression model
of a response variable that contains both fixed and random effects. The object comprises
data, a model description, fitted coefficients, covariance parameters, design matrices,
residuals, residual plots, and other diagnostic information for a generalized linear
mixed-effects (GLME) model. You can predict model responses with the
predict function and generate random data at new design points
using the random function.
You can fit a generalized linear mixed-effects (GLME) model to sample data using
fitglme(tbl,formula). For more information, see fitglme.
tbl — Input data
Input data, which includes the response variable, predictor variables,
and grouping variables, specified as a table or dataset array. The
predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify
the model for the variables using formula.
Data Types: table
formula — Formula for model specification
'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'
Formula for model specification, specified as a character vector or
string scalar of the form 'y ~ fixed + (random1|grouping1) +
... + (randomR|groupingR)'. For a full description, see
Formula.
Example: 'y ~ treatment + (1|block)'
Coefficients — Estimates of fixed-effects coefficients
Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns:
Name — Name of the coefficient
Estimate — Estimated coefficient value
SE — Standard error of the estimate
tStat — t-statistic for a test that the coefficient is equal to 0
DF — Degrees of freedom associated with the t-statistic
pValue — p-value for the t-statistic
Lower — Lower confidence limit
Upper — Upper confidence limit
To obtain any of these columns as a vector, index into the property using dot notation.
Use the coefTest method to perform other
tests on the coefficients.
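As a minimal sketch of the dot-notation access and the coefTest call described above, the following refits the model from the example on this page, assuming the mfr sample data set is available. The variable names est, ci, H, and pJoint are illustrative only.

```matlab
% Fit the example model (same specification as the example on this page).
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% Index into the Coefficients property with dot notation.
est = glme.Coefficients.Estimate;                          % estimates as a column vector
ci  = [glme.Coefficients.Lower, glme.Coefficients.Upper];  % confidence limits

% Jointly test that both supplier coefficients (rows 5 and 6) equal zero.
% Each row of H is one contrast over the six fixed-effects coefficients.
H = [0 0 0 0 1 0; 0 0 0 0 0 1];
pJoint = coefTest(glme, H);
```

The contrast matrix H assumes the coefficient order shown in the example display, where the two supplier terms are the last two rows.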
CoefficientCovariance — Covariance of estimated fixed-effects vector
Covariance of estimated fixed-effects vector, stored as a matrix.
Data Types: single | double
CoefficientNames — Names of fixed-effects coefficients
Names of fixed-effects coefficients, stored as a cell array of character
vectors. The label for the coefficient of the constant term is
(Intercept). The labels for other coefficients
indicate the terms that they multiply. When the term includes a categorical
predictor, the label also indicates the level of that predictor.
Data Types: cell
DFE — Degrees of freedom for error
Degrees of freedom for error, stored as a positive integer value.
DFE is the number of observations minus the number of
estimated coefficients.
DFE contains the degrees of freedom corresponding to
the 'Residual' method of calculating denominator degrees
of freedom for hypothesis tests on fixed-effects coefficients. If
n is the number of observations and
p is the number of fixed-effects coefficients, then
DFE is equal to n – p.
Data Types: double
Dispersion — Model dispersion parameter
Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response.
For observation i, the conditional variance of the response $y_i$, given the conditional mean $\mu_i$ and the dispersion parameter $\sigma^2$, in a generalized linear mixed-effects model is
$$\operatorname{var}(y_i \mid \mu_i, \sigma^2) = \frac{\sigma^2}{w_i}\, v(\mu_i),$$
where $w_i$ is the ith observation weight and
v is the variance function for the specified
conditional distribution of the response. The Dispersion
property contains an estimate of $\sigma^2$ for the specified GLME model. The value of
Dispersion depends on the specified conditional
distribution of the response. For binomial and Poisson distributions, the
theoretical value of Dispersion is equal to $\sigma^2 = 1.0$.
If FitMethod is MPL or
REMPL and the
'DispersionFlag' name-value pair argument in
fitglme is
true, then a dispersion parameter is
estimated from data for all distributions, including binomial and
Poisson distributions.
If FitMethod is
ApproximateLaplace or
Laplace, then the
'DispersionFlag' name-value pair argument in
fitglme does not apply,
and the dispersion parameter is fixed at 1.0 for binomial and
Poisson distributions. For all other distributions,
Dispersion is estimated from data.
Data Types: double
DispersionEstimated — Flag indicating if dispersion parameter was estimated
true | false
Flag indicating whether the dispersion parameter was estimated, stored as a logical value.
If FitMethod is
ApproximateLaplace or
Laplace, then the dispersion parameter is
fixed at its theoretical value of 1.0 for binomial and Poisson
distributions, and DispersionEstimated is
false. For other distributions, the
dispersion parameter is estimated from the data, and
DispersionEstimated is
true.
If FitMethod is MPL or
REMPL, and the
'DispersionFlag' name-value pair argument in
fitglme is specified as
true, then the dispersion parameter is
estimated for all distributions, including binomial and Poisson
distributions, and DispersionEstimated is
true.
If FitMethod is MPL or
REMPL, and the
'DispersionFlag' name-value pair argument in
fitglme is specified as
false, then the dispersion parameter is fixed
at its theoretical value for binomial and Poisson distributions, and
DispersionEstimated is
false. For distributions other than binomial
and Poisson, the dispersion parameter is estimated from the data,
and DispersionEstimated is
true.
Data Types: logical
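The rules above can be checked on a fitted model. The following is a sketch only, assuming the mfr sample data set and the model formula from the example on this page; the names glmeLaplace and glmeMPL are illustrative.

```matlab
load mfr
formula = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Laplace fit: for a Poisson response the dispersion is fixed at 1.
glmeLaplace = fitglme(mfr, formula, 'Distribution','Poisson', ...
    'Link','log','FitMethod','Laplace');

% MPL fit with 'DispersionFlag' set to true: the dispersion is estimated.
glmeMPL = fitglme(mfr, formula, 'Distribution','Poisson', ...
    'Link','log','FitMethod','MPL','DispersionFlag',true);

fprintf('Laplace: Dispersion = %g, estimated = %d\n', ...
    glmeLaplace.Dispersion, glmeLaplace.DispersionEstimated);
fprintf('MPL:     Dispersion = %g, estimated = %d\n', ...
    glmeMPL.Dispersion, glmeMPL.DispersionEstimated);
```

An estimated dispersion well above 1 in the MPL fit would suggest overdispersion relative to the Poisson assumption.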
Distribution — Response distribution name
'Normal' | 'Binomial' | 'Poisson' | 'Gamma' | 'InverseGaussian'
Response distribution name, stored as one of the following:
'Normal' — Normal distribution
'Binomial' — Binomial distribution
'Poisson' — Poisson distribution
'Gamma' — Gamma distribution
'InverseGaussian' — Inverse Gaussian distribution
FitMethod — Method used to fit the model
'MPL' | 'REMPL' | 'ApproximateLaplace' | 'Laplace'
Method used to fit the model, stored as one of the following:
'MPL' — Maximum pseudo likelihood
'REMPL' — Restricted maximum pseudo likelihood
'ApproximateLaplace' — Maximum likelihood using the approximate Laplace method, with fixed effects profiled out
'Laplace' — Maximum likelihood using the Laplace method
Formula — Model specification formula
Model specification formula, stored as an object. The model specification formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information, see Formula.
Link — Link function characteristics
Link function characteristics, stored as a structure containing the
following fields. The link is a function G that links the
distribution parameter MU to the linear predictor
ETA as follows: G(MU) = ETA.
| Field | Description |
|---|---|
| Name | Name of the link function |
| Link | Function that defines G |
| Derivative | Derivative of G |
| SecondDerivative | Second derivative of G |
| Inverse | Inverse of G |
Data Types: struct
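To illustrate the relationship G(MU) = ETA, the following sketch evaluates the stored link and its inverse for the log-link model from the example on this page. It assumes the mfr sample data set; the values in mu are arbitrary test points, not data.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

G    = glme.Link.Link;      % function handle that defines G
Ginv = glme.Link.Inverse;   % function handle for the inverse of G

mu     = [0.5 1 2 4];       % arbitrary positive means (illustrative)
eta    = G(mu);             % map means to the linear-predictor scale
muBack = Ginv(eta);         % map back to the mean scale
```

For the log link, eta equals log(mu), and applying the inverse recovers mu.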
LogLikelihood — Log of likelihood function
Log of likelihood function evaluated at the estimated coefficient values,
stored as a scalar value. LogLikelihood depends on the
method used to fit the model.
If you use 'Laplace' or
'ApproximateLaplace', then
LogLikelihood is the maximized log
likelihood.
If you use 'MPL', then
LogLikelihood is the maximized log likelihood
of the pseudo data from the final pseudo likelihood
iteration.
If you use 'REMPL', then
LogLikelihood is the maximized restricted log
likelihood of the pseudo data from the final pseudo likelihood
iteration.
Data Types: double
ModelCriterion — Model criterion
Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.
| Field | Description |
|---|---|
| AIC | Akaike information criterion |
| BIC | Bayesian information criterion |
| LogLikelihood | Log likelihood of the fitted model |
| Deviance | -2 times LogLikelihood |
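As a short sketch of how these fields relate, the following reads ModelCriterion from the example model on this page (assuming the mfr sample data set) and recomputes the deviance from the log likelihood. The names mc and devCheck are illustrative.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

mc = glme.ModelCriterion;           % AIC, BIC, LogLikelihood, Deviance

% Deviance is -2 times the log likelihood.
devCheck = -2 * mc.LogLikelihood;
fprintf('Deviance = %.2f, -2*LogLikelihood = %.2f\n', mc.Deviance, devCheck);
```

When comparing candidate models fitted to the same data, lower AIC and BIC values indicate a better trade-off between fit and complexity.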
NumCoefficients — Number of fixed-effects coefficients
Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumEstimatedCoefficients — Number of estimated fixed-effects coefficients
Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumObservations — Number of observations
Number of observations used in the fit, stored as a positive integer
value. NumObservations is the number of rows in the table
or dataset array tbl, minus rows excluded using the
'Exclude' name-value pair of fitglme or rows containing
NaN values.
Data Types: double
NumPredictors — Number of predictors
Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value.
Data Types: double
NumVariables — Total number of variables
Total number of variables, including the response and predictors, stored
as a positive integer value. If the sample data is in a table or dataset
array tbl, then NumVariables is the
total number of variables in tbl, including the response
variable. NumVariables includes variables, if any, that
are not used as predictors or as the response.
Data Types: double
ObservationInfo — Information about the observations
Information about the observations used in the fit, stored as a table.
ObservationInfo has one row for each observation and
the following columns.
| Name | Description |
|---|---|
| Weights | The weight value for the observation. The default value is 1. |
| Excluded | If the observation was excluded from the fit using the 'Exclude' name-value pair argument in fitglme, then Excluded is true, or 1. Otherwise, Excluded is false, or 0. |
| Missing | If the observation was excluded from the fit because any response or predictor value is missing, then Missing is true. Otherwise, Missing is false. |
| Subset | If the observation was used in the fit, then Subset is true. If the observation was not used in the fit because it is missing or excluded, then Subset is false. |
| BinomSize | Binomial size for each observation. This column only applies when fitting a binomial distribution. |
Data Types: table
ObservationNames — Names of observations
Names of observations used in the fit, stored as a cell array of character vectors.
If the data is in a table or dataset array tbl
that contains observation names, then
ObservationNames uses those names.
If the data is provided in matrices, or in a table or dataset
array without observation names, then
ObservationNames is an empty cell
array.
Data Types: cell
PredictorNames — Names of predictors
Names of the variables used as predictors in the fit, stored as a cell
array of character vectors that has the same length as
NumPredictors.
Data Types: cell
ResponseName — Name of response variable
Name of the variable used as the response variable in the fit, stored as a character vector.
Data Types: char
Rsquared — Proportion of variability in the response explained by the fitted model
Proportion of variability in the response explained by the fitted model,
stored as a structure. Rsquared contains the
R-squared value of the fitted model, also known as
the coefficient of determination. Rsquared contains
the following fields.
| Field | Description |
|---|---|
| Ordinary | R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 - SSE./SST |
| Adjusted | R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. Rsquared.Adjusted = 1 - (SSE./SST)*(DFT./DFE), where DFE = n - p, DFT = n - 1, n is the total number of observations, and p is the number of fixed-effects coefficients. |
Data Types: struct
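SSE — Error sum of squares
Error sum of squares, stored as a positive scalar value.
SSE is the weighted sum of the squared conditional
residuals, and is calculated as
$$SSE = \sum_{i=1}^{n} w_i^{\mathrm{eff}} \left(y_i - f_i\right)^2,$$
where n is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $y_i$ is the ith response, and $f_i$ is the ith fitted value.
The ith effective weight is calculated as
$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(\mu_i(\hat{\beta}, \hat{b})\right)},$$
where $w_i$ is the ith observation weight, $v_i$ is the variance term for the ith observation, and $\hat{\beta}$ and $\hat{b}$ are estimated values of β and b, respectively.
The ith fitted value is calculated as
$$f_i = g^{-1}\left(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\right),$$
where $g^{-1}$ is the inverse of the link function, $x_i^T$ is the ith row of the fixed-effects design matrix X, and $z_i^T$ is the ith row of the random-effects design matrix Z. $\delta_i$ is the ith offset value.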
Data Types: double
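SSR — Regression sum of squares
Regression sum of squares, stored as a positive scalar value.
SSR is the sum of squares explained by the
generalized linear mixed-effects regression, or equivalently the weighted
sum of the squared deviations of the conditional fitted values from their
weighted mean. SSR is calculated as
$$SSR = \sum_{i=1}^{n} w_i^{\mathrm{eff}} \left(f_i - \bar{f}\right)^2,$$
where n is the number of observations, $w_i^{\mathrm{eff}}$ is the ith effective weight, $f_i$ is the ith fitted value, and $\bar{f}$ is a weighted average of the fitted values.
The ith effective weight is calculated as
$$w_i^{\mathrm{eff}} = \frac{w_i}{v_i\left(\mu_i(\hat{\beta}, \hat{b})\right)},$$
where $\hat{\beta}$ and $\hat{b}$ are estimated values of β and b, respectively.
The ith fitted value is calculated as
$$f_i = g^{-1}\left(x_i^T \hat{\beta} + z_i^T \hat{b} + \delta_i\right),$$
where $g^{-1}$ is the inverse of the link function, $x_i^T$ is the ith row of the fixed-effects design matrix X, and $z_i^T$ is the ith row of the random-effects design matrix Z. $\delta_i$ is the ith offset value.
The weighted average of fitted values is calculated as
$$\bar{f} = \frac{\sum_{i=1}^{n} w_i^{\mathrm{eff}} f_i}{\sum_{i=1}^{n} w_i^{\mathrm{eff}}}.$$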
Data Types: double
SST — Total sum of squares
Total sum of squares, stored as a positive scalar value. For a GLME model,
SST is defined as SST = SSE +
SSR.
Data Types: double
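The identities stated for DFE, SST, and Rsquared can be recomputed from the stored properties. The following is a sketch under the assumption that the mfr sample data set is available and that the example model from this page is used; sstCheck, r2Ord, and r2Adj are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

n = glme.NumObservations;    % number of observations used in the fit
p = glme.NumCoefficients;    % number of fixed-effects coefficients

% SST is defined as SSE + SSR for a GLME model.
sstCheck = glme.SSE + glme.SSR;

% Ordinary and adjusted R-squared from the sums of squares.
r2Ord = 1 - glme.SSE / glme.SST;
r2Adj = 1 - (glme.SSE / glme.SST) * ((n - 1) / glme.DFE);

fprintf('DFE = %d (n - p = %d)\n', glme.DFE, n - p);
fprintf('R-squared: ordinary = %.4f, adjusted = %.4f\n', r2Ord, r2Adj);
```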
VariableInfo — Information about the variables
Information about the variables used in the fit, stored as a table.
VariableInfo has one row for each variable and
contains the following columns.
| Column Name | Description |
|---|---|
| Class | Class of the variable ('double', 'cell', 'nominal', and so on). |
| Range | Value range of the variable. |
| InModel | If the variable is a predictor in the fitted model, InModel is true. If the variable is not in the fitted model, InModel is false. |
| IsCategorical | If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then IsCategorical is true. If the variable is a continuous predictor, then IsCategorical is false. |
Data Types: table
VariableNames — Names of the variables
Names of all the variables contained in the table or dataset array
tbl, stored as a cell array of character
vectors.
Data Types: cell
Variables — Variables
Variables, stored as a table. If the fit is based on a table or dataset
array tbl, then Variables is identical
to tbl.
Data Types: table
| Function | Description |
|---|---|
| anova | Analysis of variance for generalized linear mixed-effects model |
| coefCI | Confidence intervals for coefficients of generalized linear mixed-effects model |
| coefTest | Hypothesis test on fixed and random effects of generalized linear mixed-effects model |
| compare | Compare generalized linear mixed-effects models |
| covarianceParameters | Extract covariance parameters of generalized linear mixed-effects model |
| designMatrix | Fixed- and random-effects design matrices |
| fitted | Fitted responses from generalized linear mixed-effects model |
| fixedEffects | Estimates of fixed effects and related statistics |
| partialDependence | Compute partial dependence |
| plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
| plotResiduals | Plot residuals of generalized linear mixed-effects model |
| predict | Predict response of generalized linear mixed-effects model |
| random | Generate random responses from fitted generalized linear mixed-effects model |
| randomEffects | Estimates of random effects and related statistics |
| refit | Refit generalized linear mixed-effects model |
| residuals | Residuals of fitted generalized linear mixed-effects model |
| response | Response vector of generalized linear mixed-effects model |
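A few of the object functions listed above can be combined as in the following sketch. It assumes the mfr sample data set and the example model from this page, and it relies on the default options of each function (conditional predictions and raw residuals); the output variable names are illustrative.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

muFit = fitted(glme);        % fitted conditional mean responses
yPred = predict(glme);       % predictions at the original design points
res   = residuals(glme);     % raw conditional residuals
y     = response(glme);      % response vector used in the fit

rng(0);                      % fix the seed so the simulation is repeatable
ySim  = random(glme);        % simulated responses at the original design points
```

With default options, predict at the original design points returns the same values as fitted, and the raw residuals equal the response minus the fitted values.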
Load the sample data.
load mfr
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:
Flag to indicate whether the batch used the new process (newprocess)
Processing time for each batch, in hours (time)
Temperature of the batch, in degrees Celsius (temp)
Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
Number of defects in the batch (defects)
The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.
Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.
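The number of defects can be modeled using a Poisson distribution:
$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij})$$
This corresponds to the generalized linear mixed-effects model
$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$
where
$\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.
$\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
$\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.
$\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.
$b_i$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.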
glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');
Display the model.
disp(glme)
Generalized linear mixed-effects model fit by ML
Model information:
Number of observations 100
Fixed effects coefficients 6
Random effects coefficients 20
Covariance parameters 1
Distribution Poisson
Link Log
FitMethod Laplace
Formula:
defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)
Model fit statistics:
AIC BIC LogLikelihood Deviance
416.35 434.58 -201.17 402.35
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue
{'(Intercept)'} 1.4689 0.15988 9.1875 94 9.8194e-15
{'newprocess' } -0.36766 0.17755 -2.0708 94 0.041122
{'time_dev' } -0.094521 0.82849 -0.11409 94 0.90941
{'temp_dev' } -0.28317 0.9617 -0.29444 94 0.76907
{'supplier_C' } -0.071868 0.078024 -0.9211 94 0.35936
{'supplier_B' } 0.071072 0.07739 0.91836 94 0.36078
Lower Upper
1.1515 1.7864
-0.72019 -0.015134
-1.7395 1.5505
-2.1926 1.6263
-0.22679 0.083051
-0.082588 0.22473
Random effects covariance parameters:
Group: factory (20 Levels)
Name1 Name2 Type Estimate
{'(Intercept)'} {'(Intercept)'} {'std'} 0.31381
Group: Error
Name Estimate
{'sqrt(Dispersion)'} 1
The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.
Formula indicates the model specification using Wilkinson’s notation.
The Model fit statistics table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC), Bayesian information criterion (BIC) values, log likelihood (LogLikelihood), and deviance (Deviance) values.
The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects coefficient, and each column contains statistics corresponding to that coefficient. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.
Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory predictor, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.
The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
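As a sketch of that step, the following calls covarianceParameters with three outputs on the example model, assuming the mfr sample data set. The names psi, dispersion, and stats are just the output variables chosen here.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace', ...
    'DummyVarCoding','effects');

% psi holds the estimated random-effects covariance for each grouping
% variable, and stats holds the related statistics, including the
% confidence limits.
[psi, dispersion, stats] = covarianceParameters(glme);

% Statistics for the factory grouping variable. The estimate is reported as
% a standard deviation, with Lower and Upper confidence limits.
disp(stats{1})

% The reported standard deviation is the square root of the variance in psi.
sdFactory = sqrt(psi{1});
```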
In general, a formula for model specification is a character
vector or string scalar of the form 'y ~ terms'. For generalized
linear mixed-effects models, this formula is in the form 'y ~ fixed +
(random1|grouping1) + ... + (randomR|groupingR)', where
fixed and random contain the fixed-effects
and the random-effects terms, respectively, and R is the number
of grouping variables in the model.
Suppose a table tbl contains the following:
A response variable, y
Predictor variables, Xj, which can be continuous or grouping variables
Grouping variables, g1, g2, ..., gR,
where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.
Then, in a formula of the form, 'y ~ fixed +
(random1|g1) + ... +
(randomR|gR)',
the term fixed corresponds to a specification of the
fixed-effects design matrix X,
random1 is a specification of the
random-effects design matrix Z1
corresponding to grouping variable g1, and
similarly randomR is a
specification of the random-effects design matrix
ZR
corresponding to grouping variable
gR. You can
express the fixed and random terms using
Wilkinson notation.
Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.
| Wilkinson Notation | Factors in Standard Notation |
|---|---|
| 1 | Constant (intercept) term |
| X^k, where k is a positive integer | X, X^2, ..., X^k |
| X1 + X2 | X1, X2 |
| X1*X2 | X1, X2, X1.*X2 (elementwise multiplication of X1 and X2) |
| X1:X2 | X1.*X2 only |
| - X2 | Do not include X2 |
| X1*X2 + X3 | X1, X2, X3, X1*X2 |
| X1 + X2 + X3 + X1:X2 | X1, X2, X3, X1*X2 |
| X1*X2*X3 - X1:X2:X3 | X1, X2, X3, X1*X2, X1*X3, X2*X3 |
| X1*(X2 + X3) | X1, X2, X3, X1*X2, X1*X3 |
Statistics and Machine Learning Toolbox™ notation always includes a constant term unless you explicitly remove
the term using -1. Here are some examples for generalized linear
mixed-effects model specification.
Examples:
| Formula | Description |
|---|---|
| 'y ~ X1 + X2' | Fixed effects for the intercept, X1 and X2. This is equivalent to 'y ~ 1 + X1 + X2'. |
| 'y ~ -1 + X1 + X2' | No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1. |
| 'y ~ 1 + (1 \| g1)' | Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1. |
| 'y ~ X1 + (1 \| g1)' | Random intercept model with a fixed slope. |
| 'y ~ X1 + (X1 \| g1)' | Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1\|g1)'. |
| 'y ~ X1 + (1 \| g1) + (-1 + X1 \| g1)' | Independent random effects terms for intercept and slope. |
| 'y ~ 1 + (1 \| g1) + (1 \| g2) + (1 \| g1:g2)' | Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect. |
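The intercept conventions in the table can be tried on real data. The following sketch fits three small Poisson models to the mfr sample data set (an assumption carried over from the example on this page) and inspects the resulting coefficient names; the model names glmeA, glmeB, and glmeC are illustrative.

```matlab
load mfr

% Implicit intercept: the constant term is included automatically.
glmeA = fitglme(mfr, 'defects ~ newprocess + (1|factory)', ...
    'Distribution','Poisson');

% Explicit intercept: equivalent specification.
glmeB = fitglme(mfr, 'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution','Poisson');

% Intercept removed with -1: only the newprocess coefficient remains.
glmeC = fitglme(mfr, 'defects ~ -1 + newprocess + (1|factory)', ...
    'Distribution','Poisson');

disp(glmeA.CoefficientNames)
disp(glmeC.CoefficientNames)
```

The first two fits report the same fixed-effects coefficient names, while the third omits (Intercept).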