Generalized Linear Mixed-Effects Models
What Are Generalized Linear Mixed-Effects Models?
Generalized linear mixed-effects (GLME) models describe the relationship between a response variable and independent variables using coefficients that can vary with respect to one or more grouping variables, for data with a response variable distribution other than normal. You can think of GLME models as extensions of generalized linear models (GLM) for data that are collected and summarized in groups. Alternatively, you can think of GLME models as a generalization of linear mixed-effects models (LME) for data where the response variable is not normally distributed.
A mixed-effects model consists of fixed-effects and random-effects terms. Fixed-effects terms are usually the conventional linear regression part of the model. Random-effects terms are associated with individual experimental units drawn at random from a population, and account for variations between groups that might affect the response. The random effects have prior distributions, whereas the fixed effects do not.
GLME Model Equations
The standard form of a generalized linear mixed-effects model is
y is an n-by-1 response vector, and yi is its ith element.
b is the random-effects vector.
Distr is a specified conditional distribution of y given b.
μ is the conditional mean of y given b, and μi is its ith element.
σ2 is the dispersion parameter.
w is the effective observation weight vector, and wi is the weight for observation i.
For a binomial distribution, the effective observation weight is equal to the prior weight specified using the
'Weights'name-value pair argument in
fitglme, multiplied by the binomial size specified using the
'BinomialSize'name-value pair argument.
For all other distributions, the effective observation weight is equal to the prior weight specified using the
'Weights'name-value pair argument in
g(μ) is a link function that defines the relationship between the mean response μ and the linear combination of the predictors.
X is an n-by-p fixed-effects design matrix.
β is a p-by-1 fixed-effects vector.
Z is an n-by-q random-effects design matrix.
b is a q-by-1 random-effects vector.
δ is a model offset vector.
The model for the mean response μ is
where g-1 is inverse of the link function g(μ), and is the linear predictor of the fixed and random effects of the generalized linear mixed-effects model
A GLME model is parameterized by β, θ, and σ2.
The assumptions for generalized linear mixed-effects models are:
The random effects vector b has the prior distribution:
where σ2 is the dispersion parameter, and D is a symmetric and positive semidefinite matrix parameterized by an unconstrained parameter vector θ.
The observations yi are conditionally independent given b.
Prepare Data for Model Fitting
To fit a GLME model to your data, use
Format your input data using the
table data type.
Each row of the table represents one observation, and each column
represents one predictor variable. For more information on creating
table, see Create Tables and Assign Data to Them.
Input data can include continuous and grouping variables.
that predictors using the following data types are categorical:
Character vector or character array
Cell array of character vectors
If the input data table contains any
fitglme excludes that entire row of data
from the fit. To exclude additional rows of data, you can use the
pair argument of
fitglme when fitting the model.
Choose a Distribution Type for the Model
GLME models are used when the response data does not follow
a normal distribution. Therefore, when fitting a model using
fitglme, you must specify the response
distribution type using the
pair argument. Often, the type of response data suggests the appropriate
distribution type for the model.
|Type of Response Data||Suggested Response Distribution Type|
|Any real number|
|Any positive number|
|Any nonnegative integer|
|Integer from 0 to n, where n is a fixed positive value|
Choose a Link Function for the Model
GLME models use a link function, g, to map
the relationship between the mean response and the linear combination
of the predictors. By default,
a predefined, commonly accepted link function based on the specified
distribution of the response data, as shown in the following table.
However, you can specify a different link function from the list of
predefined functions, or define your own, using the
pair argument of
Canonical link for the normal distribution.
Canonical link for the Poisson distribution.
Canonical link for the binomial distribution.
|Scalar value |
A structure containing four fields whose values are function handles:
When fitting a model to data,
the canonical link function by default.
|Distribution||Default Link Function|
The link functions
'probit' are mainly useful for binomial models.
Specify the Model Formula
Model specification for
fitglme uses Wilkinson notation, which
is a character vector or string scalar of the form
'y ~ terms',
y is the response variable name, and
terms is written in the following notation.
|Wilkinson Notation||Factors in Standard Notation|
|Constant (intercept) term|
|Do not include |
Formulas include a constant (intercept) term by default. To
exclude a constant term from the model, include
For generalized linear mixed-effects models, the formula specification
is of the form
'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)',
the fixed-effects and the random-effects terms, respectively.
Suppose the input data table contains the following:
A response variable,
XJ, where J is the total number of predictor variables (including continuous and grouping variables).
gR, where R is the number of grouping variables.
The grouping variables in
gR can be categorical,
logical, character arrays, string arrays, or cell arrays of character
Then, in a formula of the form
'y ~ fixed + (random1|g1)
+ ... + (randomR|gR)', the term
to a specification of the fixed-effects design matrix
a specification of the random-effects design matrix
to grouping variable
g1, and similarly
a specification of the random-effects design matrix
to grouping variable
gR. You can express the
using Wilkinson notation as follows.
|Fixed effects for the intercept, |
|No intercept, with fixed effects for |
|A fixed effect for the intercept, plus a random effect for
the intercept for each level of the grouping variable |
|Random intercept model with a fixed slope.|
|Random intercept and slope, with possible correlation between
them. This formula is equivalent to |
|Independent random-effects terms for intercept and slope.|
|Random intercept model with independent main effects for |
For example, the sample data
simulated data from a manufacturing company that operates 50 factories
across the world. Each factory runs a batch process to create a finished
product. The company wants to decrease the number of defects in each
batch, so it developed a new manufacturing process. To test the effectiveness
of the new process, the company selected 20 of its factories at random
to participate in an experiment: Ten factories implemented the new
process, while the other ten continued to run the old process. In
each of the 20 factories, the company ran five batches (for a total
of 100 batches), and recorded data on processing time (
temp_dev), number of defects (
and a categorical variable indicating the raw materials supplier (
for each batch.
To determine whether the new process (represented by the predictor
newprocess) significantly reduces the
number of defects, fit a GLME model using
supplier as fixed-effects predictors. Include
a random-effects intercept grouped by
to account for quality differences that might exist due to factory-specific
variations. The response variable
defects has a
The number of defects can be modeled using a Poisson distribution
This corresponds to the generalized linear mixed-effects model
defectsij is the number of defects observed in the batch produced by factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
μij is the mean number of defects corresponding to factory i during batch j.
supplier_Cij and supplier_Bij are dummy variables that indicate whether company
B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
bi ~ N(0,σb2) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.
Using Wilkinson notation, specify this model as:
'defects ~ 1 + newprocess + time_dev + temp_dev + supplier
To account for the Poisson distribution of the response variable,
when fitting the model using
'Distribution' name-value pair argument as
fitglme uses a log link function
for response variables with a Poisson distribution.
Display the Model
The output of the fitting function
information about generalized linear mixed-effects model.
mfr manufacturing experiment data,
fit a model using
supplier as fixed-effects predictors. Specify
the response distribution as Poisson, the link function as log, and
the fit method as Laplace.
load mfr glme = fitglme(mfr,... 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',... 'Distribution','Poisson','Link','log','FitMethod','Laplace',... 'DummyVarCoding','effects')
glme = Generalized linear mixed-effects model fit by ML Model information: Number of observations 100 Fixed effects coefficients 6 Random effects coefficients 20 Covariance parameters 1 Distribution Poisson Link Log FitMethod Laplace Formula: defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory) Model fit statistics: AIC BIC LogLikelihood Deviance 416.35 434.58 -201.17 402.35 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue '(Intercept)' 1.4689 0.15988 9.1875 94 9.8194e-15 'newprocess' -0.36766 0.17755 -2.0708 94 0.041122 'time_dev' -0.094521 0.82849 -0.11409 94 0.90941 'temp_dev' -0.28317 0.9617 -0.29444 94 0.76907 'supplier_C' -0.071868 0.078024 -0.9211 94 0.35936 'supplier_B' 0.071072 0.07739 0.91836 94 0.36078 Lower Upper 1.1515 1.7864 -0.72019 -0.015134 -1.7395 1.5505 -2.1926 1.6263 -0.22679 0.083051 -0.082588 0.22473 Random effects covariance parameters: Group: factory (20 Levels) Name1 Name2 Type Estimate '(Intercept)' '(Intercept)' 'std' 0.31381 Group: Error Name Estimate 'sqrt(Dispersion)' 1
Model information table displays the
total number of observations in the sample data (100), the number
of fixed- and random-effects coefficients (6 and 20, respectively),
and the number of covariance parameters (1). It also indicates that
the response variable has a
the link function is
Log, and the fit method is
Formula indicates the model specification
using Wilkinson’s notation.
Model fit statistics table displays statistics
used to assess the goodness of fit of the model. This includes the
Akaike information criterion (
AIC), Bayesian information
BIC) values, log likelihood (
and deviance (
Fixed effects coefficients table indicates
fitglme returned 95% confidence intervals.
It contains one row for each fixed-effects predictor, and each column
contains statistics corresponding to that predictor. Column 1 (
contains the name of each fixed-effects coefficient, column 2 (
contains its estimated value, and column 3 (
contains the standard error of the coefficient. Column 4 (
contains the t-statistic for a hypothesis test
that the coefficient is equal to 0. Column 5 (
and column 6 (
pValue) contain the degrees of freedom
and p-value that correspond to the t-statistic,
respectively. The last two columns (
display the lower and upper limits, respectively, of the 95% confidence
interval for each fixed-effects coefficient.
Random effects covariance parameters displays
a table for each grouping variable (here, only
including its total number of levels (20), and the type and estimate
of the covariance parameter. Here,
fitglme returns the standard deviation of
the random effect associated with the factory predictor, which has
an estimated value of 0.31381. It also displays a table containing
the error parameter type (here, the square root of the dispersion
parameter), and its estimated value of 1.
The standard display generated by
not provide confidence intervals for the random-effects parameters.
To compute and display these values, use
Work with the Model
After you create a GLME model using
you can use additional functions to work with the model.
Inspect and Test Coefficients and Confidence Intervals
To extract estimates of the fixed- and random-effects coefficients, covariance parameters, design matrices, and related statistics:
fixedEffectsextracts estimated fixed-effects coefficients and related statistics from a fitted model. Related statistics include the standard error; the t-statistic, degrees of freedom, and p-value for a hypothesis test of whether each parameter is equal to 0; and the confidence intervals.
randomEffectsextracts estimated random-effects coefficients and related statistics from a fitted GLME model. Related statistics include the estimated empirical Bayes predictor (EBP) of each random effect, the square root of the conditional mean squared error of prediction (CMSEP) given the covariance parameters and the response; the t-statistic, estimated degrees of freedom, and p-value for a hypothesis test of whether each random effect is equal to 0; and the confidence intervals.
covarianceParametersextracts estimated covariance parameters and related statistics from a fitted GLME model. Related statistics include estimate of the covariance parameter, and the confidence intervals.
designMatrixextracts the fixed- and random-effects design matrices, or a specified subset thereof, from the fitted GLME model.
To conduct customized hypothesis tests for the significance of fixed- and random-effects coefficients, and to compute custom confidence intervals:
anovaperforms a marginal F-test (hypothesis test) on fixed-effects terms, to determine if all coefficients representing the fixed-effects terms are equal to 0. You can use
anovato test the combined significance of the coefficients of categorical predictors.
coefCIcomputes confidence intervals for fixed- and random-effects parameters from a fitted GLME model. By default,
fitglmecomputes 95% confidence intervals. Use
coefCIto compute the boundaries at a different confidence level.
coefTestperforms custom hypothesis tests on fixed-effects or random-effects vectors of a fitted generalized linear mixed-effects model. For example, you can specify contrast matrices.
Generate New Response Values and Refit Model
To generate new response values, including fitted, predicted, and random responses, based on the fitted GLME model:
fittedcomputes fitted response values using the original predictor values, and the estimated coefficient and parameter values from the fitted model.
predictcomputes the predicted conditional or marginal mean of the response using either the original predictor values or new predictor values, and the estimated coefficient and parameter values from the fitted model.
randomgenerates random responses from a fitted model.
refitcreates a new fitted GLME model, based on the original model and a new response vector.
Inspect and Visualize Residuals
To extract and visualize residuals from the fitted GLME model:
residualsextracts the raw or Pearson residuals from the fitted model. You can also specify whether to compute the conditional or marginal residuals.
plotResidualscreates plots using the raw or Pearson residuals from the fitted model, including:
A histogram of the residuals
A scatterplot of the residuals versus fitted values
A scatterplot of residuals versus lagged residuals