devianceTest

Class: GeneralizedLinearModel

Analysis of deviance

Syntax

tbl = devianceTest(mdl)

Description

tbl = devianceTest(mdl) returns an analysis of deviance table for the mdl generalized linear model. tbl gives the result of a test of whether the fitted model fits significantly better than a constant model.

Input Arguments

mdl

Generalized linear model, as constructed by fitglm or stepwiseglm.

Output Arguments

tbl

Table containing two rows and four columns.

  • The first row relates to a constant model.

  • The second row relates to the full model in mdl.

  • The columns are:

    Deviance: Twice the difference between the log likelihoods of the corresponding model (mdl or constant) and the saturated model. The test statistic for the deviance test is twice the difference between the log likelihoods of the tested model mdl and the constant model. For more information, see Deviance.
    DFE: Error degrees of freedom. It is the number of observations minus the number of parameters in the corresponding model.
    chi2Stat: F statistic or chi-squared statistic, depending on whether the dispersion is estimated (F statistic) or not (chi-squared statistic).
    • The chi-squared statistic is the difference between the deviance of the constant model and the deviance of the full model.

    • The F statistic is the difference between the deviance of the constant model and the deviance of the full model, divided by the estimated dispersion.

    pValue: p-value associated with the test. It is the upper-tail probability, evaluated at the test statistic, of the chi-squared distribution with (number of coefficients in the model minus one) degrees of freedom, or of the F distribution with (number of coefficients in the model minus one) numerator degrees of freedom and DFE denominator degrees of freedom.
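As a rough illustration of how the columns relate to each other, the following sketch recomputes the chi-squared statistic and the p-value from the Deviance and DFE columns. It assumes a Poisson model whose dispersion is not estimated, so the test statistic is chi-squared. The simulated data and the variable names chi2, v, and p are illustrative only.

```matlab
rng('default')                      % for reproducibility
X = randn(100,5);
y = poissrnd(exp(X(:,[1 4 5])*[.4;.2;.3]));
mdl = fitglm(X,y,'linear','Distribution','poisson');
tbl = devianceTest(mdl);

% Chi-squared statistic: constant-model deviance minus full-model deviance
chi2 = tbl.Deviance(1) - tbl.Deviance(2);

% Degrees of freedom: number of coefficients minus one, which equals
% the drop in error degrees of freedom between the two rows
v = tbl.DFE(1) - tbl.DFE(2);

% Upper-tail probability of the chi-squared distribution
p = 1 - chi2cdf(chi2,v);

% Compare the recomputed values with the ones reported in the table
disp([chi2 tbl.chi2Stat(2)])
disp([p tbl.pValue(2)])
```

The recomputed values should agree with the second row of tbl up to floating-point rounding.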

Definitions

Deviance

Deviance of a model M1 is twice the difference between the log likelihood of the saturated model, MS, and the log likelihood of M1. The saturated model is the model with the maximum number of parameters that can be estimated. For example, if there are n observations y_i, i = 1, 2, ..., n, with potentially different values for X_i^T β, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model. Then the deviance of model M1 is

$$-2\left(\log L(b_1, y) - \log L(b_S, y)\right),$$

where b1 are the estimated parameters for model M1 and bS are the estimated parameters for the saturated model. The deviance has a chi-square distribution with n − p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in model M1.

If M1 and M2 are two different generalized linear models, then the fit of the models can be assessed by comparing the deviances D1 and D2 of these models. The difference of the deviances is

$$D = D_2 - D_1 = -2\left(\log L(b_2, y) - \log L(b_S, y)\right) + 2\left(\log L(b_1, y) - \log L(b_S, y)\right) = -2\left(\log L(b_2, y) - \log L(b_1, y)\right).$$

Asymptotically, this difference has a chi-square distribution with degrees of freedom v equal to the number of parameters that are estimated in one model but fixed (typically at 0) in the other. That is, v is the difference in the number of parameters estimated in M1 and M2. With this sign convention, D is nonnegative when M2 is the smaller model nested in M1. You can get the p-value for this test using 1 - chi2cdf(D,v), where D = D2 − D1.
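The comparison of two nested models can be sketched as follows. This is a minimal example under stated assumptions: the data are simulated, the names mdl1 and mdl2 are illustrative, and the smaller model M2 simply drops two predictors from M1. It uses the Deviance and DFE properties of the fitted model objects.

```matlab
rng('default')                      % for reproducibility
X = randn(100,5);
y = poissrnd(exp(X(:,[1 4 5])*[.4;.2;.3]));

% M1: larger model with all five predictors
mdl1 = fitglm(X,y,'linear','Distribution','poisson');

% M2: smaller model nested in M1 (predictors 1, 4, and 5 only)
mdl2 = fitglm(X(:,[1 4 5]),y,'linear','Distribution','poisson');

% D = D2 - D1, nonnegative because M2 is nested in M1
D = mdl2.Deviance - mdl1.Deviance;

% v = number of parameters estimated in M1 but fixed at 0 in M2
v = mdl2.DFE - mdl1.DFE;

% Asymptotic chi-square p-value for the difference in deviances
p = 1 - chi2cdf(D,v)
```

A large p-value here would suggest that the extra terms in M1 do not improve the fit significantly. A small one would suggest that they do.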

Examples


Deviance Test

Perform a deviance test on a generalized linear model.

Construct a generalized linear model.

rng('default') % for reproducibility
X = randn(100,5);
mu = exp(X(:,[1 4 5])*[.4;.2;.3]);
y = poissrnd(mu);
mdl = fitglm(X,y,'linear','Distribution','poisson');

Test whether the model differs from a constant in a statistically significant way.

tbl = devianceTest(mdl)
tbl = 

                                           Deviance    DFE
                                           ________    ___

    log(y) ~ 1                             128.58      99 
    log(y) ~ 1 + x1 + x2 + x3 + x4 + x5    83.726      94 


                                           chi2Stat
                                           ________

    log(y) ~ 1                                     
    log(y) ~ 1 + x1 + x2 + x3 + x4 + x5    44.858  


                                             pValue  
                                           __________

    log(y) ~ 1                                       
    log(y) ~ 1 + x1 + x2 + x3 + x4 + x5    1.5502e-08

The p-value is very small, indicating that the model significantly differs from a constant.
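For comparison, here is a hedged sketch of the same test when the dispersion is estimated, in which case the test statistic is an F statistic rather than a chi-squared statistic. It assumes the 'DispersionFlag' name-value argument of fitglm and the Dispersion and DispersionEstimated properties of the fitted model. The name mdlDisp is illustrative, and the sketch displays the table without assuming how the statistic column is labeled.

```matlab
rng('default')                      % for reproducibility
X = randn(100,5);
y = poissrnd(exp(X(:,[1 4 5])*[.4;.2;.3]));

% Ask fitglm to estimate the dispersion parameter instead of fixing it
mdlDisp = fitglm(X,y,'linear','Distribution','poisson','DispersionFlag',true);

% Estimated dispersion that scales the difference in deviances
dispersion = mdlDisp.Dispersion

% Analysis of deviance table; the statistic is now based on the F distribution
tblDisp = devianceTest(mdlDisp)
```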
