# glmfit

Fit generalized linear regression model

## Syntax

## Description

specifies additional options using one or more name-value arguments. For example, you can
specify `b`

= glmfit(`X`

,`y`

,`distr`

,`Name,Value`

)`'Constant','off'`

to omit the constant term from the model.

## Examples

### Fit Generalized Linear Model with Probit Link

Fit a generalized linear regression model, and compute predicted (estimated) values for the predictor data using the fitted model.

Create a sample data set.

```
x = [2100 2300 2500 2700 2900 3100 ...
3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';
```

`x`

contains the predictor variable values. Each `y`

value is the number of successes in the corresponding number of trials in `n`

.

Fit a probit regression model for `y`

on `x`

.

b = glmfit(x,[y n],'binomial','Link','probit');

Compute the estimated number of successes.

yfit = glmval(b,x,'probit','Size',n);

Plot the observed success percent and estimated success percent versus the `x`

values.

plot(x,y./n,'o',x,yfit./n,'-')

### Fit Generalized Linear Model Using Custom Link Function

Define a custom link function and use it to fit a generalized linear regression model.

Load the sample data.

`load fisheriris`

The column vector `species`

contains iris flowers of three different species: setosa, versicolor, and virginica. The matrix `meas`

contains four types of measurements for the flowers, the length and width of sepals and petals in centimeters.

Define the predictor variables and response variable.

```
X = meas(51:end,:);
y = strcmp('versicolor',species(51:end));
```

Define a custom link function for a logit link function. Create three function handles that define the link function, the derivative of the link function, and the inverse link function. Store them in a cell array.

link = @(mu) log(mu./(1-mu)); derlink = @(mu) 1./(mu.*(1-mu)); invlink = @(resp) 1./(1+exp(-resp)); F = {link,derlink,invlink};

Fit a logistic regression model using `glmfit`

with the custom link function.

b = glmfit(X,y,'binomial','link',F)

`b = `*5×1*
42.6378
2.4652
6.6809
-9.4294
-18.2861

Fit a generalized linear model by using the built-in `logit`

link function, and compare the results.

b = glmfit(X,y,'binomial','link','logit')

`b = `*5×1*
42.6378
2.4652
6.6809
-9.4294
-18.2861

### Perform Deviance Test

Fit a generalized linear regression model that contains an intercept and linear term for each predictor. Perform a deviance test that determines whether the model fits significantly better than a constant model.

Generate sample data using Poisson random numbers with two underlying predictors `X(:,1)`

and `X(:,2)`

.

rng('default') % For reproducibility rndvars = randn(100,2); X = [2 + rndvars(:,1),rndvars(:,2)]; mu = exp(1 + X*[1;2]); y = poissrnd(mu);

Fit a generalized linear regression model that contains an intercept and linear term for each predictor.

`[b,dev] = glmfit(X,y,'poisson');`

The second output argument `dev`

is a Deviance of the fit.

Fit a generalized linear regression model that contains only an intercept. Specify the predictor variable as a column of 1s, and specify `'Constant'`

as `'off'`

so that `glmfit`

does not include a constant term in the model.

[~,dev_noconstant] = glmfit(ones(100,1),y,'poisson','Constant','off');

Compute the difference between `dev_constant`

and `dev`

.

D = dev_noconstant - dev

D = 2.9533e+05

`D`

has a chi-square distribution with 2 degrees of freedom. The degrees of freedom equal the difference in the number of estimated parameters in the model corresponding to `dev`

and the number of estimated parameters in the constant model. Find the *p*-value for a deviance test.

p = 1 - chi2cdf(D,2)

p = 0

The small *p*-value indicates that the model differs significantly from a constant.

Alternatively, you can create a generalized linear regression model of Poisson data by using the `fitglm`

function. The model display includes the statistic (`Chi^2-statistic vs. constant model`

) and *p*-value.

mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')

mdl = Generalized linear regression model: log(y) ~ 1 + x1 + x2 Distribution = Poisson Estimated Coefficients: Estimate SE tStat pValue ________ _________ ______ ______ (Intercept) 1.0405 0.022122 47.034 0 x1 0.9968 0.003362 296.49 0 x2 1.987 0.0063433 313.24 0 100 observations, 97 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0

You can also use the `devianceTest`

function with the fitted model object.

devianceTest(mdl)

`ans=`*2×4 table*
Deviance DFE chi2Stat pValue
__________ ___ __________ ______
log(y) ~ 1 2.9544e+05 99
log(y) ~ 1 + x1 + x2 107.4 97 2.9533e+05 0

## Input Arguments

`X`

— Predictor variables

numeric matrix

Predictor variables, specified as an *n*-by-*p*
numeric matrix, where *n* is the number of observations and
*p* is the number of predictor variables. Each column of
`X`

represents one variable, and each row represents one
observation.

By default, `glmfit`

includes a constant term in the model. Do
not add a column of 1s directly to `X`

. You can change the default
behavior of `glmfit`

by specifying the
`'Constant'`

name-value argument.

**Data Types: **`single`

| `double`

`y`

— Response variable

vector | matrix

Response variable, specified as a vector or matrix.

If

`distr`

is not`'binomial'`

, then`y`

must be an*n*-by-1 vector, where*n*is the number of observations. Each entry in`y`

is the response for the corresponding row of`X`

. The data type must be single or double.If

`distr`

is`'binomial'`

, then`y`

is an*n*-by-1 vector indicating success or failure at each observation, or an*n*-by-2 matrix whose first column indicates the number of successes for each observation and second column indicates the number of trials for each observation.

**Data Types: **`single`

| `double`

| `logical`

| `categorical`

`distr`

— Distribution of response variable

`'normal'`

(default) | `'binomial'`

| `'poisson'`

| `'gamma'`

| `'inverse gaussian'`

Distribution of the response variable, specified as one of the values in this table.

Value | Description |
---|---|

`'normal'` | Normal distribution (default) |

`'binomial'` | Binomial distribution |

`'poisson'` | Poisson distribution |

`'gamma'` | Gamma distribution |

`'inverse gaussian'` | Inverse Gaussian distribution |

### Name-Value Arguments

Specify optional pairs of arguments as
`Name1=Value1,...,NameN=ValueN`

, where `Name`

is
the argument name and `Value`

is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.

*
Before R2021a, use commas to separate each name and value, and enclose*
`Name`

*in quotes.*

**Example: **`b = glmfit(X,y,'normal','link','probit')`

specifies that the
distribution of the response is normal and instructs `glmfit`

to use the
probit link function.

`B0`

— Initial values for coefficient estimates

numeric vector

Initial values for the coefficient estimates, specified as a numeric vector. The default values are initial fitted values derived from the input data.

**Data Types: **`single`

| `double`

`Constant`

— Indicator for constant term

`'on'`

(default) | `'off'`

Indicator for the constant term (intercept) in the fit, specified as either
`'on'`

to include the constant term or `'off'`

to
remove it from the model.

`'on'`

(default) —`glmfit`

includes a constant term in the model and returns a (*p*+ 1)-by-1 vector of coefficient estimates`b`

, where*p*is the number of predictors in`X`

. The coefficient of the constant term is the first element of`b`

.`'off'`

—`glmfit`

omits the constant term and returns a*p*-by-1 vector of coefficient estimates`b`

.

**Example: **`'Constant','off'`

`EstDisp`

— Indicator to compute dispersion parameter

`'off'`

for `'binomial'`

and
`'poisson'`

distributions (default) | `'on'`

Indicator to compute a dispersion parameter for `'binomial'`

and
`'poisson'`

distributions, specified as `'on'`

or
`'off'`

.

Value | Description |
---|---|

`'on'` | Estimate a dispersion parameter when computing standard errors. The estimated dispersion parameter value is the sum of squared Pearson residuals divided by the degrees of freedom for error (DFE). |

`'off'` | Use the theoretical value of 1 when computing standard errors (default). |

The fitting function always estimates the dispersion for other distributions.

**Example: **`'EstDisp','on'`

`LikelihoodPenalty`

— Penalty for likelihood estimate

`"none"`

(default) | `"jeffreys-prior"`

Penalty for the likelihood estimate, specified as `"none"`

or
`"jeffreys-prior"`

.

`"none"`

—`glmfit`

does not apply a penalty to the likelihood estimate.`"jeffreys-prior"`

—`glmfit`

uses Jeffreys prior to penalize the likelihood estimate.

For logistic models, setting `LikelihoodPenalty`

to
`"jeffreys-prior"`

is called *Firth's
regression*. To reduce the coefficient estimate bias when you have a small
number of samples, or when you are performing binomial (logistic) regression on a
separable data set, set `LikelihoodPenalty`

to
`"jeffreys-prior"`

.

**Example: **`LikelihoodPenalty="jeffreys-prior"`

**Data Types: **`char`

| `string`

`Link`

— Link function

canonical link function (default) | scalar value | structure or cell array of custom link function

Link function to use in place of the canonical link function, specified as one of the built-in link functions in the following table or a custom link function.

Link Function Name | Link Function | Mean (Inverse) Function |
---|---|---|

`'identity'` (default for `'normal'`
distribution) | f(μ) =
μ | μ = Xb |

`'log'` (default for `'poisson'`
distribution) | f(μ) =
log(μ) | μ = exp(Xb) |

`'logit'` (default for `'binomial'`
distribution) | f(μ) =
log(μ/(1 – μ)) | μ = exp(Xb) / (1 +
exp(Xb)) |

`'probit'` | f(μ) =
Φ^{–1}(μ), where Φ is the
cumulative distribution function of the standard normal distribution | μ = Φ(Xb) |

| f(μ) =
log(–log(μ)) | μ = exp(–exp(Xb)) |

`'comploglog'` | f(μ) = log(–log(1 –
μ)) | μ = 1 – exp(–exp(Xb)) |

`'reciprocal'` (default for `'gamma'`
distribution) | f(μ) =
1/μ | μ = 1/(Xb) |

`p` (a number, default for the ```
'inverse
gaussian'
``` distribution with p = –2) | f(μ) =
μ^{p} | μ =
Xb^{1/p} |

The default `'Link'`

value is the canonical link
function, which depends on the distribution of the response variable specified by the
`distr`

argument.

You can specify a custom link function using a structure or cell array.

Structure with three fields. Each field of the structure (for example,

`S`

) holds a function handle that accepts a vector of inputs and returns a vector of the same size:`S.Link`

— Link function,*f*(*μ*) =`S.Link`

(*μ*)`S.Derivative`

— Derivative of the link function`S.Inverse`

— Inverse link function,*μ*=`S.Inverse`

(*Xb*)

Cell array of the form

`{FL FD FI}`

that defines the link function (`FL(mu)`

), the derivative of the link function (`FD = dFL(mu)/dmu`

), and the inverse link function (`FI = FL^(–1)`

). Each entry holds a function handle that accepts a vector of inputs and returns a vector of the same size.

The link function defines the relationship
*f*(*μ*) =
*X***b* between the mean response
*μ* and the linear combination of predictors
*X***b*.

**Example: **`'Link','probit'`

**Data Types: **`single`

| `double`

| `char`

| `string`

| `struct`

| `cell`

`Offset`

— Offset variable

[ ] (default) | numeric vector

Offset variable in the fit, specified as a numeric vector with the same length as
the response `y`

.

`glmfit`

uses `Offset`

as an additional
predictor with a coefficient value fixed at 1. In other words, the formula for fitting is

*f*(*μ*)```
= Offset +
```

,*X***b*

where *f* is the link function,
*μ* is the mean response, and
*X***b* is the linear combination of predictors
*X*. The `Offset`

predictor has coefficient
`1`

.

For example, consider a Poisson regression model. Suppose, for theoretical
reasons, the number of counts is to be proportional to a predictor
`A`

. By using the log link function and specifying
`log(A)`

as an offset, you can force the model to satisfy this
theoretical constraint.

**Data Types: **`single`

| `double`

`Options`

— Optimization options

`statset('``glmfit`

')

(default) | structure

`glmfit`

')Optimization options, specified as a structure. This argument determines the control
parameters for the iterative algorithm that `glmfit`

uses.

Create the `'Options'`

value by using the function `statset`

or by creating a structure array containing the fields and values described in this table.

Field Name | Value | Default Value |
---|---|---|

`Display` | Amount of information displayed by the algorithm `'off'` — Displays no information`'final'` — Displays the final output
| `'off'` |

`MaxIter` | Maximum number of iterations allowed, specified as a positive integer | `100` |

`TolX` | Termination tolerance for the parameters, specified as a positive scalar | `1e-6` |

You can also enter `statset('`

in the
Command Window to see the names and default values of the fields that
`glmfit`

')`glmfit`

accepts in the `'Options'`

name-value argument.

**Example: **`'Options',statset('Display','final','MaxIter',1000)`

specifies to display the final information of the iterative algorithm results, and change the maximum number of iterations allowed to 1000.

**Data Types: **`struct`

`Weights`

— Observation weights

`ones(n,1)`

(default) | *n*-by-1 vector of nonnegative scalar values

Observation weights, specified as an *n*-by-1 vector of
nonnegative scalar values, where *n* is the number of
observations.

**Data Types: **`single`

| `double`

## Output Arguments

`b`

— Coefficient estimates

numeric vector

Coefficient estimates, returned as a numeric vector.

If

`'Constant'`

is`'on'`

(default), then`glmfit`

includes a constant term in the model and returns a (*p*+ 1)-by-1 vector of coefficient estimates`b`

, where*p*is the number of predictors in`X`

. The coefficient of the constant term is the first element of`b`

.If

`'Constant'`

is`'off'`

, then`glmfit`

omits the constant term and returns a*p*-by-1 vector of coefficient estimates`b`

.

`dev`

— Deviance of fit

numeric value

Deviance of the fit, returned as a numeric value. The deviance is useful for comparing two models when one model is a special case of the other model. The difference between the deviance of the two models has a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters between the two models.

For more information, see Deviance.

`stats`

— Model statistics

structure

Model statistics, returned as a structure with the following fields:

`beta`

— Coefficient estimates`b`

`dfe`

— Degrees of freedom for error`sfit`

— Estimated dispersion parameter`s`

— Theoretical or estimated dispersion parameter`estdisp`

— 0 when`'EstDisp'`

is`'off'`

and 1 when`'EstDisp'`

is`'on'`

`covb`

— Estimated covariance matrix for`b`

`se`

— Vector of standard errors of the coefficient estimates`b`

`coeffcorr`

— Correlation matrix for`b`

`t`

—*t*statistics for`b`

`p`

—*p*-values for`b`

`resid`

— Vector of residuals`residp`

— Vector of Pearson residuals`residd`

— Vector of deviance residuals`resida`

— Vector of Anscombe residuals

If you estimate a dispersion parameter for the binomial or Poisson distribution,
then `stats.s`

is equal to `stats.sfit`

. Also, the
elements of `stats.se`

differ by the factor `stats.s`

from their theoretical values.

## More About

### Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to a saturated model.

The deviance of a model *M*_{1} is twice the
difference between the loglikelihood of the model
*M*_{1} and the saturated model
*M*_{s}. A saturated model is a
model with the maximum number of parameters that you can estimate.

For example, if you have *n* observations
(*y*_{i},
*i* = 1, 2, ..., *n*) with potentially different
values for
*X*_{i}^{T}β,
then you can define a saturated model with *n* parameters. Let
*L*(*b*,*y*) denote the maximum value
of the likelihood function for a model with the parameters *b*. Then the
deviance of the model *M*_{1} is

$$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$$

where *b*_{1} and
*b*_{s} contain the
estimated parameters for the model *M*_{1} and the
saturated model, respectively. The deviance has a chi-squared distribution with *n* – *p* degrees of freedom, where *n* is the number of parameters
in the saturated model and *p* is the number of parameters in the model
*M*_{1}.

Assume you have two different generalized linear regression models
*M*_{1} and
*M*_{2}, and
*M*_{1} has a subset of the terms in
*M*_{2}. You can assess the fit of the models by
comparing their deviances *D*_{1} and
*D*_{2}. The difference of the deviances is

$$\begin{array}{l}D={D}_{2}-{D}_{1}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)+2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)\\ \text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=-2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{1},y\right)\right).\end{array}$$

Asymptotically, the difference *D* has a chi-squared distribution with
degrees of freedom *v* equal to the difference in the number of parameters
estimated in *M*_{1} and
*M*_{2}. You can obtain the
*p*-value for this test by using ```
1 —
chi2cdf(D,v)
```

.

Typically, you examine *D* using a model
*M*_{2} with a constant term and no predictors.
Therefore, *D* has a chi-squared distribution with *p* – 1 degrees of freedom. If the dispersion is estimated, the difference divided
by the estimated dispersion has an *F* distribution with *p* – 1 numerator degrees of freedom and *n* – *p* denominator degrees of freedom.

## Alternative Functionality

`glmfit`

is useful when you simply need the output arguments of the
function or when you want to repeat fitting a model multiple times in a loop. If you need to
investigate a fitted model further, create a generalized linear regression model object `GeneralizedLinearModel`

by using `fitglm`

or `stepwiseglm`

. A
`GeneralizedLinearModel`

object provides more features than
`glmfit`

.

Use the properties of

`GeneralizedLinearModel`

to investigate a fitted model. The object properties include information about the coefficient estimates, summary statistics, fitting method, and input data.Use the object functions of

`GeneralizedLinearModel`

to predict responses and to modify, evaluate, and visualize the generalized linear regression model.You can find the information in the output of

`glmfit`

using the properties and object functions of`GeneralizedLinearModel`

.Output of `glmfit`

Equivalent Values in `GeneralizedLinearModel`

`b`

See the `Estimate`

column of the`Coefficients`

property.`dev`

See the `Deviance`

property.`stats`

See the model display in the Command Window. You can find the statistics in the model properties (

`CoefficientCovariance`

,`Coefficients`

,`Dispersion`

,`DispersionEstimated`

, and`Residuals`

).The dispersion parameter in

of`stats`

.s`glmfit`

is the scale factor for the standard errors of coefficients, whereas the dispersion parameter in the`Dispersion`

property of a generalized linear model is the scale factor for the variance of the response. Therefore,`stats.s`

is the square root of the`Dispersion`

value.

## References

[1] Dobson, A. J. *An Introduction to Generalized
Linear Models*. New York: Chapman & Hall, 1990.

[2] McCullagh, P., and J. A. Nelder. *Generalized
Linear Models*. New York: Chapman & Hall, 1990.

[3] Collett, D. *Modeling Binary Data*. New
York: Chapman & Hall, 2002.

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced before R2006a**

## MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

## How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

### Americas

- América Latina (Español)
- Canada (English)
- United States (English)

### Europe

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)