## Rank deficiency issues with GeneralizedLinearModel

### Kelly Kearney

on 28 Jun 2013

I'm not sure whether I'm encountering a bug in `GeneralizedLinearModel` objects or whether this simply reflects my shaky knowledge of the math underlying the process. Hopefully someone can point me in the right direction.

I'm attempting to fit logistic regression models to several datasets using stepwise regression (`xobs` is a 3615 × n array, where n ranges from 2 to 6, and `yobs` is a 3615 × 1 vector):

```matlab
% Stepwise logistic (binomial) regression, starting from a pure-quadratic model
mdl = GeneralizedLinearModel.stepwise(xobs, yobs > 0, 'purequadratic', ...
    'distribution', 'binomial', ...
    'criterion', 'aic');
```
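For reference, the `'purequadratic'` starting model corresponds to an intercept, one linear term per predictor, and one squared term per predictor, so with n predictors its design matrix has 2n + 1 columns; stepwise then adds or removes terms from there (the final model below includes a `Savg*Srange` interaction, which is not part of the starting model). The sketch below builds that starting design matrix by hand in base MATLAB for a small hypothetical `x` (not the data from this question). The Statistics Toolbox function `x2fx(x, 'purequadratic')` should give the same columns, though the term order shown in a fitted model's display may differ.

```matlab
% Hypothetical predictor matrix: 5 observations of n = 2 predictors
x = [1 2; 2 1; 3 5; 4 3; 5 4];
[nobs, n] = size(x);

% Pure-quadratic design matrix: intercept, linear terms, then squared terms
D = [ones(nobs, 1), x, x.^2];

fprintf('%d predictors -> %d design-matrix columns\n', n, size(D, 2));
```

With n = 6 predictors this starting matrix already has 13 columns before any interactions are added.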

For some of my datasets, this produces a model whose regression design matrix is rank deficient:

```
Warning: Regression design matrix is rank deficient to within machine precision.
> In TermsRegression>TermsRegression.checkDesignRank at 98
In GeneralizedLinearModel>GeneralizedLinearModel.stepwise at 1553
```
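The warning means that at least one column of the design matrix is, to within round-off, a linear combination of the others, so the matrix does not have full column rank and not every coefficient can be estimated uniquely. A quick check in base MATLAB is to compare `rank` with the number of columns. The toy data below is hypothetical; it assumes, purely for illustration, a "range" predictor computed as max minus min.

```matlab
% Hypothetical toy predictors (illustration only, not the data in this question)
t      = (1:100)';
smax   = 10 + sin(t);
smin   = 5 + cos(t);
srange = smax - smin;            % exact linear combination of smax and smin

% Design matrix: intercept column plus all three linear terms
X = [ones(size(t)), smax, smin, srange];

p = size(X, 2);                  % number of coefficients requested
r = rank(X);                     % numerical rank (SVD with default tolerance)
fprintf('%d columns, rank %d\n', p, r);
if r < p
    disp('Rank deficient: not all coefficients are identifiable.');
end
```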

Here's an example of one such model:

```
Generalized Linear regression model:
    logit(H) ~ 1 + Smax + Smin + Tavg + Savg*Srange + Smax^2 + Smin^2 + Tavg^2
    Distribution = Binomial

Estimated Coefficients:
                   Estimate          SE     tStat      pValue
    (Intercept)     -30.528      10.947   -2.7887   0.0052926
    Savg           -0.54303     0.17605   -3.0845   0.0020389
    Smax            0.95502     0.26295     3.632  0.00028129
    Smin                  0           0       NaN         NaN
    Srange         -0.56718     0.20256      -2.8   0.0051102
    Tavg             1.5211     0.75819    2.0063    0.044827
    Savg:Srange    0.051677    0.016717    3.0913   0.0019928
    Smax^2        -0.027523   0.0078208   -3.5192  0.00043288
    Smin^2         0.022062   0.0071608    3.0809   0.0020637
    Tavg^2        -0.027077    0.013544   -1.9992    0.045585

3249 observations, 3240 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 26, p-value = 0.00104
```
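Two details in this output are consistent with one column having been dropped: `Smin` has an estimate and SE of exactly 0 with NaN statistics, and the error degrees of freedom are 3249 - 9 = 3240, meaning only 9 of the 10 listed coefficients were actually estimated. A common way for numerical code to choose which column to drop is QR factorization with column pivoting; whether `GeneralizedLinearModel` does exactly this internally is an assumption here, not something the output confirms. The sketch below, on hypothetical toy data, shows which column a pivoted QR flags as dependent.

```matlab
% Hypothetical toy design matrix with one exact dependency:
% column 4 = column 2 - column 3
t = (1:100)';
X = [ones(size(t)), 10 + sin(t), 5 + cos(t), (10 + sin(t)) - (5 + cos(t))];

% Economy-size QR with column pivoting: X(:, E) = Q*R
[~, R, E] = qr(X, 0);

% |diag(R)| is non-increasing; negligible entries mark dependent columns
d   = abs(diag(R));
tol = max(size(X)) * eps(d(1));
r   = sum(d > tol);               % numerical rank
dependent = E(r+1:end);           % column(s) pivoted to the end

fprintf('rank %d; dependent column(s): %s\n', r, mat2str(dependent));
```

Which of the mutually dependent columns ends up flagged depends on the pivoting order, so a zero coefficient on `Smin` would not single out `Smin` as the culprit; it only says that column was redundant given the others.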

With this model, several `GeneralizedLinearModel` methods, such as `plotSlice` and `predict`, throw errors. Is this by design? I'm honestly not sure whether the resulting model is mathematically sound, or whether I need to add and/or remove terms manually to get a valid model.

Any suggestions for working with models like this? Are there tests I could run on the predictor matrix (`xobs`) to isolate potentially troublesome terms or interactions? Alternatively, is there a way to prevent the stepwise method from adding terms that would make the regression design matrix rank deficient?
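One possible check on the predictor matrix itself is the null space of `[ones(n,1), xobs]`: each null-space vector is a set of weights that combines the columns to (numerically) zero, so its nonzero entries name the predictors involved in a dependency. The sketch below uses hypothetical stand-ins for `Smax`, `Smin`, and `Srange`, assuming (not confirmed by the question) that the range was computed as max minus min. Dropping one column from each dependent set before calling `stepwise` keeps that dependency out of the candidate terms, although checking `xobs` alone will not catch dependencies that only appear among squared or interaction terms.

```matlab
% Hypothetical stand-ins for three predictors (illustration only)
t      = (1:100)';
smax   = 10 + sin(t);
smin   = 5 + cos(t);
srange = smax - smin;                      % assumed definition of the range
names  = {'(Intercept)', 'smax', 'smin', 'srange'};
A      = [ones(size(t)), smax, smin, srange];

% Each column of Z is a unit weight vector w with A*w ~ 0
Z = null(A);
fprintf('%d numerical dependency(ies) found\n', size(Z, 2));
for k = 1:size(Z, 2)
    involved = abs(Z(:, k)) > 1e-8;        % ignore round-off-sized weights
    fprintf('  dependency %d involves: %s\n', k, strjoin(names(involved), ', '));
end

% One option: drop the last column involved in the first dependency
if ~isempty(Z)
    drop = find(abs(Z(:, 1)) > 1e-8, 1, 'last');
    A(:, drop)   = [];
    names(drop)  = [];
end
fprintf('rank after dropping: %d of %d columns\n', rank(A), size(A, 2));
```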
