Rank deficiency issues with GeneralizedLinearModel

Asked by Kelly Kearney on 28 Jun 2013

I'm not sure whether I'm encountering a bug in GeneralizedLinearModel objects or whether this is simply a result of my shaky knowledge of the math underlying this process. Hopefully someone can point me in the right direction...

I'm attempting to fit logistic regression models to several datasets via stepwise regression (xobs is a 3615 x n array, where n ranges from 2 to 6, and yobs is a 3615 x 1 vector):

mdl = GeneralizedLinearModel.stepwise(xobs, yobs>0, 'purequadratic', ...
      'distribution', 'binomial', ...
      'link', 'logit', ...
      'criterion', 'aic');

For some of my datasets, this results in a regression model with a rank-deficient regression matrix.

Warning: Regression design matrix is rank deficient to within machine precision.
> In TermsRegression>TermsRegression.checkDesignRank at 98
  In GeneralizedLinearModel>GeneralizedLinearModel.stepwise at 1553

Here's an example of one such model:

Generalized Linear regression model:
  logit(H) ~ 1 + Smax + Smin + Tavg + Savg*Srange + Smax^2 + Smin^2 + Tavg^2
  Distribution = Binomial
Estimated Coefficients:
                 Estimate     SE           tStat      pValue    
  (Intercept)      -30.528       10.947    -2.7887     0.0052926
  Savg            -0.54303      0.17605    -3.0845     0.0020389
  Smax             0.95502      0.26295      3.632    0.00028129
  Smin                   0            0        NaN           NaN
  Srange          -0.56718      0.20256       -2.8     0.0051102
  Tavg              1.5211      0.75819     2.0063      0.044827
  Savg:Srange     0.051677     0.016717     3.0913     0.0019928
  Smax^2         -0.027523    0.0078208    -3.5192    0.00043288
  Smin^2          0.022062    0.0071608     3.0809     0.0020637
  Tavg^2         -0.027077     0.013544    -1.9992      0.045585
3249 observations, 3240 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 26, p-value = 0.00104

With this model, several functions associated with GLMs, such as plotSlice and predict, throw errors. Is this by design? I'm honestly not sure whether the resulting model is mathematically sound, or whether I need to manually add and/or remove terms to get an allowable model.

Any suggestions for working with models like this? Are there tests I could run on the predictor matrix (xobs) to isolate potential troublemaker interactions? Alternatively, is there a way to prevent the stepwise method from adding terms that would result in a rank-deficient regression matrix?
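
For reference, here is a minimal sketch of the kind of test I have in mind, using only base MATLAB functions (rank, null, qr). The data are synthetic stand-ins, not my xobs, and the exact relation Srange = Smax - Smin is an assumption made purely so the example has a known dependency. The null space of the design matrix lists which columns are tied together, and a column-pivoted QR picks one full-rank subset of columns.

```matlab
% Synthetic stand-in for the predictors (hypothetical data, NOT the real xobs).
rng(0);
n      = 200;
Smax   = 30 + 2*randn(n,1);
Smin   = 20 + 2*randn(n,1);
Tavg   = 10 + 3*randn(n,1);
Srange = Smax - Smin;      % ASSUMED exact linear dependence, for illustration only

% Linear terms of a design matrix, with an intercept column.
names = {'Intercept','Smax','Smin','Srange','Tavg'};
X     = [ones(n,1), Smax, Smin, Srange, Tavg];

% 1) Numerical rank versus number of columns.
r = rank(X);
fprintf('rank = %d of %d columns\n', r, size(X,2));

% 2) Each null-space vector holds the weights of a column combination
%    that is numerically zero; its nonzero entries name the culprits.
Z = null(X);
for k = 1:size(Z,2)
    z        = Z(:,k) / max(abs(Z(:,k)));   % scale so the largest weight is 1
    involved = names(abs(z) > 1e-8);        % columns taking part in the dependency
    fprintf('dependent set %d: %s\n', k, strjoin(involved, ', '));
end

% 3) Column-pivoted QR: keep the columns whose pivots are not negligible.
[~, R, p] = qr(X, 0);                       % economy size, p is a permutation vector
d    = abs(diag(R));
tol  = max(size(X)) * eps(d(1));            % same style of tolerance that rank uses
keep = sort(p(d > tol));
fprintf('one full-rank subset: %s\n', strjoin(names(keep), ', '));
```

On this synthetic example the check should report rank 4 of 5 and flag Smax, Smin, and Srange as the dependent set. Which of the three the QR step drops depends on the pivoting order, so the kept subset is one valid choice rather than the only one. With real data, rows containing NaN would need to be removed first, and the quadratic and interaction columns would have to be appended to X by hand to test those terms as well.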
