I'm not quite sure if I'm encountering a bug associated with GeneralizedLinearModel objects, or if this is simply a result of my shaky knowledge of the math underlying this process. Hopefully someone can point me in the right direction...
I'm attempting to fit logistic regression models to several datasets via stepwise regression (xobs is a 3615 x n array, where n ranges from 2-6, and yobs is a 3615 x 1 vector):
mdl = GeneralizedLinearModel.stepwise(xobs, yobs>0, 'purequadratic', ...
    'distribution', 'binomial', ...
    'link', 'logit', ...
    'criterion', 'aic');
For some of my datasets, this results in a regression model with a rank-deficient regression matrix.
Warning: Regression design matrix is rank deficient to within machine precision.
> In TermsRegression>TermsRegression.checkDesignRank at 98
  In GeneralizedLinearModel>GeneralizedLinearModel.stepwise at 1553
Here's an example of one such model:
Generalized Linear regression model:
    logit(H) ~ 1 + Smax + Smin + Tavg + Savg*Srange + Smax^2 + Smin^2 + Tavg^2
    Distribution = Binomial

Estimated Coefficients:
                   Estimate     SE           tStat      pValue
    (Intercept)    -30.528      10.947       -2.7887    0.0052926
    Savg           -0.54303     0.17605      -3.0845    0.0020389
    Smax           0.95502      0.26295      3.632      0.00028129
    Smin           0            0            NaN        NaN
    Srange         -0.56718     0.20256      -2.8       0.0051102
    Tavg           1.5211       0.75819      2.0063     0.044827
    Savg:Srange    0.051677     0.016717     3.0913     0.0019928
    Smax^2         -0.027523    0.0078208    -3.5192    0.00043288
    Smin^2         0.022062     0.0071608    3.0809     0.0020637
    Tavg^2         -0.027077    0.013544     -1.9992    0.045585

3249 observations, 3240 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 26, p-value = 0.00104
With this model, several functions associated with GLMs, such as plotSlice and predict, throw errors. Is this by design? I'm honestly not sure whether the resulting model is mathematically sound, or whether I need to add and/or remove terms manually to get an allowable model.
Any suggestions for working with models like this? Tests to run on the predictor matrix ( xobs ) to isolate potential troublemaker interactions? Or alternatively, is there a way to prevent the stepwise method from adding terms that would result in a rank-deficient regression matrix?
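As a starting point for the kind of test asked about above, one option is to build the design matrix directly and inspect its numerical rank. The sketch below is only an illustration under stated assumptions: X is a made-up three-column stand-in for xobs whose third column is the difference of the first two (a hypothetical dependency, not something known about the real data), x2fx from the Statistics Toolbox is used to expand the 'purequadratic' terms, and the variable names are invented for this example.

```matlab
% Hypothetical stand-in for xobs: column 3 = column 1 - column 2,
% so the linear terms alone are already linearly dependent.
rng(0);
A = randn(200, 2);
X = [A, A(:,1) - A(:,2)];

% Drop rows with missing values before checking the rank.
X = X(all(~isnan(X), 2), :);

% Design matrix for a 'purequadratic' model:
% constant, linear terms, then squared terms.
D = x2fx(X, 'purequadratic');

% Column-pivoted QR: tiny diagonal entries of R mark redundant columns.
[~, R, p] = qr(D, 0);
d = abs(diag(R));
tol = max(size(D)) * eps(d(1));
r = sum(d > tol);                 % numerical rank of D
redundant = sort(p(r+1:end));     % columns pivoted out as redundant

% The null space shows every column taking part in a dependency.
N = null(D);
involved = find(any(abs(N) > 1e-8, 2))';

fprintf('rank %d of %d columns\n', r, size(D, 2));
disp(redundant);
disp(involved);
```

To try this on real data, replace the stand-in X with xobs. The pivoted QR names one column per dependency as redundant, while the rows of the null-space basis with nonzero entries list all columns involved, which may help decide which predictor to leave out before calling stepwise.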