How to fix a singular design matrix created by LinearModel during categorical interaction expansion?

Hello all,
I am using the LinearModel class in R2012a to create a linear regression model. I am specifying the model with a formula in Wilkinson notation, and I'm passing a dataset array with both numeric and categorical columns. The categorical columns are indicated to LinearModel.fit() with the CategoricalVars option.
The problem is that one of the terms in my formula is an interaction between two categorical variables (e.g. 'A:B'). When this term is expanded internally into dummy variables and the dummy columns are multiplied out, some columns of the design matrix end up all zeros (for example, when a combination of levels never occurs in the data). This of course makes the design matrix singular.
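To see where an all-zero column can come from, here is a minimal base-MATLAB sketch (no toolboxes) that mimics the expansion by hand. It assumes reference coding, i.e. the first level of each factor is dropped, which I believe is LinearModel's default dummy coding; the data and variable names are made up for illustration. The A = 3, B = 2 combination never occurs, so the corresponding A:B column is all zeros and the matrix loses rank.

```matlab
% Hypothetical data: factor A has levels 1..3, factor B has levels 1..2.
% The combination A = 3, B = 2 never occurs.
A = [1 1 2 2 3 3 1 2]';
B = [1 2 1 2 1 1 2 1]';
n = numel(A);

% One 0/1 indicator column per level (built without dummyvar)
DA = double(bsxfun(@eq, A, 1:3));
DB = double(bsxfun(@eq, B, 1:2));

% Reference coding (assumed default): drop the first level of each factor
DA = DA(:, 2:end);   % indicators for A = 2 and A = 3
DB = DB(:, 2:end);   % indicator for B = 2

% A:B columns are the products of every A column with every B column
AB = zeros(n, size(DA, 2) * size(DB, 2));
k = 0;
for i = 1:size(DA, 2)
    for j = 1:size(DB, 2)
        k = k + 1;
        AB(:, k) = DA(:, i) .* DB(:, j);
    end
end

% Design matrix for y ~ A*B: intercept, A, B, A:B
X = [ones(n, 1), DA, DB, AB];

% Locate all-zero columns and compare the rank with the column count
zeroCols = find(all(X == 0, 1));
fprintf('All-zero columns: %s\n', mat2str(zeroCols));
fprintf('rank(X) = %d, size(X, 2) = %d\n', rank(X), size(X, 2));
```

Running it should report column 6 (the A = 3 by B = 2 product) as the only all-zero column, with rank 5 for 6 columns.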
Is there a simple way to tell MATLAB how to handle problematic categorical interactions, or at least to remove them without breaking the linear model object? I'm surprised this is apparently unhandled (there are not even any warnings), as the situation could easily come up.
Many thanks

Answers (1)

Tom Lane on 19 Mar 2013
When I try an example like this I see:
Warning: Regression design matrix is rank deficient to within machine precision.
The coefficients table has coefficients fixed at zero. The step method may remove the singular terms. The anova method can help reveal which terms don't have full degrees of freedom.
Can you elaborate on what you see?
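A minimal sketch of how such a warning can be reproduced, assuming made-up data in which the A = 3, B = 2 combination never occurs. It uses the R2012a-era dataset array and LinearModel.fit (newer releases also offer table and fitlm); the anova call is included to show the per-term degrees of freedom.

```matlab
% Hypothetical data: the A = 3, B = 2 cell is empty
A = [1 1 2 2 3 3 1 2]';
B = [1 2 1 2 1 1 2 1]';
y = [1.0 2.1 3.2 4.8 5.1 4.9 1.9 2.8]';

% dataset array with explicitly named variables
ds = dataset({A, 'A'}, {B, 'B'}, {y, 'y'});

% A and B are numeric codes, so flag them as categorical explicitly.
% This fit is expected to issue the rank-deficiency warning.
lm = LinearModel.fit(ds, 'y ~ A*B', 'CategoricalVars', {'A', 'B'});

% Coefficients that cannot be estimated are reported as fixed at zero
disp(lm.Coefficients)

% Component ANOVA: compare the A:B degrees of freedom with the
% (3-1)*(2-1) = 2 that a complete 3-by-2 interaction would have
disp(anova(lm))
```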
Tom Lane on 19 Mar 2013
You can run
lm = step(lm)
to use stepwise regression to add or remove terms based on their significance. The 'Lower' and 'Upper' options control the set of terms considered for adding and removing. An interaction term can be significant and singular at the same time: this can happen when some factor combinations are missing, yet the combinations that are present give a significant improvement over the model without the interaction term.
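A sketch of restricting the stepwise search with 'Lower' and 'Upper', assuming both accept formula strings and using made-up data with an empty A = 3, B = 2 cell. Both main effects are always kept, so A:B is the only term step can remove or re-add. Which model step ends up with depends on the data, so no output is shown.

```matlab
% Hypothetical data: the A = 3, B = 2 cell is empty
A = [1 1 2 2 3 3 1 2]';
B = [1 2 1 2 1 1 2 1]';
y = [1.0 2.1 3.2 4.8 5.1 4.9 1.9 2.8]';
ds = dataset({A, 'A'}, {B, 'B'}, {y, 'y'});
lm = LinearModel.fit(ds, 'y ~ A*B', 'CategoricalVars', {'A', 'B'});

% 'Lower': terms that may never be removed (both main effects).
% 'Upper': largest model step may consider (main effects plus A:B).
lm2 = step(lm, 'Lower', 'y ~ A + B', 'Upper', 'y ~ A*B');

% Show which formula the stepwise search settled on
disp(lm2.Formula)
```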
You can remove a term directly:
lm = removeTerms(lm,'a:b')
Giuseppe DiStefano on 20 Mar 2013
The 'Lower' and 'Upper' options unfortunately don't allow one to specify particular terms that may be added. For example, I might want to consider only some of the interactions at each step, or only some of the linear terms, e.g. {A, B, D} but not C.
For missing factor combinations, it would be great to be able to control whether singular terms are kept or removed, much as x2fx accepts a catlevels argument so that you can indicate that the mere absence of a level in a dataset doesn't necessarily mean the level doesn't exist.
Incidentally, x2fx() seems to have the same problem when expanding categorical interactions.
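A sketch of the x2fx behaviour described above, with made-up codes where A has 3 levels, B has 2, and the A = 3, B = 2 combination never occurs. Passing catlevels makes x2fx create dummy columns for every declared level, but the product column for the empty cell is still all zeros. The column-dropping step at the end is a hypothetical workaround for fitting D directly (for example with regress), not a fix for a LinearModel object.

```matlab
% Hypothetical codes: A takes values 1..3, B takes values 1..2.
% The combination A = 3, B = 2 never occurs.
A = [1 1 2 2 3 3 1 2]';
B = [1 2 1 2 1 1 2 1]';
X = [A, B];

% Treat both columns as categorical and declare their numbers of levels,
% so dummy columns are built for every declared level
D = x2fx(X, 'interaction', [1 2], [3 2]);

% The dummy product for the empty A = 3, B = 2 cell is still all zeros
zeroCols = find(all(D == 0, 1));
fprintf('All-zero columns in D: %s\n', mat2str(zeroCols));

% Hypothetical workaround: drop exactly-zero columns, then check the rank
% before using Dreduced in a fit
Dreduced = D(:, ~all(D == 0, 1));
fprintf('rank(Dreduced) = %d, columns = %d\n', rank(Dreduced), size(Dreduced, 2));
```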
