Asked by Giuseppe DiStefano
on 19 Mar 2013

Hello all,

I am using the LinearModel class in R2012a to fit a linear regression model. I am specifying the model with a formula in Wilkinson notation, and I'm passing a dataset array with both numeric and categorical columns. The latter are indicated to LinearModel.fit() by way of the CategoricalVars option.

The problem is that one of the terms in my formula is an interaction between two categorical variables (e.g. 'A:B'). When each variable is expanded into dummy variables internally and the dummy columns are multiplied together, any combination of levels that never occurs in the data produces a column of all zeros in the design matrix. This of course leads to a singular design matrix.
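To make the mechanism concrete, here is a minimal sketch in base MATLAB (no toolbox calls). The data and the names `A`, `B`, `DA`, `DB`, and `DAB` are hypothetical, and the full indicator coding is written out by hand, so it is only an illustration of the effect and not necessarily how LinearModel.fit() builds its design matrix.

```matlab
% Hypothetical data: factor A has levels 1-2, factor B has levels 1-3.
% The combination (A = 2, B = 3) never occurs in the sample.
A = [1 1 1 2 2 1]';
B = [1 2 3 1 2 3]';

% Indicator (dummy) coding, one column per level.
DA = double(bsxfun(@eq, A, 1:2));   % 6-by-2
DB = double(bsxfun(@eq, B, 1:3));   % 6-by-3

% Interaction columns: the product of each A column with each B column.
DAB = zeros(numel(A), 6);
k = 0;
for i = 1:2
    for j = 1:3
        k = k + 1;
        DAB(:, k) = DA(:, i) .* DB(:, j);
    end
end

% Columns that are identically zero correspond to unobserved combinations.
emptyCols = find(all(DAB == 0, 1))   % column 6, the (A = 2, B = 3) product
rank(DAB)                            % 5, one less than the number of columns
```

Any design matrix that contains such a column cannot have full column rank, whatever the other terms are.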

Is there a simple way to tell MATLAB how to handle problematic categorical interactions, or at least remove them without breaking the linear model object? I'm surprised this is apparently unhandled (no warnings even), as the situation could easily come up.

Many thanks.


Answer by Tom Lane
on 19 Mar 2013

When I try an example like this I see:

Warning: Regression design matrix is rank deficient to within machine precision.

In the coefficients table, the coefficients for the redundant columns are fixed at zero. The step method may remove the singular terms. The anova method can help reveal which terms don't have full degrees of freedom.

Can you elaborate on what you see?


Giuseppe DiStefano
on 19 Mar 2013

Tom Lane
on 19 Mar 2013

You can run

lm = step(lm)

to use stepwise regression to add or remove terms based on their significance. There are 'Lower' and 'Upper' options to control the set of terms considered for adding and removing. It is possible for an interaction term to be significant and singular at the same time. This can happen when there are missing factor combinations, yet the ones present represent a significant improvement over the model without the interaction term.
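As a rough sketch of the 'Lower' and 'Upper' options, the example below fits a model with a missing factor combination and then restricts the stepwise search. The data and variable names (`a`, `b`, `y`, `ds`, `lm2`) are hypothetical, and passing formula strings as the bounds is an assumption about what the options accept in this release; check the documentation for step if it is rejected.

```matlab
% Hypothetical data; the combination (a = 2, b = 3) never occurs.
a = [1 1 1 2 2 1 1 1 1 2 2 1]';
b = [1 2 3 1 2 3 1 2 3 1 2 3]';
y = [1.0 2.1 2.9 1.6 2.4 3.2 0.9 1.8 3.1 1.4 2.7 3.0]';
ds = dataset(a, b, y);

% Start from the model that includes the (singular) interaction.
lm = LinearModel.fit(ds, 'y ~ a + b + a:b', 'CategoricalVars', {'a', 'b'});

% Assumed usage: 'Lower' lists terms that must stay in the model,
% 'Upper' lists the largest model that step may consider.
lm2 = step(lm, 'Lower', 'y ~ a + b', 'Upper', 'y ~ a + b + a:b')
```

Whether a:b is dropped here depends on its significance under the default criterion, so the resulting formula is not guaranteed.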

You can remove a term directly:

lm = removeTerms(lm,'a:b')
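A short sketch of how this might be used together with anova, on hypothetical data with one missing factor combination (the names `a`, `b`, `y`, `ds`, and `lmReduced` are made up for illustration):

```matlab
% Hypothetical data; the combination (a = 2, b = 3) never occurs.
a = [1 1 1 2 2 1 1 1 1 2 2 1]';
b = [1 2 3 1 2 3 1 2 3 1 2 3]';
y = [1.0 2.1 2.9 1.6 2.4 3.2 0.9 1.8 3.1 1.4 2.7 3.0]';
ds = dataset(a, b, y);

lm = LinearModel.fit(ds, 'y ~ a + b + a:b', 'CategoricalVars', {'a', 'b'});

% Inspect the degrees of freedom per term; a term with fewer than
% expected is the one affected by the missing combination.
anova(lm)

% Drop the interaction and look at the remaining coefficients.
lmReduced = removeTerms(lm, 'a:b');
lmReduced.Coefficients
```

Removing the whole term discards the estimable part of the interaction as well, so this is only appropriate when the interaction is not of interest.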

Giuseppe DiStefano
on 20 Mar 2013

The 'Lower' and 'Upper' options unfortunately don't allow one to specify particular terms that may be added. For example, I may want to consider **some** set of interactions, but not all, at each step, or some linear terms, e.g. {A, B, D} but not C.

For missing factor combinations, it would be great to be able to control whether singular terms are kept or removed. For example, x2fx accepts a catlevels argument, so you can indicate that the mere absence of a level in a dataset does not imply that the level does not exist.

x2fx() incidentally seems to have the same categorical interaction expansion problem.
