## How to fix a singular design matrix created by LinearModel during categorical interaction expansion?

### Giuseppe DiStefano

on 19 Mar 2013

Hello all,

I am using the LinearModel class in R2012a to create a linear regression model. I specify the model with a formula in Wilkinson notation, and I'm passing a dataset array with both numeric and categorical columns. The latter are indicated to LinearModel.fit() via the CategoricalVars option.

The problem is that one of the terms in my formula involves an interaction between two categorical variables (e.g. 'A:B'). When this is expanded internally via dummyvar and multiplied out, it creates columns in the design matrix that are all zeros. This of course makes the design matrix singular.

Is there a simple way to tell MATLAB how to handle problematic categorical interactions, or at least to remove them without breaking the linear model object? I'm surprised this apparently goes unhandled (not even a warning), since the situation can easily arise.
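For reference, a minimal sketch of the setup I'm describing (the data and variable names are made up for illustration):

```matlab
% Two categorical predictors coded as integers; the combination
% A=2, B=2 never occurs in the data.
A = [1; 1; 2; 2; 1; 1];
B = [1; 2; 1; 1; 1; 2];
y = [1.1; 2.0; 3.2; 3.1; 1.0; 2.2];
ds = dataset(A, B, y);

% The A:B interaction expands to products of dummy variables, and the
% column for the missing (2,2) combination comes out all zeros.
lm = LinearModel.fit(ds, 'y ~ A*B', 'CategoricalVars', {'A','B'});
```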

many thanks

### Tom Lane

on 19 Mar 2013

When I try an example like this I see:

```
Warning: Regression design matrix is rank deficient to within machine precision.
```

The coefficients table shows those coefficients fixed at zero. The step method may remove the singular terms, and the anova method can help reveal which terms don't have full degrees of freedom.
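For example, given a fitted model lm like the one described in the question, something along these lines shows both symptoms (a sketch, not verified against your data):

```matlab
% Component ANOVA table; an A:B row with fewer degrees of freedom than
% (nlevels(A)-1)*(nlevels(B)-1) points to missing factor combinations.
tbl = anova(lm)

% Coefficients pinned to zero by the rank-deficient fit:
zeroed = lm.CoefficientNames(lm.Coefficients.Estimate == 0)
```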

Can you elaborate on what you see?


### Giuseppe DiStefano

on 19 Mar 2013

On a related note, the LinearModel class in R2012a doesn't seem to expand categorical interactions correctly. It appears to first compute dummyvar representations for each of the variables in the interaction, and then multiply them out. But when the all-zeros code is used to represent one of the categories, this poses a problem for interactions, since it generates many columns of all zeros. MATLAB apparently knows this and omits them altogether (in this case)!
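The expansion I believe is happening can be reproduced directly with dummyvar (a sketch of the presumed behavior, not the actual internal code):

```matlab
A = [1; 1; 2; 2; 1; 1];
B = [1; 2; 1; 1; 1; 2];      % the (A=2, B=2) combination never occurs
dA = dummyvar(A);            % one indicator column per level of A
dB = dummyvar(B);            % one indicator column per level of B

% Multiply out every pair of columns to form the interaction block.
cols = {};
for i = 1:size(dA, 2)
    for j = 1:size(dB, 2)
        cols{end+1} = dA(:,i) .* dB(:,j); %#ok<AGROW>
    end
end
AB = [cols{:}];

% The column for the missing (2,2) combination is all zeros:
any(AB)   % -> 1 1 1 0
```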


### Tom Lane

on 19 Mar 2013

You can run

```
lm = step(lm)
```

to use stepwise regression to add or remove terms based on their significance. There are 'Lower' and 'Upper' options to control the set of terms considered for adding and removing. It is possible for an interaction term to be significant and singular at the same time. This can happen when there are missing factor combinations, yet the ones present represent a significant improvement over the model without the interaction term.
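A sketch of the bounded stepwise call (the formulas here are illustrative; 'Lower' and 'Upper' take model specifications):

```matlab
% Let step add or remove terms between the two bounds: main effects
% are always kept, and the A:B interaction is the largest candidate.
lm = step(lm, 'Lower', 'y ~ A + B', 'Upper', 'y ~ A*B');
```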

You can remove a term directly:

```
lm = removeTerms(lm,'a:b')
```

### Giuseppe DiStefano

on 20 Mar 2013

The 'Lower' and 'Upper' options unfortunately don't allow one to specify the particular terms that may be added: for example, considering some interactions, but not all, at each step; or only some of the linear terms, e.g. {A, B, D} but not C.

For missing factor combinations, it would be great to be able to control whether singular terms are kept or removed. For example, something like the catlevels argument that x2fx accepts, so you can indicate that the mere absence of a level in a dataset doesn't necessarily imply that the level doesn't exist.
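For the record, the mechanism I mean is the fourth argument of x2fx; something like this (a sketch with made-up data) declares levels that happen to be absent from the sample:

```matlab
X = [1 1; 1 2; 2 1; 2 1];   % columns A and B; (2,2) never occurs
model = 'interaction';       % constant, linear, and cross terms
categ = [1 2];               % both columns are categorical
catlevels = [2 2];           % declare 2 levels each, observed or not
D = x2fx(X, model, categ, catlevels);
```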

Incidentally, x2fx() seems to have the same categorical interaction expansion problem.