MATLAB Answers

regress and/or fitlm with more than 1000 dummies

Sebastiano delre on 6 Jul 2016
Commented: Brendan Hamm on 6 Jul 2016
I am trying to run a regression model on a dataset with about 600,000 observations and 1008 dummies. I am using fitlm, but MATLAB crashes or runs out of memory. I tried to save memory by defining the dummies as logical, but without success. Do you think I still have some hope, or should I just give up? Thank you for your help.


Sebastiano delre on 6 Jul 2016
The dummies are logical variables with 0/1 values. I derive them from a categorical variable with 1009 values, which is why I get 1008 dummies. In sum, the regression model includes 10 predictors plus the 1008 dummies, which are controls.
Brendan Hamm on 6 Jul 2016
One thing you may consider is using fitlm with a table of predictors. That way one of the columns can simply be a categorical predictor variable, and MATLAB will create the dummy variables for you (including dropping a reference level, which takes care of the dummy variable trap). There is no guarantee that this will solve your problem, but I would consider it.
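A minimal sketch of the table-based call, on small synthetic data rather than the real dataset. The variable names (x1, x2, group, y), the sizes, and the coefficients used to generate y are all hypothetical; it assumes the Statistics and Machine Learning Toolbox is available and only illustrates how fitlm expands a categorical column by itself.

```matlab
% Hypothetical small-scale stand-in for the real data (all names are illustrative).
rng(0);                                % reproducible synthetic data
n = 2000;                              % far fewer rows than the real 600,000
nGroups = 20;                          % stand-in for the 1009 categories
g  = mod((0:n-1)', nGroups) + 1;       % group label; every group occurs
x1 = randn(n, 1);
x2 = randn(n, 1);
y  = 1.5*x1 - 0.7*x2 + 0.1*g + 0.2*randn(n, 1);

% Keep the grouping variable as ONE categorical column instead of 19 dummies.
T = table(x1, x2, categorical(g), y, ...
    'VariableNames', {'x1', 'x2', 'group', 'y'});

% fitlm creates the dummy columns internally and drops one reference level.
mdl = fitlm(T, 'y ~ x1 + x2 + group');

% Intercept + 2 slopes + (nGroups - 1) dummy coefficients.
numCoef = mdl.NumCoefficients;
```

The returned model still holds the usual diagnostics, so this mainly simplifies setup; whether it lowers peak memory on the full data is not something this sketch can show.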
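You are using almost 5 GB just for your data (600,000 rows × 1,018 columns × 8 bytes per double is about 4.9 GB). For this reason, computing things like the hat matrix will be computationally intensive and will require even more data to be held in memory. Furthermore, fitlm stores a lot of extra data as well, so you may want to try another method of regression. A lower-level least-squares solver such as the backslash operator (\) or lscov could help, since it does not compute all of the extra statistics for us; note that polyfit only fits a polynomial in a single predictor, so it does not apply directly to a model with many predictors. Another option is an iterative method (for example lsqr, or cgs applied to the normal equations), as iterative algorithms take less computation at each step. These solvers also accept a sparse matrix, which can further reduce memory requirements.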
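A rough sketch of the sparse, iterative route, again on small synthetic data with hypothetical names and sizes. It uses lsqr rather than cgs, because lsqr accepts the rectangular design matrix directly while cgs needs a square system; the tolerance and iteration limit are arbitrary choices, and the backslash solve is there only as a cross-check.

```matlab
% Memory for a dense double design matrix of the size discussed in the thread.
denseGB = 600000 * 1018 * 8 / 1e9;     % about 4.9 (GB)

% Hypothetical small-scale stand-in for the real data.
rng(1);
n = 5000;  nGroups = 50;  p = 3;
g  = mod((0:n-1)', nGroups) + 1;       % group label; every group occurs
Xc = randn(n, p);                      % continuous predictors
betaTrue    = [2; -1; 0.5];
groupEffect = 0.1 * (1:nGroups)';
y = Xc*betaTrue + groupEffect(g) + 0.1*randn(n, 1);

% Sparse dummy block: exactly one nonzero per row.
D = sparse((1:n)', g, 1, n, nGroups);
D = D(:, 2:end);                       % drop the first level as reference

% Concatenating full and sparse blocks yields a sparse matrix.
X = [ones(n, 1), Xc, D];               % intercept, predictors, dummies

% Iterative least squares; two outputs suppress the printed summary.
[b, flag] = lsqr(X, y, 1e-10, 500);

% Direct sparse least-squares solve for comparison.
bDirect = X \ y;
```

Here b(1) is the intercept, b(2:p+1) are the slopes, and the remaining entries are the group effects relative to the dropped level. Neither solver returns standard errors or p-values, so those would have to be computed separately if they are needed.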


Answers (0)