Why do I receive NaNs in the ANOVA table when I try to analyze my data using the Statistics Toolbox?

I receive NaNs in the ANOVA table when I try to analyze my unbalanced data using the Statistics Toolbox.
I have tried various configurations of the data, but the results always contain NaNs.
For example, suppose I have some unbalanced data (the problem can also occur with balanced data) and the following code snippet:
y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];   % balanced design (overwritten by the next line)
g1 = [1 2 1 2 1 1 1 3];   % unbalanced design: level 3 occurs only once
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
p = anovan(y, {g1 g2 g3}, 'model', [1 0 0;0 1 0;0 0 1; 1 1 0])
The resulting vector of p-values contains a NaN.

Accepted Answer

MathWorks Support Team on 27 Jun 2009
ANOVAN does not fit regression models. It fits models of the form
y = a + b(i) + ...
where i indexes the levels of a grouping variable, so a separate effect b(i) is estimated for each level. It does not fit models like
y = a + b*G1 + ...
where G1 holds the numeric values of a variable and b is a single coefficient to be estimated.
Because every level gets its own coefficient, make sure that there are enough observations to estimate the error. The error term needs some degrees of freedom so that the mean square error is not close to zero. A mean square error close to zero makes the F-values very large, and a value of exactly zero produces NaN. Furthermore, there must be enough degrees of freedom to estimate the coefficients for the grouping variables. Try increasing the total number of observations.
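One way to check the error degrees of freedom before calling anovan is to count observations minus linearly independent model columns. The following is a minimal sketch using only base MATLAB functions and the data from the question. The helper `dummy` and the variable names `D1`, `D12`, `X`, and `dfError` are made up for illustration, and the calculation is an assumption about what matters here, not a description of how anovan works internally.

```matlab
% Data from the question (unbalanced version of g1)
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 1 1 3];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
n  = numel(y);

% Convert each grouping variable to integer level indices
[~, ~, i1] = unique(g1);  i1 = i1(:);
[~, ~, i2] = unique(g2);  i2 = i2(:);
[~, ~, i3] = unique(g3);  i3 = i3(:);

% Hypothetical helper: one indicator (dummy) column per level
dummy = @(idx) double(bsxfun(@eq, idx, 1:max(idx)));
D1 = dummy(i1);
D2 = dummy(i2);
D3 = dummy(i3);

% Indicator columns for the g1*g2 interaction, one per (g1, g2) cell
D12 = dummy((i1 - 1) * max(i2) + i2);

% Overparameterized design matrix for the model
% [1 0 0; 0 1 0; 0 0 1; 1 1 0] (three main effects plus g1*g2)
X = [ones(n, 1) D1 D2 D3 D12];

% Error degrees of freedom = observations minus independent parameters
dfError = n - rank(X)
```

A positive result is necessary but not sufficient: each term in the model must also be estimable from the cells that were actually observed.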
In the example, g1 has a level 3 that appears exactly once. It is therefore possible to estimate the effect of this level, but not the interaction of g1 with g2. An interaction measures how the effect of g1 varies with the level of g2. Since g1=3 is observed for only a single value of g2, there is no information about what happens when g1=3 and g2 takes its other value. To estimate the interaction, the data must include at least one observation in which g1=3 and g2 takes that other value.
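Empty combinations of two grouping variables can be spotted by tabulating the cell counts before fitting an interaction. The sketch below does this for the question's g1 and g2 using base MATLAB functions only. The names `counts` and `hasEmptyCell` are illustrative and not part of any toolbox.

```matlab
% Grouping variables from the question
g1 = [1 2 1 2 1 1 1 3];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};

% Map each variable to integer level indices (levels are sorted)
[lev1, ~, i1] = unique(g1);   % levels 1, 2, 3
[lev2, ~, i2] = unique(g2);   % levels 'hi', 'lo'

% Count observations in every (g1, g2) cell:
% rows follow lev1, columns follow lev2
counts = accumarray([i1(:) i2(:)], 1)

% A zero count means that combination was never observed, so the
% g1*g2 interaction cannot be fully estimated
hasEmptyCell = any(counts(:) == 0)
```

Here the row for g1=3 has a zero in the 'hi' column, which is the missing combination described above.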
The following modified code, with added observations, will work properly:
y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0 43.0 42.3 45.2]'; % 11 data points now
g1 = [1 1 1 2 2 2 2 2 3 3 3];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo';'hi';'hi';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june';'may';'may';'june'};
p = anovan(y, {g1 g2 g3}, 'model', [1 0 0;0 1 0;0 0 1; 1 1 0])
