N-way analysis of variance
p = anovan(y,group,Name,Value) returns a vector of p-values for multiway (n-way) ANOVA using additional options specified by one or more Name,Value pair arguments.
For example, you can specify which predictor variable is continuous, if any, or the type of sum of squares to use.
[p,tbl,stats] = anovan(___) returns a stats structure that you can use to perform a multiple comparison test, which enables you to determine which pairs of group means are significantly different. You can perform such a test using the multcompare function by providing the stats structure as input.
Load the sample data.
y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
y
is the response vector and g1
, g2
, and g3
are the grouping variables (factors). Each factor has two levels, and every observation in y
is identified by a combination of factor levels. For example, observation y(1)
is associated with level 1 of factor g1
, level 'hi'
of factor g2
, and level 'may'
of factor g3
. Similarly, observation y(6)
is associated with level 2 of factor g1
, level 'hi'
of factor g2
, and level 'june'
of factor g3
.
Test if the response is the same for all factor levels.
p = anovan(y,{g1,g2,g3})
p = 0.4174 0.0028 0.9140
In the ANOVA table, X1
, X2
, and X3
correspond to the factors g1
, g2
, and g3
, respectively. The p-value 0.4174 indicates that the mean responses for levels 1 and 2 of the factor g1
are not significantly different. Similarly, the p-value 0.9140 indicates that the mean responses for levels 'may'
and 'june'
of the factor g3
are not significantly different. However, the p-value 0.0028 is small enough to conclude that the mean responses are significantly different for the two levels, 'hi'
and 'lo'
, of the factor g2
. By default, anovan
computes p-values just for the three main effects.
Test the two-factor interactions. This time specify the variable names.
p = anovan(y,{g1 g2 g3},'model','interaction','varnames',{'g1','g2','g3'})
p = 0.0347 0.0048 0.2578 0.0158 0.1444 0.5000
The interaction terms are represented by g1*g2
, g1*g3
, and g2*g3
in the ANOVA table. The first three entries of p
are the p-values for the main effects. The last three entries are the p-values for the two-way interactions. The p-value of 0.0158 indicates that the interaction between g1
and g2
is significant. The p-values of 0.1444 and 0.5 indicate that the corresponding interactions are not significant.
Load the sample data.
load carbig
The data has measurements on 406 cars. The variable org
shows where the cars were made and when
shows when in the year the cars were manufactured.
Study how the mileage depends on when and where the cars were made. Also include the two-way interactions in the model.
p = anovan(MPG,{org when},'model',2,'varnames',{'origin','mfg date'})
p = 0.0000 0.0000 0.3059
The 'model',2
name-value pair argument represents the two-way interactions. The p-value for the interaction term, 0.3059, is not small, indicating little evidence that the effect of the time of manufacture (mfg date
) depends on where the car was made (origin
). The main effects of origin and manufacturing date, however, are significant; both p-values display as 0.0000.
Load the sample data.
y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
y
is the response vector and g1
, g2
, and g3
are the grouping variables (factors). Each factor has two levels, and every observation in y
is identified by a combination of factor levels. For example, observation y(1)
is associated with level 1 of factor g1
, level 'hi'
of factor g2
, and level 'may'
of factor g3
. Similarly, observation y(6)
is associated with level 2 of factor g1
, level 'hi'
of factor g2
, and level 'june'
of factor g3
.
Test if the response is the same for all factor levels. Also compute the statistics required for multiple comparison tests.
[~,~,stats] = anovan(y,{g1 g2 g3},'model','interaction',...
    'varnames',{'g1','g2','g3'});
The p-value of 0.2578 indicates that the mean responses for levels 'may'
and 'june'
of factor g3
are not significantly different. The p-value of 0.0347 indicates that the mean responses for levels 1
and 2
of factor g1
are significantly different. Similarly, the p-value of 0.0048 indicates that the mean responses for levels 'hi'
and 'lo'
of factor g2
are significantly different.
Perform multiple comparison tests to find out which groups of the factors g1
and g2
are significantly different.
results = multcompare(stats,'Dimension',[1 2])
results =
    1.0000    2.0000   -6.8604   -4.4000   -1.9396    0.0280
    1.0000    3.0000    4.4896    6.9500    9.4104    0.0177
    1.0000    4.0000    6.1396    8.6000   11.0604    0.0143
    2.0000    3.0000    8.8896   11.3500   13.8104    0.0108
    2.0000    4.0000   10.5396   13.0000   15.4604    0.0095
    3.0000    4.0000   -0.8104    1.6500    4.1104    0.0745
multcompare
compares the combinations of groups (levels) of the two grouping variables, g1
and g2
. In the results
matrix, the number 1 corresponds to the combination of level 1
of g1
and level hi
of g2
, the number 2 corresponds to the combination of level 2
of g1
and level hi
of g2
. Similarly, the number 3 corresponds to the combination of level 1
of g1
and level lo
of g2
, and the number 4 corresponds to the combination of level 2
of g1
and level lo
of g2
. The last column of the matrix contains the p-values.
For example, the first row of the matrix tests the null hypothesis that the combination of level 1
of g1
and level hi
of g2
has the same mean response values as the combination of level 2
of g1
and level hi
of g2
. The p-value corresponding to this test is 0.0280, which indicates that the mean responses are significantly different. You can also see this result in the figure. The blue bar shows the comparison interval for the mean response for the combination of level 1
of g1
and level hi
of g2
. The red bars are the comparison intervals for the mean response for other group combinations. None of the red bars overlap with the blue bar, which means the mean response for the combination of level 1
of g1
and level hi
of g2
is significantly different from the mean response for other group combinations.
You can test the other groups by clicking on the corresponding comparison interval for the group. The bar you click on turns to blue. The bars for the groups that are significantly different are red. The bars for the groups that are not significantly different are gray. For example, if you click on the comparison interval for the combination of level 1
of g1
and level lo
of g2
, the comparison interval for the combination of level 2
of g1
and level lo
of g2
overlaps, and is therefore gray. Conversely, the other comparison intervals are red, indicating significant difference.
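The mapping from the group numbers in the results matrix to factor-level combinations can also be read programmatically. The following is a minimal sketch, assuming the fourth output of multcompare (called gnames here) holds the group names and that 'Display','off' suppresses the interactive figure; the data are the sample data from this example.

```matlab
% Sample data from the example above.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

% Fit the interaction model without opening the ANOVA table figure.
[~,~,stats] = anovan(y,{g1 g2 g3},'model','interaction', ...
    'varnames',{'g1','g2','g3'},'display','off');

% Compare the g1-by-g2 combinations and also request the group names.
[results,~,~,gnames] = multcompare(stats,'Dimension',[1 2],'Display','off');

% Print each pairwise comparison with readable group labels.
% The last column of results holds the p-value for the pair.
for k = 1:size(results,1)
    fprintf('%s vs %s: p = %.4f\n', ...
        gnames{results(k,1)}, gnames{results(k,2)}, results(k,end));
end
```

Reading the labels from gnames avoids having to remember which number stands for which combination of levels.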
y
— Sample data
numeric vector
Sample data, specified as a numeric vector.
Data Types: single
| double
group
— Grouping variables
cell array
Grouping variables, that is, the factors and factor levels of the
observations in y
, specified as a cell array.
Each of the cells in group
contains a list of
factor levels identifying the observations in y
with
respect to one of the factors. The list within each cell can be a
categorical array, numeric vector, character matrix, or single-column
cell array of strings, and must have the same number of elements as y
.
$$\begin{array}{ccccccccccc}y& =& [& {y}_{1},& {y}_{2},& {y}_{3},& {y}_{4},& {y}_{5},& \cdots ,& {y}_{N}& {]}^{\prime}\\ & & & \uparrow & \uparrow & \uparrow & \uparrow & \uparrow & & \uparrow & \\ g1& =& \{& \text{'}A\text{'},& \text{'}A\text{'},& \text{'}C\text{'},& \text{'}B\text{'},& \text{'}B\text{'},& \cdots ,& \text{'}D\text{'}& \}\\ g2& =& [& 1& 2& 1& 3& 1& \cdots ,& 2& ]\\ g3& =& \{& \text{'}\text{hi}\text{'},& \text{'}\text{mid}\text{'},& \text{'}\text{low}\text{'},& \text{'}\text{mid}\text{'},& \text{'}\text{hi}\text{'},& \cdots ,& \text{'}\text{low}\text{'}& \}\end{array}$$
By default, anovan
treats all grouping
variables as fixed effects.
For example, if a study investigates the effects of gender, school, and education method on the academic success of elementary school students, then you can specify the grouping variables as follows.
Example: {'Gender','School','Method'}
Data Types: cell
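As an illustration of the requirements above, the following sketch builds a group cell array from three differently typed factor lists and checks that each list has one entry per observation. The data values and the variable names gender, school, and method are hypothetical.

```matlab
% Six hypothetical observations.
y = [52.7 57.5 45.9 44.5 53.0 57.0]';

% One factor-level list per factor; each may use a different type.
gender = categorical({'F';'M';'F';'M';'F';'M'});   % categorical array
school = [1 1 2 2 3 3]';                           % numeric vector
method = {'new';'new';'old';'old';'new';'old'};    % cell array of strings

% Each cell of group identifies the observations with respect to one factor.
group = {gender, school, method};

% Every list must have the same number of elements as y.
sameLength = cellfun(@numel, group) == numel(y);
disp(sameLength)
```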
Specify optional comma-separated pairs of Name,Value
arguments.
Name
is the argument
name and Value
is the corresponding
value. Name
must appear
inside single quotes (' '
).
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN
.
Example: 'alpha',0.01,'model','interaction','sstype',2 specifies that anovan computes the 99% confidence bounds and p-values for the main effects and two-way interactions using type II sum of squares.
'alpha'
— Confidence level
0.05 (default) | scalar value in the range 0 to 1
Confidence level for confidence bounds, specified as the comma-separated
pair consisting of 'alpha'
and a scalar value in
the range 0 to 1. For a value α, the confidence level is 100*(1–α)%.
Example: 'alpha',0.01
corresponds to 99% confidence
intervals
Data Types: single
| double
'continuous'
— Indicator for continuous predictors
vector of indices
Indicator for continuous predictors, representing which grouping
variables should be treated as continuous predictors rather than as
categorical predictors, specified as the comma-separated pair consisting
of 'continuous'
and a vector of indices.
For example, if there are three grouping variables and the second one is continuous, then you can specify as follows.
Example: 'continuous',[2]
Data Types: single
| double
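A minimal sketch of the 'continuous' option, assuming the carbig sample data set used in the examples above and its Weight variable: the second grouping variable is treated as a continuous predictor, while org remains categorical. The variable names passed to 'varnames' are chosen for illustration.

```matlab
load carbig   % sample data set with MPG, org, and Weight

% Index 2 marks the second grouping variable (Weight) as continuous.
p = anovan(MPG,{org Weight},'continuous',2, ...
    'varnames',{'origin','weight'},'display','off');

% One p-value per main effect: origin (categorical), weight (continuous).
disp(p)
```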
'display'
— Indicator to display ANOVA table
'on'
(default) | 'off'
Indicator to display ANOVA table, specified as the comma-separated
pair consisting of 'display'
and 'on'
or 'off'
.
When 'display'
is 'off'
, anovan
only
returns the output arguments, and does not display the standard ANOVA
table as a figure.
Example: 'display','off'
'model'
— Type of the model
'linear'
(default) | 'interaction'
| 'full'
| integer value | terms matrix
Type of the model, specified as the comma-separated pair consisting
of 'model'
and one of the following:
'linear'
— The default 'linear'
model
computes only the p-values for the null hypotheses
on the N main effects.
'interaction'
— The 'interaction'
model
computes the p-values for null hypotheses on the N main
effects and the $$\binom{N}{2}$$ two-factor interactions.
'full'
— The 'full'
model
computes the p-values for null hypotheses on the N main
effects and interactions at all levels.
An integer — For an integer value k
(k ≤ N), anovan
computes
all interaction levels through the kth level. For
example, the value 3 means main effects plus two- and three-factor
interactions. The values k = 1 and k =
2 are equivalent to the 'linear'
and 'interaction'
specifications,
respectively. The value k = N is
equivalent to the 'full'
specification.
Terms matrix — A matrix of term definitions
having the same form as the input to the x2fx
function.
All entries must be 0
or 1
(no
higher powers).
For more precise control over the main and interaction terms
that anovan
computes, you can specify a matrix
containing one row for each main or interaction term to include in
the ANOVA model. Each row defines one term using a vector of N zeros
and ones. The table below illustrates the coding for a 3-factor ANOVA
for factors A, B, and C.
Matrix Row | ANOVA Term |
---|---|
[1 0 0] | Main term A |
[0 1 0] | Main term B |
[0 0 1] | Main term C |
[1 1 0] | Interaction term AB |
[1 0 1] | Interaction term AC |
[0 1 1] | Interaction term BC |
[1 1 1] | Interaction term ABC |
For example, if there are three factors A, B,
and C, and 'model',[0 1 0;0 0 1;0 1 1]
,
then anovan
tests for the main effects B and C,
and the interaction effect BC, respectively.
A simple way to generate the terms matrix is to modify the terms
output,
which codes the terms in the current model using the format described
above. If anovan
returns [0 1 0;0 0 1;0
1 1]
for terms
, for example, and there
is no significant interaction BC, then you can
recompute ANOVA on just the main effects B and C by
specifying [0 1 0;0 0 1]
for model
.
Example: 'model',[0 1 0;0 0 1;0 1 1]
Example: 'model','interaction'
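The equivalence between the named, integer, and terms-matrix forms of 'model' can be checked directly. The sketch below reuses the sample data from the examples above; it assumes only that the fourth output of anovan is the terms matrix.

```matlab
% Sample data from the examples above.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
group = {g1 g2 g3};
N = numel(group);

% Named form: main effects plus all two-factor interactions.
[pNamed,~,~,terms] = anovan(y,group,'model','interaction','display','off');

% Integer form: k = 2 requests the same set of terms.
pInteger = anovan(y,group,'model',2,'display','off');

% Terms-matrix form: pass the returned terms matrix back in as the model.
pMatrix = anovan(y,group,'model',terms,'display','off');

% N main effects plus nchoosek(N,2) two-factor interactions.
nInteractions = nchoosek(N,2);
disp(terms)   % one row of zeros and ones per term
```

All three calls fit the same terms, so the three p-value vectors should agree up to round-off.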
'nested'
— Nesting relationships
matrix of 0's and 1's
Nesting relationships among the grouping variables, specified
as the comma-separated pair consisting of 'nested'
and
a matrix M of 0's and 1's, that is, M(i,j)
= 1 if variable i is nested in variable j.
For example, if there are two grouping variables District and School, where School is nested in District, then you can express this relationship as follows.
Example: 'nested',[0 0;1 0]
Data Types: single
| double
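The nesting matrix for the District and School case can be built from the definition of M(i,j). The following is a sketch with hypothetical data (two districts, two schools per district, three students per school); the response values are made up for illustration.

```matlab
% Hypothetical data: school labels are unique across districts.
district = [1 1 1 1 1 1 2 2 2 2 2 2]';
school   = [1 1 1 2 2 2 3 3 3 4 4 4]';
y        = [71 74 69 80 83 79 65 62 66 75 73 78]';

% M(i,j) = 1 means variable i is nested in variable j.
M = zeros(2);
M(2,1) = 1;   % School (variable 2) is nested in District (variable 1)

p = anovan(y,{district school},'nested',M, ...
    'varnames',{'District','School'},'display','off');
disp(p)
```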
'random'
— Indicator for random variables
vector of indices
Indicator for random variables, representing which grouping
variables are random, specified as the comma-separated pair consisting
of 'random'
and a vector of indices. By default, anovan
treats
all grouping variables as fixed.
anovan
treats an interaction term as random
if any of the variables in the interaction term is random.
Example: 'random',[3]
Data Types: single
| double
'sstype'
— Type of sum of squares
3 (default) | 1 | 2 | 'h'
Type of sum of squares, specified as the comma-separated pair consisting
of 'sstype'
and one of the following:
1 — Type I sum of squares. The reduction in residual sum of squares obtained by adding that term to a fit that already includes the terms listed before it.
2 — Type II sum of squares. The reduction in residual sum of squares obtained by adding that term to a model consisting of all other terms that do not contain the term in question.
3 — Type III sum of squares. The reduction in residual sum of squares obtained by adding that term to a model containing all other terms, but with their effects constrained to obey the usual "sigma restrictions" that make models estimable.
'h' — Hierarchical model. Similar to type 2, but with continuous as well as categorical factors used to determine the hierarchy of terms.
The sum of squares for any term is determined by comparing two
models. For a model containing main effects but no interactions, the
value of sstype
only influences computations
on unbalanced data.
Suppose you are fitting a model with two factors and their interaction, and that the terms appear in the order A, B, AB. Let R(·) represent the residual sum of squares for a model, so for example R(A, B, AB) is the residual sum of squares fitting the whole model, R(A) is the residual sum of squares fitting just the main effect of A, and R(1) is the residual sum of squares fitting just the mean. The three types of sums of squares are as follows:
Term | Type 1 Sum of Squares | Type 2 Sum of Squares | Type 3 Sum of Squares |
---|---|---|---|
A | R(1) – R(A) | R(B) – R(A, B) | R(B, AB) – R(A, B, AB) |
B | R(A) – R(A, B) | R(A) – R(A, B) | R(A, AB) – R(A, B, AB) |
AB | R(A, B) – R(A, B, AB) | R(A, B) – R(A, B, AB) | R(A, B) – R(A, B, AB) |
The models for Type 3 sum of squares have sigma restrictions imposed. This means, for example, that in fitting R(B, AB), the array of AB effects is constrained to sum to 0 over A for each value of B, and over B for each value of A.
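The Type 1 column of the table can be reproduced by hand for a main-effects model. The sketch below uses small hypothetical unbalanced data, computes R(1), R(A), and R(A, B) with ordinary least squares, and compares the differences with the sums of squares that anovan reports for 'sstype',1. It assumes the second column of the tbl output holds the sums of squares, with one row per term after the header row.

```matlab
% Hypothetical unbalanced data with two two-level factors.
y = [10 12 11 15 14 20 19 18 22]';
A = [1 1 1 1 2 2 2 2 2]';
B = [1 1 2 2 1 1 2 2 2]';
n = numel(y);

% R(1): residual sum of squares fitting just the mean.
R1 = sum((y - mean(y)).^2);

% R(A): residual sum of squares fitting the main effect of A only.
muA = accumarray(A, y, [], @mean);
RA  = sum((y - muA(A)).^2);

% R(A,B): residual sum of squares fitting both main effects.
X   = [ones(n,1) A==2 B==2];
RAB = sum((y - X*(X\y)).^2);

% Type 1 sums of squares from the table: A first, then B given A.
ssA = R1 - RA;
ssB = RA - RAB;

% Same quantities from anovan with sequential (Type 1) sums of squares.
[~,tbl] = anovan(y,{A B},'sstype',1,'display','off');
fprintf('A: manual %.4f, anovan %.4f\n', ssA, tbl{2,2});
fprintf('B: manual %.4f, anovan %.4f\n', ssB, tbl{3,2});
```

Because the data are unbalanced, rerunning the anovan call with a different 'sstype' value would generally change the sum of squares reported for A.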
Example: 'sstype','h'
Data Types: single
| double
| char
'varnames'
— Names of grouping variables
X1,X2,...,XN
(default) | character matrix | cell array of strings
Names of grouping variables, specified as the comma-separated
pair consisting of 'varnames'
and a character matrix
or a cell array of strings.
Example: 'varnames',{'Gender','City'}
Data Types: char
| cell
p
— p-values
vector
p-values, returned as a vector.
Output vector p
contains p-values
for the null hypotheses on the N main effects and
any interaction terms specified. Element p(1)
contains
the p-value for the null hypothesis that samples
at all levels of factor A are drawn from the same
population; element p(2)
contains the p-value
for the null hypothesis that samples at all levels of factor B are
drawn from the same population; and so on.
For example, if there are three factors A, B,
and C, and 'model',[0 1 0;0 0 1;0 1 1]
,
then the output vector p
contains the p-values
for the null hypotheses on the main effects B and C and
the interaction effect BC, respectively.
A sufficiently small p-value corresponding to a factor suggests that at least one group mean is significantly different from the other group means; that is, there is a main effect due to that factor. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.
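Applying such a threshold to the output vector is a one-line comparison. This sketch uses the sample data from the first example above and the 0.05 cutoff mentioned in the text; the variable names alpha and isSignificant are chosen for illustration.

```matlab
% Sample data from the first example above.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

% Main-effects model; one p-value per factor.
p = anovan(y,{g1,g2,g3},'display','off');

% Flag the factors whose main effect is significant at the chosen level.
alpha = 0.05;
isSignificant = p < alpha;
disp(isSignificant')
```

With the p-values reported in the first example (0.4174, 0.0028, 0.9140), only the second factor, g2, is flagged.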
tbl
— ANOVA table
cell array
ANOVA table, returned as a cell array. The ANOVA table has seven columns:
Column name | Definition |
---|---|
source | The source of the variability. |
SS | The sum of squares due to each source. |
df | The degrees of freedom associated with each source. |
MS | The mean square for each source, which is the ratio SS/df. |
F | F-statistic, which is the ratio of the mean squares. |
Prob>F | The p-value, which is the probability that the F-statistic takes a value larger than the computed test-statistic value. anovan derives these probabilities from the cdf of the F-distribution. |
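Because tbl is a cell array with a header row, individual columns can be located by their header text. The sketch below assumes the header of the p-value column reads 'Prob>F' and that the last two rows of the table are the error and total rows; it reuses the sample data from the examples above.

```matlab
% Sample data from the examples above.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

[p,tbl] = anovan(y,{g1,g2,g3},'display','off');

% First row holds the column headers; first column holds the sources.
header  = tbl(1,:);
sources = tbl(2:end,1);

% Locate the p-value column by its header rather than by position.
pCol = find(strcmp(header,'Prob>F'));

% Rows for the model terms exclude the header and the last two rows.
pFromTable = cell2mat(tbl(2:end-2,pCol));
disp([sources(1:numel(pFromTable)) num2cell(pFromTable)])
```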
stats
— Statistics
structure
Statistics to use in a multiple comparison test using
the multcompare
function, returned
as a structure.
anovan
evaluates the hypothesis that the
different groups (levels) of a factor (or more generally, a term)
have the same effect, against the alternative that they do not all
have the same effect. Sometimes it is preferable to perform a test
to determine which pairs of levels are significantly different, and
which are not. Use the multcompare
function
to perform such tests by supplying the stats
structure
as input.
The stats
structure contains the fields listed
below, in addition to a number of other fields required for doing
multiple comparisons using the multcompare
function:
Field | Description |
---|---|
coeffs | Estimated coefficients |
coeffnames | Name of term for each coefficient |
vars | Matrix of grouping variable values for each term |
resid | Residuals from the fitted model |
The stats
structure also contains the following
fields if there are random effects:
Field | Description |
---|---|
ems | Expected mean squares |
denom | Denominator definition |
rtnames | Names of random terms |
varest | Variance component estimates (one per random term) |
varci | Confidence intervals for variance components |
terms
— Main and interaction terms
matrix
Main and interaction terms, returned as a matrix. The terms
are encoded in the output matrix terms
using
the same format described above for input model
.
When you specify model
itself in this format,
the matrix returned in terms
is identical.
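The returned terms matrix can be edited and passed back as the model. The following sketch, using the sample data from the examples above, keeps every main effect and only those interactions whose p-value is below 0.05, then refits. The 0.05 cutoff and the variable names are illustrative, and dropping terms this way is shown only to demonstrate the format, not as a recommended model selection procedure.

```matlab
% Sample data from the examples above.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};
group = {g1 g2 g3};

% Fit main effects plus two-factor interactions and get the terms matrix.
[p,~,~,terms] = anovan(y,group,'model','interaction','display','off');

% A row with a single 1 is a main effect; more than one 1 is an interaction.
isMain = sum(terms,2) == 1;

% Keep all main effects and the interactions with small p-values.
keep    = isMain | p(:) < 0.05;
reduced = terms(keep,:);

% Refit using the reduced terms matrix as the model.
pReduced = anovan(y,group,'model',reduced,'display','off');
disp(reduced)
```

With the p-values reported in the interaction example above, only the g1*g2 interaction stays in the reduced model alongside the three main effects.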
anova1
| anova2
| multcompare