anovan

N-way analysis of variance

Syntax

p = anovan(y,group)
p = anovan(y,group,param,val)
[p,table] = anovan(y,group,param,val)
[p,table,stats] = anovan(y,group,param,val)
[p,table,stats,terms] = anovan(y,group,param,val)

Description

p = anovan(y,group) performs multiway (n-way) analysis of variance (ANOVA) for testing the effects of multiple factors on the mean of the vector y. The test compares the variance explained by the factors to the leftover variance that the factors cannot explain. The factors and factor levels of the observations in y are specified by the cell array group. Each cell in group contains a list of factor levels identifying the observations in y with respect to one of the factors. The list within each cell can be a categorical array, numeric vector, character matrix, or single-column cell array of strings, and must have the same number of elements as y. The fitted ANOVA model includes the main effects of each grouping variable. All grouping variables are treated as fixed effects by default. The result p is a vector of p-values, one per term. For an example, see Example of Three-Way ANOVA.

p = anovan(y,group,param,val) specifies one or more of the parameter name/value pairs described in the following table.

Parameter        Value

'alpha'

A number between 0 and 1 requesting 100(1 – alpha)% confidence bounds (default 0.05 for 95% confidence)

'continuous'

A vector of indices indicating which grouping variables should be treated as continuous predictors rather than as categorical predictors.

'display'

'on' displays an ANOVA table (the default)

'off' omits the display

'model'

The type of model used. See Model Type for a description of this parameter.

'nested'

A matrix M of 0's and 1's specifying the nesting relationships among the grouping variables. M(i,j) is 1 if variable i is nested in variable j.

'random'

A vector of indices indicating which grouping variables are random effects (all are fixed by default). See ANOVA with Random Effects for an example of how to use 'random'.

'sstype'

1, 2, 3 (default), or 'h' specifies the type of sum of squares. See Sum of Squares for a description of this parameter.

'varnames'

A character matrix or a cell array of strings specifying names of grouping variables, one per grouping variable. When you do not specify 'varnames', the default labels 'X1', 'X2', 'X3', ..., 'XN' are used. See ANOVA with Random Effects for an example of how to use 'varnames'.
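For example, a single call can combine several of these name/value pairs. In the following sketch, the response y and grouping variables g1, g2, and g3 are hypothetical placeholders:

% Interaction model, Type 2 sums of squares, 99% confidence bounds,
% custom factor names, and no table display:
p = anovan(y,{g1 g2 g3},'model','interaction','sstype',2, ...
           'alpha',0.01,'varnames',{'Machine';'Operator';'Shift'}, ...
           'display','off');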

[p,table] = anovan(y,group,param,val) returns the ANOVA table (including factor labels) in cell array table. (Copy a text version of the ANOVA table to the clipboard by using the Copy Text item on the Edit menu.)

[p,table,stats] = anovan(y,group,param,val) returns a stats structure that you can use to perform a follow-up multiple comparison test with the multcompare function. See The Stats Structure for more information.

[p,table,stats,terms] = anovan(y,group,param,val) returns the main and interaction terms used in the ANOVA computations. The terms are encoded in the output matrix terms using the same matrix format described under Model Type for the input 'model'. When you specify 'model' itself in this matrix format, the matrix returned in terms is identical.

Model Type

This section explains how to use the argument 'model' with the syntax:

[...] = anovan(y,group,'model',modeltype)

The argument modeltype, which specifies the type of model the function uses, can be any one of the following:

  • 'linear' — The default 'linear' model computes only the p-values for the null hypotheses on the N main effects.

  • 'interaction' — The 'interaction' model computes the p-values for null hypotheses on the N main effects and the N(N-1)/2 two-factor interactions.

  • 'full' — The 'full' model computes the p-values for null hypotheses on the N main effects and interactions at all levels.

  • An integer — For an integer value of modeltype, k (k ≤ N), anovan computes all interaction levels through the kth level. For example, the value 3 means main effects plus two- and three-factor interactions. The values k = 1 and k = 2 are equivalent to the 'linear' and 'interaction' specifications, respectively, while the value k = N is equivalent to the 'full' specification.

  • A matrix of term definitions having the same form as the input to the x2fx function. All entries must be 0 or 1 (no higher powers).

For more precise control over the main and interaction terms that anovan computes, modeltype can specify a matrix containing one row for each main or interaction term to include in the ANOVA model. Each row defines one term using a vector of N zeros and ones. The table below illustrates the coding for a 3-factor ANOVA.

Matrix Row    ANOVA Term
[1 0 0]       Main term A
[0 1 0]       Main term B
[0 0 1]       Main term C
[1 1 0]       Interaction term AB
[1 0 1]       Interaction term AC
[0 1 1]       Interaction term BC
[1 1 1]       Interaction term ABC

For example, if modeltype is the matrix [0 1 0;0 0 1;0 1 1], the output vector p contains the p-values for the null hypotheses on the main effects B and C and the interaction effect BC, in that order. A simple way to generate the modeltype matrix is to modify the terms output, which codes the terms in the current model using the format described above. If anovan returns [0 1 0;0 0 1;0 1 1] for terms, for example, and there is no significant result for interaction BC, you can recompute the ANOVA on just the main effects B and C by specifying [0 1 0;0 0 1] for modeltype.
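The following sketch illustrates that workflow with hypothetical grouping variables gA, gB, and gC:

% Fit main effects B and C plus the B*C interaction, then refit using only
% the main effects by reusing rows of the terms output:
modeltype = [0 1 0; 0 0 1; 0 1 1];
[p,tbl,stats,terms] = anovan(y,{gA gB gC},'model',modeltype);
reduced = terms(1:2,:);                 % rows for the main effects B and C
p2 = anovan(y,{gA gB gC},'model',reduced);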

Sum of Squares

This section explains how to use the argument 'sstype' with the syntax:

[...] = anovan(y,group,'sstype',type)

This syntax computes the ANOVA using the type of sum of squares specified by type, which can be 1, 2, 3, or 'h'. The numbers 1, 2, and 3 designate Type 1, Type 2, or Type 3 sum of squares, respectively, while 'h' represents a hierarchical model similar to Type 2, but with continuous as well as categorical factors used to determine the hierarchy of terms. The default value is 3. For a model containing main effects but no interactions, the value of type only influences computations on unbalanced data.

The sum of squares for any term is determined by comparing two models. The Type 1 sum of squares for a term is the reduction in residual sum of squares obtained by adding that term to a fit that already includes the terms listed before it. The Type 2 sum of squares is the reduction in residual sum of squares obtained by adding that term to a model consisting of all other terms that do not contain the term in question. The Type 3 sum of squares is the reduction in residual sum of squares obtained by adding that term to a model containing all other terms, but with their effects constrained to obey the usual "sigma restrictions" that make models estimable.

Suppose you are fitting a model with two factors and their interaction, and that the terms appear in the order A, B, AB. Let R(·) represent the residual sum of squares for a model, so for example R(A, B, AB) is the residual sum of squares fitting the whole model, R(A) is the residual sum of squares fitting just the main effect of A, and R(1) is the residual sum of squares fitting just the mean. The three types of sums of squares are as follows:

Term    Type 1 Sum of Squares      Type 2 Sum of Squares      Type 3 Sum of Squares
A       R(1) – R(A)                R(B) – R(A, B)             R(B, AB) – R(A, B, AB)
B       R(A) – R(A, B)             R(A) – R(A, B)             R(A, AB) – R(A, B, AB)
AB      R(A, B) – R(A, B, AB)      R(A, B) – R(A, B, AB)      R(A, B) – R(A, B, AB)

The models for Type 3 sum of squares have sigma restrictions imposed. This means, for example, that in fitting R(B, AB), the array of AB effects is constrained to sum to 0 over A for each value of B, and over B for each value of A.
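For unbalanced data, you can see the effect of this choice by fitting the same model with different 'sstype' values. The following sketch uses hypothetical inputs y, g1, and g2:

% Compare Type 1 and Type 3 sums of squares for an unbalanced two-factor design:
p1 = anovan(y,{g1 g2},'model','interaction','sstype',1,'display','off');
p3 = anovan(y,{g1 g2},'model','interaction','sstype',3,'display','off');
% The main-effect p-values generally differ between types on unbalanced data;
% the interaction p-value is the same for all three types (see the table above).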

Example of Three-Way ANOVA

As an example of three-way ANOVA, consider the vector y and group inputs below.

y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2]; 
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'}; 
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'}; 

This defines a three-way ANOVA with two levels of each factor. Every observation in y is identified by a combination of factor levels. If the factors are A, B, and C, then observation y(1) is associated with

  • Level 1 of factor A

  • Level 'hi' of factor B

  • Level 'may' of factor C

Similarly, observation y(6) is associated with

  • Level 2 of factor A

  • Level 'hi' of factor B

  • Level 'june' of factor C

To compute the ANOVA, enter

p = anovan(y,{g1 g2 g3})
p =
  0.4174
  0.0028
  0.9140

Output vector p contains p-values for the null hypotheses on the N main effects. Element p(1) contains the p-value for the null hypothesis, H0A, that samples at all levels of factor A are drawn from the same population; element p(2) contains the p-value for the null hypothesis, H0B, that samples at all levels of factor B are drawn from the same population; and so on.

If any p-value is near zero, this casts doubt on the associated null hypothesis. For example, a sufficiently small p-value for H0A suggests that at least one A-sample mean is significantly different from the other A-sample means; that is, there is a main effect due to factor A. You need to choose a bound for the p-value to determine whether a result is statistically significant. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.
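For example, with the p-values computed above you can flag the significant terms directly:

sig = p < 0.05    % logical vector; here only the second element (factor B) is 1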

anovan also displays a figure showing the standard ANOVA table, which by default divides the variability of the data in y into

  • The variability due to differences between the levels of each factor accounted for in the model (one row for each factor)

  • The remaining variability not explained by any systematic source

The ANOVA table has six columns:

  • The first shows the source of the variability.

  • The second shows the sum of squares (SS) due to each source.

  • The third shows the degrees of freedom (df) associated with each source.

  • The fourth shows the mean squares (MS), which are the ratios SS/df.

  • The fifth shows the F statistics, which are the ratios of the mean squares.

  • The sixth shows the p-values for the F statistics.


Two-Factor Interactions

By default, anovan computes p-values just for the three main effects. To also compute p-values for the two-factor interactions, X1*X2, X1*X3, and X2*X3, add the name/value pair 'model', 'interaction' as input arguments.

p = anovan(y,{g1 g2 g3},'model','interaction')
p =
  0.0347
  0.0048
  0.2578
  0.0158
  0.1444
  0.5000

The first three entries of p are the p-values for the main effects. The last three entries are the p-values for the two-factor interactions, in the order X1*X2, X1*X3, X2*X3. You can confirm this order from the row labels in the ANOVA table that anovan displays.
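Equivalently, the terms output returns the fitted terms in the same order as the entries of p, so you can inspect it to see which row corresponds to which interaction:

[p,tbl,stats,terms] = anovan(y,{g1 g2 g3},'model','interaction','display','off');
terms    % one row per term, in the same order as p ([1 1 0] is the g1*g2 interaction)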

The Stats Structure

The anovan test evaluates the hypothesis that the different levels of a factor (or more generally, a term) have the same effect, against the alternative that they do not all have the same effect. Sometimes it is preferable to perform a test to determine which pairs of levels are significantly different, and which are not. Use the multcompare function to perform such tests by supplying the stats structure as input.
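For example, a follow-up comparison of the levels of the second grouping variable in the three-way example above might look like this sketch (the 'dimension' argument selects which grouping variable to compare):

[p,tbl,stats] = anovan(y,{g1 g2 g3},'display','off');
[c,m] = multcompare(stats,'dimension',2);   % pairwise comparisons of the g2 levels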

The stats structure contains the fields listed below, in addition to a number of other fields required for doing multiple comparisons using the multcompare function:

Field         Description
coeffs        Estimated coefficients
coeffnames    Name of term for each coefficient
vars          Matrix of grouping variable values for each term
resid         Residuals from the fitted model

The stats structure also contains the following fields if there are random effects:

Field      Description
ems        Expected mean squares
denom      Denominator definition
rtnames    Names of random terms
varest     Variance component estimates (one per random term)
varci      Confidence intervals for variance components
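For example, the following sketch (with placeholder inputs y, g1, and g2) treats the second grouping variable as a random effect and then inspects these fields:

[p,tbl,stats] = anovan(y,{g1 g2},'random',2,'varnames',{'A';'B'});
stats.rtnames    % names of the random terms
stats.varest     % variance component estimates, one per random term
stats.varci      % confidence intervals for the variance components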

Examples

Perform Two-Way ANOVA shows how to use anova2 to analyze the effects of two factors on a response in a balanced design. For a design that is not balanced, use anovan instead.

The data in carbig.mat gives measurements on 406 cars. Use anovan to study how the mileage depends on where and when the cars were made:

load carbig

p = anovan(MPG,{org when},'model',2,'sstype',3,...
           'varnames',{'Origin';'Mfg date'})
p =
      0
      0
    0.3059

The p-value for the interaction term is not small, indicating little evidence that the effect of the year of manufacture (when) depends on where the car was made (org). The linear effects of those two factors, however, are significant.
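As a follow-up sketch, you could request the stats output from the same fit and compare mileage across the origins with multcompare:

[p,tbl,stats] = anovan(MPG,{org when},'model',2,'sstype',3, ...
                       'varnames',{'Origin';'Mfg date'},'display','off');
multcompare(stats,'dimension',1);   % pairwise comparisons of the Origin levels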

