MATLAB Examples

Perform N-Way ANOVA

This example shows how to perform N-way ANOVA on car data with mileage and other information on 406 cars made between 1970 and 1982.

Load the sample data.

load carbig

The example focuses on four variables. MPG is the number of miles per gallon for each of 406 cars (though some have missing values coded as NaN). The other three variables are factors: cyl4 (four-cylinder car or not), org (car originated in Europe, Japan, or the USA), and when (car was built early in the period, in the middle of the period, or late in the period).
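Before fitting, it can help to see how many responses are actually usable. The following is a minimal sketch, assuming only the variables that load carbig provides; the names nMissing and nUsed are illustrative and not part of the original example.

```matlab
load carbig

% Count the cars whose mileage is missing (coded as NaN).
nMissing = sum(isnan(MPG));
nUsed = numel(MPG) - nMissing;
fprintf('%d of %d cars have a recorded MPG value\n', nUsed, numel(MPG));

% List the levels of each factor. org, when, and cyl4 are character
% arrays, so convert them to cell arrays before calling unique.
disp(unique(cellstr(org)))
disp(unique(cellstr(when)))
disp(unique(cellstr(cyl4)))
```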

Fit the full model. The third argument (3) requests all interactions up to three-way, and the fourth argument (3) requests Type III sums of squares.

varnames = {'Origin';'4Cyl';'MfgDate'};
anovan(MPG,{org cyl4 when},3,3,varnames)
ans =

    0.0000
       NaN
    0.0000
    0.7032
    0.0001
    0.2072
    0.6990
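The positional call above can also be written with name-value arguments, which makes the two 3s easier to tell apart. This is a sketch of an equivalent call, not a replacement for the original; the 'display','off' option is added here only to suppress the table figure.

```matlab
load carbig
varnames = {'Origin';'4Cyl';'MfgDate'};

% 'model',3 requests all terms up to three-way interactions;
% 'sstype',3 requests Type III sums of squares.
p = anovan(MPG,{org cyl4 when}, ...
    'model',3,'sstype',3,'varnames',varnames,'display','off');

% One p-value per term: 3 main effects, 3 two-way terms, 1 three-way term.
disp(p)
```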

In the ANOVA table that anovan displays for this full model, many terms are marked by a # symbol as not having full rank, and one of them has zero degrees of freedom and is missing a p-value (the NaN in the output above). This can happen when there are missing factor combinations and the model has higher-order terms. In this case, the cross-tabulation below shows that there are no cars made in Europe during the early part of the period with other than four cylinders, as indicated by the 0 in tbl(2,1,1).

[tbl,chi2,p,factorvals] = crosstab(org,when,cyl4)
tbl(:,:,1) =

    82    75    25
     0     4     3
     3     3     4


tbl(:,:,2) =

    12    22    38
    23    26    17
    12    25    32


chi2 =

  207.7689


p =

   8.0973e-38


factorvals =

  3x3 cell array

    {'USA'   }    {'Early'}    {'Other'   }
    {'Europe'}    {'Mid'  }    {'Four'    }
    {'Japan' }    {'Late' }    {0x0 double}

Consequently it is impossible to estimate the three-way interaction effects, and including the three-way interaction term in the model makes the fit singular.
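Rather than scanning the cross-tabulation by eye, you can locate empty cells programmatically. The following is a minimal sketch that assumes the same crosstab call as above; the variable names row, col, and page are illustrative.

```matlab
load carbig
[tbl,~,~,factorvals] = crosstab(org,when,cyl4);

% Find every factor combination with no observations.
[row,col,page] = ind2sub(size(tbl),find(tbl == 0));

% Translate the subscripts back into factor level names.
% Column 1 of factorvals holds the org levels, column 2 the when
% levels, and column 3 the cyl4 levels.
for n = 1:numel(row)
    fprintf('No cars with org = %s, when = %s, cyl4 = %s\n', ...
        factorvals{row(n),1},factorvals{col(n),2},factorvals{page(n),3});
end
```

An empty result from find would mean that every combination is observed, in which case the three-way interaction could be estimated.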

Using even the limited information available in the ANOVA table, you can see that the three-way interaction has a p-value of 0.699, so it is not significant.

Examine only two-way interactions.

[p,tbl2,stats,terms] = anovan(MPG,{org cyl4 when},2,3,varnames);
terms
terms =

     1     0     0
     0     1     0
     0     0     1
     1     1     0
     1     0     1
     0     1     1

Now all terms are estimable. The p-values for interaction term 4 (Origin*4Cyl) and interaction term 6 (4Cyl*MfgDate) are much larger than a typical cutoff value of 0.05, indicating these terms are not significant. You could choose to omit these terms and pool their effects into the error term. The terms output is a matrix with one row per term and one column per factor. Each row is a bit pattern: a 1 in a column means that the corresponding factor is part of the term.
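To make the bit patterns concrete, the sketch below converts each row of the terms matrix into a readable label. It hard-codes the matrix shown above so that it runs on its own; the labels variable is illustrative.

```matlab
varnames = {'Origin';'4Cyl';'MfgDate'};
terms = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 0 1; 0 1 1];

labels = cell(size(terms,1),1);
for r = 1:size(terms,1)
    % Select the factors flagged with a 1 in this row and join their names.
    inTerm = logical(terms(r,:));
    labels{r} = strjoin(varnames(inTerm)','*');
    fprintf('term %d: %s\n',r,labels{r});
end
```

Rows 4 and 6 come out as Origin*4Cyl and 4Cyl*MfgDate, the two interactions discussed above.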

Omit terms from the model by deleting their entries from terms.

terms([4 6],:) = []
terms =

     1     0     0
     0     1     0
     0     0     1
     1     0     1

Run anovan again, this time supplying the resulting matrix as the model argument. Also return the statistics required for multiple comparisons of factors.

[~,~,stats] = anovan(MPG,{org cyl4 when},terms,3,varnames)
stats = 

  struct with fields:

         source: 'anovan'
          resid: [1x406 double]
         coeffs: [18x1 double]
            Rtr: [10x10 double]
       rowbasis: [10x18 double]
            dfe: 388
            mse: 14.1056
    nullproject: [18x10 double]
          terms: [4x3 double]
        nlevels: [3x1 double]
     continuous: [0 0 0]
         vmeans: [3x1 double]
       termcols: [5x1 double]
     coeffnames: {18x1 cell}
           vars: [18x3 double]
       varnames: {3x1 cell}
       grpnames: {3x1 cell}
        vnested: []
            ems: []
          denom: []
        dfdenom: []
        msdenom: []
         varest: []
          varci: []
       txtdenom: []
         txtems: []
        rtnames: []

Now you have a more parsimonious model indicating that the mileage of these cars seems to be related to all three factors, and that the effect of the manufacturing date depends on where the car was made.

Perform multiple comparisons for the combinations of Origin and 4Cyl (factors 1 and 2).

results = multcompare(stats,'Dimension',[1,2])
results =

    1.0000    2.0000   -5.4891   -3.8412   -2.1932    0.0000
    1.0000    3.0000   -4.4146   -2.7251   -1.0356    0.0001
    1.0000    4.0000   -9.9992   -8.5828   -7.1664    0.0000
    1.0000    5.0000  -14.0237  -12.4240  -10.8242    0.0000
    1.0000    6.0000  -12.8980  -11.3080   -9.7180    0.0000
    2.0000    3.0000   -0.7171    1.1160    2.9492    0.5085
    2.0000    4.0000   -7.3655   -4.7417   -2.1179    0.0000
    2.0000    5.0000   -9.9992   -8.5828   -7.1664    0.0000
    2.0000    6.0000   -9.7464   -7.4668   -5.1872    0.0000
    3.0000    4.0000   -8.5396   -5.8577   -3.1757    0.0000
    3.0000    5.0000  -12.0518   -9.6988   -7.3459    0.0000
    3.0000    6.0000   -9.9992   -8.5828   -7.1664    0.0000
    4.0000    5.0000   -5.4891   -3.8412   -2.1932    0.0000
    4.0000    6.0000   -4.4146   -2.7251   -1.0356    0.0001
    5.0000    6.0000   -0.7171    1.1160    2.9492    0.5085
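The first two columns of results are group indices, columns 3 through 5 are the lower confidence bound, the estimated difference, and the upper confidence bound, and column 6 is the p-value. The sketch below attaches group names to those indices by requesting the fourth output of multcompare. It refits the reduced model so that it runs on its own, and the table column names are illustrative choices, not part of the original example.

```matlab
load carbig
varnames = {'Origin';'4Cyl';'MfgDate'};
terms = [1 0 0; 0 1 0; 0 0 1; 1 0 1];

% Refit the reduced model without opening the ANOVA table figure.
[~,~,stats] = anovan(MPG,{org cyl4 when},'model',terms, ...
    'sstype',3,'varnames',varnames,'display','off');

% gnames holds one label per group, in the order used by the index columns.
[results,~,~,gnames] = multcompare(stats,'Dimension',[1 2],'Display','off');

T = table(gnames(results(:,1)),gnames(results(:,2)), ...
    results(:,3),results(:,4),results(:,5),results(:,6), ...
    'VariableNames',{'GroupA','GroupB','LowerCI','Difference','UpperCI','pValue'});

% Show the pairs whose difference is not significant at the 0.05 level.
disp(T(T.pValue > 0.05,:))
```

In the results above, only two pairs (groups 2 and 3, and groups 5 and 6) have p-values above 0.05; every other pair differs significantly.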