This example shows how to test for significant
differences between category (group) means using a *t*-test,
two-way ANOVA (analysis of variance), and ANOCOVA (analysis of covariance).

The goal is to determine whether the expected miles per gallon for a car depends on the decade in which it was manufactured or on the location where it was manufactured.

```
load('carsmall')
unique(Model_Year)
```

```
ans =
    70
    76
    82
```

The variable `MPG` has miles per gallon measurements
on a sample of 100 cars. The variables `Model_Year` and `Origin` contain
the model year and country of origin for each car.

The first factor of interest is the decade of manufacture. There are three manufacturing years in the data.

Create an ordinal array named `Decade` by merging
the observations from years `70` and `76` into
a category labeled `1970s`, and putting the observations
from `82` into a category labeled `1980s`.

```
Decade = ordinal(Model_Year,{'1970s','1980s'},[],[70 77 82]);
getlevels(Decade)
```

```
ans =
     1970s      1980s
```
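The edge vector `[70 77 82]` is what performs the merge: `ordinal` assigns a value to level *i* when it falls in the bin from edge *i* up to (but not including) edge *i*+1, with the last bin also including its right edge. The following is a minimal sketch of a sanity check, not part of the original example; the variable names `is80s`, `sameSplit`, and `nPerDecade` are illustrative.

```matlab
load('carsmall')

% Same call as in the example: edges [70 77 82] define two bins,
% [70,77) -> '1970s' and [77,82] -> '1980s'.
Decade = ordinal(Model_Year,{'1970s','1980s'},[],[70 77 82]);

% Illustrative cross-check: a plain logical comparison on the model year
% should reproduce the same split as the ordinal levels.
is80s = Model_Year >= 77;
sameSplit = isequal(Decade == '1980s', is80s)

% Number of cars assigned to each level.
nPerDecade = [sum(Decade == '1970s'), sum(Decade == '1980s')]
```

If `sameSplit` is true, years `70` and `76` landed in `1970s` and year `82` landed in `1980s`, as intended.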

Draw a box plot of miles per gallon, grouped by the decade of manufacture.

```
figure()
boxplot(MPG,Decade)
title('Miles per Gallon, Grouped by Decade of Manufacture')
```

The box plot suggests that miles per gallon is higher for cars manufactured during the 1980s than for cars manufactured during the 1970s.

Compute the mean and variance of miles per gallon for each decade.

```
[xbar,s2,grp] = grpstats(MPG,Decade,{'mean','var','gname'})
```

```
xbar =
   19.7857
   31.7097

s2 =
   35.1429
   29.0796

grp =
    '1970s'
    '1980s'
```

This output shows that the mean miles per gallon in the 1980s
was `31.71`, compared to `19.79` in
the 1970s. The variances in the two groups are similar.
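One way to check the similarity of the variances more formally is a two-sample *F*-test with `vartest2`. This is a sketch that is not part of the original example; the split on `Model_Year` is assumed to match the `Decade` levels, and the output names `hVar`, `pVar`, and `statsVar` are illustrative.

```matlab
load('carsmall')

% Split MPG by decade using the model year directly
% (years 70 and 76 -> 1970s, year 82 -> 1980s).
MPG70 = MPG(Model_Year < 77);
MPG80 = MPG(Model_Year >= 77);

% Two-sample F-test of equal variances; NaN values in MPG are
% treated as missing. hVar = 0 means equal variances are not
% rejected at the default 0.05 level.
[hVar,pVar,~,statsVar] = vartest2(MPG70,MPG80)
```

The field `statsVar.fstat` is the ratio of the two sample variances. Keep in mind that this test is sensitive to departures from normality, so it is a rough guide rather than a definitive check.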

Conduct a two-sample *t*-test, assuming equal
variances, to test for a significant difference between the group
means. The hypotheses are

$$\begin{array}{l}{H}_{0}:{\mu}_{70}={\mu}_{80}\\ {H}_{A}:{\mu}_{70}\ne {\mu}_{80}.\end{array}$$

```
MPG70 = MPG(Decade=='1970s');
MPG80 = MPG(Decade=='1980s');
[h,p] = ttest2(MPG70,MPG80)
```

```
h =
     1

p =
   3.4809e-15
```
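For reference, the equal-variance (pooled) test statistic can be reproduced by hand. The following sketch is not part of the original example; it assumes the split on `Model_Year` matches the `Decade` levels, and the names `sp2`, `tStat`, and `pManual` are illustrative.

```matlab
load('carsmall')

% Group samples with missing MPG values removed.
x = MPG(Model_Year < 77);   x = x(~isnan(x));   % 1970s
y = MPG(Model_Year >= 77);  y = y(~isnan(y));   % 1980s

n1 = numel(x);
n2 = numel(y);
df = n1 + n2 - 2;                               % degrees of freedom

% Pooled variance: weighted average of the two sample variances.
sp2 = ((n1-1)*var(x) + (n2-1)*var(y)) / df;

% t statistic for the difference in means, and two-sided p-value.
tStat = (mean(x) - mean(y)) / sqrt(sp2*(1/n1 + 1/n2));
pManual = 2*tcdf(-abs(tStat),df)
```

The value of `pManual` should agree with the p-value that `ttest2` returns for the same two samples.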

The returned value `h = 1` indicates that
the null hypothesis is rejected at the default 0.05 significance level.
The p-value for the test is very small. There is sufficient evidence
that the mean miles per gallon in the 1980s differs from the mean
miles per gallon in the 1970s.

The second factor of interest is the location of manufacture.
First, convert `Origin` to a nominal array.

```
Location = nominal(Origin);
tabulate(Location)
```

```
    Value    Count   Percent
   France        4      4.00%
  Germany        9      9.00%
    Italy        1      1.00%
    Japan       15     15.00%
   Sweden        2      2.00%
      USA       69     69.00%
```

Combine the categories `France`, `Germany`, `Italy`,
and `Sweden` into a new category named `Europe`.

```
Location = mergelevels(Location,{'France','Germany','Italy','Sweden'},...
    'Europe');
tabulate(Location)
```

```
    Value    Count   Percent
    Japan       15     15.00%
      USA       69     69.00%
   Europe       16     16.00%
```

Compute the mean miles per gallon, grouped by the location of manufacture.

```
[xbar,grp] = grpstats(MPG,Location,{'mean','gname'})
```

```
xbar =
   31.8000
   21.1328
   26.6667

grp =
    'Japan'
    'USA'
    'Europe'
```

This result shows that average miles per gallon is lowest for the sample of cars manufactured in the U.S.

Conduct a two-way ANOVA to test for differences in expected
miles per gallon between factor levels for `Decade` and `Location`.

The statistical model is

$$MPG_{ij}=\mu +\alpha_{i}+\beta_{j}+\epsilon_{ij},\quad i=1,2;\; j=1,2,3,$$

where $MPG_{ij}$ is
the response, miles per gallon, for cars made in decade $i$ at location $j$.

The hypotheses to test are equality of decade effects,

$$\begin{array}{l}H_{0}:\alpha_{1}=\alpha_{2}=0\\ H_{A}:\text{at least one }\alpha_{i}\ne 0,\end{array}$$

and equality of location effects,

$$\begin{array}{l}H_{0}:\beta_{1}=\beta_{2}=\beta_{3}=0\\ H_{A}:\text{at least one }\beta_{j}\ne 0.\end{array}$$

You can conduct a multiple-factor ANOVA using `anovan`.

```
anovan(MPG,{Decade,Location},'varnames',{'Decade','Location'});
```

This output shows the results of the two-way ANOVA. The p-value
for testing the equality of decade effects is `2.88503e-18`,
so the null hypothesis is rejected at the 0.05 significance level.
The p-value for testing the equality of location effects is `7.40416e-10`,
so this null hypothesis is also rejected.
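To use these p-values programmatically rather than reading them from the ANOVA table figure, you can capture the output of `anovan` and suppress the display. The following is a sketch under that assumption; the names `alpha`, `rejectDecade`, and `rejectLocation` are illustrative and not part of the original example.

```matlab
load('carsmall')

% Rebuild the two grouping variables used in the example.
Decade = ordinal(Model_Year,{'1970s','1980s'},[],[70 77 82]);
Location = nominal(Origin);
Location = mergelevels(Location,{'France','Germany','Italy','Sweden'},...
    'Europe');

% 'display','off' suppresses the ANOVA table figure. p holds one
% p-value per factor, in the order the factors are supplied.
[p,tbl] = anovan(MPG,{Decade,Location},...
    'varnames',{'Decade','Location'},'display','off');

% Compare each p-value with the significance level.
alpha = 0.05;
rejectDecade = p(1) < alpha
rejectLocation = p(2) < alpha
```

The cell array `tbl` contains the same ANOVA table that the figure would display, which is convenient for logging or reporting.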

A potential confounder in this analysis is car weight. Cars
with greater weight are expected to have lower gas mileage. Include
the variable `Weight` as a continuous covariate in
the ANOVA; that is, conduct an ANOCOVA.

Assuming parallel lines, the statistical model is

$$MPG_{ijk}=\mu +\alpha_{i}+\beta_{j}+\gamma\,Weight_{ijk}+\epsilon_{ijk},\quad i=1,2;\; j=1,2,3;\; k=1,\ldots,100.$$

The difference between this model and the two-way
ANOVA model is the inclusion of the continuous predictor, $Weight_{ijk}$,
the weight for the $k$th car made in decade $i$ at location $j$.

Add the continuous covariate as a third group in the second `anovan` input
argument. Use the name-value pair argument `Continuous` to
specify that `Weight` (the third group) is continuous.

```
anovan(MPG,{Decade,Location,Weight},'Continuous',3,...
    'varnames',{'Decade','Location','Weight'});
```

This output shows that when car weight is considered, there
is insufficient evidence of a manufacturing location effect (p-value
= `0.1044`).

You can use the interactive `aoctool` to explore
this result.

```
aoctool(Weight,MPG,Location);
```

This command opens three dialog boxes. In the ANOCOVA Prediction
Plot dialog box, select the **Separate Means** model.

This output shows that when you do not include `Weight` in
the model, there are fairly large differences in the expected miles
per gallon among the three manufacturing locations. Note that here
the model does not adjust for the decade of manufacturing.

Now, select the **Parallel Lines** model.

When you include `Weight` in the model, the
difference in expected miles per gallon among the three manufacturing
locations is much smaller.
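The same comparison can also be made without the dialog boxes by passing the display option and model name to `aoctool` as positional arguments. The following is a sketch, assuming the positional form `aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model)`; the output names `atabMeans` and `atabParallel` are illustrative.

```matlab
load('carsmall')

% Rebuild the location grouping variable used in the example.
Location = nominal(Origin);
Location = mergelevels(Location,{'France','Germany','Italy','Sweden'},...
    'Europe');

% Arguments after the group: alpha, then names for x, y, and the group
% (left empty here), then 'off' to suppress the dialog boxes, then the model.
[~,atabMeans] = aoctool(Weight,MPG,Location,0.05,'','','','off',...
    'separate means');
[~,atabParallel] = aoctool(Weight,MPG,Location,0.05,'','','','off',...
    'parallel lines');

% ANOVA tables for the model without Weight and the model with Weight.
disp(atabMeans)
disp(atabParallel)
```

Comparing the location row of the two tables shows how much of the apparent location effect remains after adjusting for weight. As with the interactive tool, neither model adjusts for the decade of manufacture.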

`anovan` | `aoctool` | `boxplot` | `grpstats` | `nominal` | `ordinal` | `ttest2`

- Plot Data Grouped by Category
- Summary Statistics Grouped by Category
- Regression with Categorical Covariates
