Create dummy variables

`D = dummyvar(group)`

`D = dummyvar(group)` returns a matrix `D` containing zeros and ones, whose columns are dummy variables for the grouping variable `group`. Columns of `group` represent categorical predictor variables, with values indicating categorical levels. Rows of `group` represent observations across variables.

`group` can be a numeric vector or categorical column vector representing levels within a single variable, a cell array containing one or more grouping variables, or a numeric matrix or cell array of categorical column vectors representing levels within multiple variables. If `group` is a numeric vector or matrix, values in any column must be positive integers in the range from `1` to the number of levels for the corresponding variable. In this case, `dummyvar` treats each column as a separate numeric grouping variable. With multiple grouping variables, the sets of dummy variable columns are in the same order as the grouping variables in `group`.
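As a minimal sketch of the numeric-vector case described above, the following creates dummy variables for a single grouping variable with three levels. The variable name `g` is chosen only for illustration, and `dummyvar` requires Statistics and Machine Learning Toolbox.

```matlab
% Illustrative data: one numeric grouping variable with levels 1, 2, and 3.
g = [1 2 3 2]';

% Each row of D has a single 1 in the column matching that observation's level.
D = dummyvar(g);

% For this input, D is 4-by-3:
%   1 0 0
%   0 1 0
%   0 0 1
%   0 1 0
disp(D)
```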

The order of the dummy variable columns in `D` matches the order of the groups defined by `group`. When `group` is a categorical vector, the groups and their order match the output of the `getlabels(group)` method. When `group` is a numeric vector, `dummyvar` assumes that the groups and their order are `1:max(group)`. In this respect, `dummyvar` treats a numeric grouping variable differently than `grp2idx`.
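The difference from `grp2idx` can be sketched with a numeric vector that skips a level. This example assumes illustrative data in which level `2` never occurs: `dummyvar` still reserves a column for it because it uses `1:max(group)`, whereas `grp2idx` numbers only the distinct values that are present.

```matlab
% Illustrative data: levels 1 and 3 occur, level 2 does not.
g = [1 3 3]';

% dummyvar assumes groups 1:max(g), so D has three columns
% and the column for the unused level 2 is all zeros.
D = dummyvar(g);

% grp2idx indexes the distinct values actually present (1 and 3),
% so the observations map to group indices 1, 2, 2.
idx = grp2idx(g);

disp(D)
disp(idx)
```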

If `group` is *n*-by-*p*, `D` is *n*-by-*S*, where *S* is the sum of the number of levels in each of the columns of `group`. The number of levels *s* in any column of `group` is the maximum positive integer in the column or the number of categorical levels. Levels are considered distinct if they appear in different columns of `group`, even if they have the same value. Columns of `D` are, from left to right, dummy variables created from the first column of `group`, followed by dummy variables created from the second column of `group`, and so on.
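To make the size rule concrete, here is a small sketch with a hypothetical 3-by-2 numeric `group`. The first column has a maximum of 2 and the second a maximum of 3, so *S* = 2 + 3 = 5.

```matlab
% Illustrative data: n = 3 observations, p = 2 grouping variables.
group = [1 2;
         2 2;
         1 3];

D = dummyvar(group);

% Number of levels per column is the column maximum for numeric input.
S = sum(max(group));      % 2 + 3 = 5

% D is n-by-S: columns 1-2 come from the first variable,
% columns 3-5 from the second variable.
sz = size(D);             % [3 5]
disp(D)
```

The value `2` appears in both columns of `group`, but it maps to column 2 of `D` for the first variable and to column 4 for the second, since levels in different columns are treated as distinct.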

`dummyvar` treats `NaN` values or undefined categorical levels in `group` as missing data and returns `NaN` values in `D`.
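A short sketch of the missing-data behavior, assuming a single numeric grouping variable with one `NaN` entry (the data are illustrative):

```matlab
% Illustrative data: the second observation has a missing group value.
g = [1 NaN 2]';

D = dummyvar(g);

% Per the description above, the row for the missing observation
% is expected to contain NaN rather than zeros and ones.
missingRow = isnan(D(2,:));
disp(D)
```

Checking `any(isnan(D), 2)` before fitting a model is one way to locate observations that need to be dropped or imputed.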

Dummy variables are used in regression analysis and ANOVA to indicate values of categorical predictors.

If you add a column of 1s to the matrix `D`, the resulting matrix `X = [ones(size(D,1),1) D]` is rank deficient. The matrix `D` itself is rank deficient if `group` has multiple columns. This is because the dummy variables produced from any single column of `group` always sum to a column of 1s. Regression and ANOVA calculations often address this issue by eliminating one dummy variable from each group of dummy variables produced by a column of `group`, which implicitly sets the coefficients for the dropped columns to zero.
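The rank deficiency can be checked numerically. The sketch below uses a two-variable design with two machine levels and three operator levels; dropping the first dummy variable of each group is one common convention, chosen here only for illustration.

```matlab
% Illustrative design: 2 machine levels and 3 operator levels.
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);          % 8-by-5

% Intercept plus all dummy variables: 6 columns.
X = [ones(size(D,1),1) D];

rankD = rank(D);    % 4, fewer than the 5 columns of D
rankX = rank(X);    % 4, fewer than the 6 columns of X

% Drop the first dummy variable of each group (columns 1 and 3 of D).
Xreduced = [ones(size(D,1),1) D(:,[2 4 5])];
rankReduced = rank(Xreduced);              % 4, equal to its column count
```

With the reduced design, the intercept represents the baseline combination (machine 1, operator 1), and the remaining coefficients are offsets from that baseline.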

Suppose you are studying the effects of two machines and three operators on a process. Use `group` to organize predictor data on machine-operator combinations:

    machine = [1 1 1 1 2 2 2 2]';
    operator = [1 2 3 1 2 3 1 2]';
    group = [machine operator]

    group =

         1     1
         1     2
         1     3
         1     1
         2     2
         2     3
         2     1
         2     2

Use `dummyvar` to create dummy variables for a regression or ANOVA calculation:

    D = dummyvar(group)

    D =

         1     0     1     0     0
         1     0     0     1     0
         1     0     0     0     1
         1     0     1     0     0
         0     1     0     1     0
         0     1     0     0     1
         0     1     1     0     0
         0     1     0     1     0

The first two columns of `D` represent observations of machine `1` and machine `2`, respectively; the remaining columns represent observations of the three operators.
