Create dummy variables
D = dummyvar(group)
D = dummyvar(group) returns a matrix D containing zeros and ones, whose columns are dummy variables for the grouping variable group. Columns of group represent categorical predictor variables, with values indicating categorical levels. Rows of group represent observations across variables.
group can be a numeric vector or categorical column vector representing levels within a single variable, a cell array containing one or more grouping variables, or a numeric matrix or cell array of categorical column vectors representing levels within multiple variables. If group is a numeric vector or matrix, values in any column must be positive integers in the range from 1 to the number of levels for the corresponding variable. In this case, dummyvar treats each column as a separate numeric grouping variable. With multiple grouping variables, the sets of dummy variable columns are in the same order as the grouping variables in group.
The order of the dummy variable columns in D matches the order of the groups defined by group. When group is a categorical vector, the groups and their order match the output of the getlabels(group) method. When group is a numeric vector, dummyvar assumes that the groups and their order are 1:max(group). In this respect, dummyvar treats a numeric grouping variable differently than grp2idx.
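The difference from grp2idx can be seen with a numeric vector that skips a level. The following is a minimal sketch, assuming Statistics Toolbox is available; the input values are made up for illustration.

```matlab
% Level 2 never occurs in this grouping variable.
group = [1; 3; 3];

% dummyvar assumes the levels are 1:max(group), so it creates three
% columns and leaves the column for level 2 all zeros.
D = dummyvar(group);

% grp2idx indexes only the values that actually occur, so the value 3
% is mapped to index 2.
idx = grp2idx(group);

disp(D)
disp(idx)
```

Here D has three columns while idx contains only the indices 1 and 2, so the two functions should not be mixed on numeric data with unused levels.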
If group is n-by-p, D is n-by-S, where S is the sum of the number of levels in each of the columns of group. The number of levels s in any column of group is the maximum positive integer in the column or the number of categorical levels. Levels are considered distinct if they appear in different columns of group, even if they have the same value. Columns of D are, from left to right, dummy variables created from the first column of group, followed by dummy variables created from the second column of group, etc.
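The size rule can be checked directly for a numeric group. This is a small sketch with made-up data; the variable name S follows the text above.

```matlab
% A 3-by-2 numeric grouping matrix: the first column has 2 levels,
% the second column has 3 levels.
group = [1 1; 2 3; 1 2];

D = dummyvar(group);

% For numeric input, the number of levels in each column is its maximum,
% so S is the sum of the column maxima (2 + 3).
S = sum(max(group, [], 1));

disp(size(D))   % n-by-S
disp(S)
```

The first two columns of D come from the first column of group and the last three from the second column.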
dummyvar treats NaN values or undefined categorical levels in group as missing data and returns NaN values in D.
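As a rough illustration of the missing-data behavior, the sketch below passes a numeric vector with one NaN and locates the affected rows of D. The data and the variable name missingRows are illustrative only.

```matlab
% The second observation has a missing group value.
group = [1; NaN; 2];

D = dummyvar(group);

% Flag the rows of D that contain NaN so they can be excluded
% or handled before fitting a model.
missingRows = any(isnan(D), 2);

disp(D)
disp(missingRows)
```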
Dummy variables are used in regression analysis and ANOVA to indicate values of categorical predictors.
Note:
If a column of 1s is introduced in the matrix |
Suppose you are studying the effects of two machines and three operators on a process. Use group to organize predictor data on machine-operator combinations:
    machine = [1 1 1 1 2 2 2 2]';
    operator = [1 2 3 1 2 3 1 2]';
    group = [machine operator]
    group =
         1     1
         1     2
         1     3
         1     1
         2     2
         2     3
         2     1
         2     2
Use dummyvar to create dummy variables for a regression or ANOVA calculation:
    D = dummyvar(group)
    D =
         1     0     1     0     0
         1     0     0     1     0
         1     0     0     0     1
         1     0     1     0     0
         0     1     0     1     0
         0     1     0     0     1
         0     1     1     0     0
         0     1     0     1     0
The first two columns of D represent observations of machine 1 and machine 2, respectively; the remaining columns represent observations of the three operators.
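To make the column layout concrete, the following sketch rebuilds the example and recovers the original machine and operator levels from the two blocks of D. The names Dmachine, Doperator, m, and o are introduced here for illustration.

```matlab
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);

% Split D into the block for each grouping variable.
Dmachine = D(:, 1:2);    % two machine levels
Doperator = D(:, 3:5);   % three operator levels

% Each row has exactly one 1 per block, so the position of the
% row maximum gives back the original level.
[~, m] = max(Dmachine, [], 2);
[~, o] = max(Doperator, [], 2);

disp(isequal(m, machine))
disp(isequal(o, operator))
```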