dummyvar

Create dummy variables

Syntax

D = dummyvar(group)

Description

D = dummyvar(group) returns a matrix D containing zeros and ones, whose columns are dummy variables for the grouping variable group. Columns of group represent categorical predictor variables, with values indicating categorical levels. Rows of group represent observations across variables.

group can be a numeric vector or categorical column vector representing levels within a single variable, a cell array containing one or more grouping variables, or a numeric matrix or cell array of categorical column vectors representing levels within multiple variables. If group is a numeric vector or matrix, the values in each column must be positive integers from 1 to the number of levels for the corresponding variable. In this case, dummyvar treats each column as a separate numeric grouping variable. With multiple grouping variables, the sets of dummy variable columns appear in the same order as the grouping variables in group.

The order of the dummy variable columns in D matches the order of the groups defined by group. When group is a categorical vector, the groups and their order match the output of the getlabels(group) method. When group is a numeric vector, dummyvar assumes that the groups and their order are 1:max(group). In this respect, dummyvar treats a numeric grouping variable differently from grp2idx, which assigns group indices only to the values that actually occur.
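The following is a minimal sketch of that difference. The vector g is a hypothetical example (not part of the original page) in which level 2 never occurs:

```matlab
% Hypothetical numeric grouping vector: levels 1 and 3 occur, level 2 does not.
g = [1 3 3]';

% dummyvar assumes the levels are 1:max(g), so D has three columns,
% and the column for the unused level 2 contains only zeros.
D = dummyvar(g);

% grp2idx indexes only the values that occur, so it finds two groups.
idx = grp2idx(g);
```

If unused levels should not produce all-zero columns, one option is to pass the output of grp2idx to dummyvar instead of the raw values.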

If group is n-by-p, D is n-by-S, where S is the sum of the number of levels in each of the columns of group. The number of levels s in any column of group is the maximum positive integer in the column or the number of categorical levels. Levels are considered distinct if they appear in different columns of group, even if they have the same value. Columns of D are, from left to right, dummy variables created from the first column of group, followed by dummy variables created from the second column of group, etc.
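As a quick check of the size rule for numeric input, the sketch below uses a small hypothetical group matrix and compares size(D) with n-by-S, where S is the sum of the column maxima:

```matlab
% Hypothetical grouping matrix: n = 3 observations, p = 2 variables.
group = [1 1; 1 2; 2 3];

s = max(group);   % levels per column for numeric input: [2 3]
S = sum(s);       % total number of dummy variable columns

D = dummyvar(group);

% true when D is n-by-S
sizeMatches = isequal(size(D), [size(group,1) S]);
```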

dummyvar treats NaN values or undefined categorical levels in group as missing data and returns NaN values in D.
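A minimal sketch of handling missing data, assuming a hypothetical grouping vector with one NaN entry. The row-filtering step is one possible approach, not something the original page prescribes:

```matlab
% Hypothetical grouping vector with a missing second observation.
g = [1 NaN 2]';

D = dummyvar(g);

% Per the description above, the missing observation shows up as NaN in D.
hasMissing = any(isnan(D(:)));

% One way to exclude affected observations before fitting a model.
keep = ~any(isnan(D), 2);
Dclean = D(keep, :);
```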

Dummy variables are used in regression analysis and ANOVA to indicate values of categorical predictors.

    Note:   If a column of 1s is added to the matrix D, the resulting matrix X = [ones(size(D,1),1) D] is rank deficient. The matrix D itself is rank deficient if group has multiple columns. This is because the dummy variables produced from any one column of group always sum to a column of 1s. Regression and ANOVA calculations often address this issue by eliminating one dummy variable from each set of dummy variables produced by a column of group, which implicitly sets the coefficients of the dropped columns to zero.
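The rank deficiency described in the note can be checked numerically. The sketch below uses two numeric grouping variables. Dropping the first dummy variable of each set is an arbitrary choice made for illustration, and any one column per set would do:

```matlab
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);   % 8-by-5

% Each set of dummy variables sums to a column of 1s.
setSums = [sum(D(:,1:2), 2) sum(D(:,3:5), 2)];

% D and [ones D] both have fewer independent columns than total columns.
rankD = rank(D);
X = [ones(size(D,1),1) D];
rankX = rank(X);

% Drop the first dummy variable of each set and add an intercept column.
Xreduced = [ones(size(D,1),1) D(:,2) D(:,4:5)];
isFullRank = rank(Xreduced) == size(Xreduced, 2);
```

With this reduced design matrix, the intercept corresponds to the dropped levels, and the remaining coefficients are measured relative to them.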

Examples

Suppose you are studying the effects of two machines and three operators on a process. Use group to organize predictor data on machine-operator combinations:

machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]
group =
     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2

Use dummyvar to create dummy variables for a regression or ANOVA calculation:

D = dummyvar(group)
D =
     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0

The first two columns of D represent observations of machine 1 and machine 2, respectively; the remaining columns represent observations of the three operators.
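Because each row of D has exactly one 1 within each set of columns, the original levels can be recovered from D. The sketch below rebuilds the example data and is an illustration only. The names machineBack and operatorBack are hypothetical:

```matlab
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);

% The position of the 1 within each set of columns is the level number.
[~, machineBack]  = max(D(:,1:2), [], 2);
[~, operatorBack] = max(D(:,3:5), [], 2);

recovered = isequal(machineBack, machine) && isequal(operatorBack, operator);
```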
