Statistics Toolbox™
D = dummyvar(group)
D = dummyvar(group) creates a matrix D of {0,1}-valued dummy variables, one for each level of each variable in group. Columns of group represent categorical predictor variables, and their values indicate categorical levels. Each row of group is one observation of all the variables. Each column of D is a dummy variable for one categorical level of one of the variables in group.
group can be a numeric vector or categorical column vector, representing levels within a single variable, or a numeric matrix or cell array of categorical column vectors, representing levels within multiple variables. If group is a numeric vector or matrix, values in any column must be positive integers in the range from 1 to the number of levels for the corresponding variable.
If group is n-by-p, D is n-by-S, where S is the sum of the number of levels in each of the columns of group. The number of levels s in any column of group is the maximum positive integer in the column or the number of categorical levels. Levels are considered distinct if they appear in different columns of group, even if they have the same value. Columns of D are, from left to right, dummy variables created from the first column of group, followed by dummy variables created from the second column of group, etc.
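The layout rules above can be sketched directly for a numeric group matrix. This is a minimal illustration under stated assumptions, not the toolbox's implementation. It assumes every entry is a positive integer, with no NaNs and no categorical input. The variable names `nlevels` and `offsets` are hypothetical.

```matlab
% Minimal sketch (not the toolbox source) of the column layout of D for a
% numeric group matrix. Assumes all entries are positive integers, no NaNs.
group = [1 1
         1 3
         2 1
         2 3];

[n, p] = size(group);
nlevels = max(group, [], 1);              % levels per column = column maximum
offsets = [0 cumsum(nlevels(1:end-1))];   % columns of D used by earlier variables
D = zeros(n, sum(nlevels));               % n-by-S, S = total number of levels

for j = 1:p
    cols = offsets(j) + group(:, j);      % D column holding each row's 1 for variable j
    idx = sub2ind(size(D), (1:n)', cols); % linear indices of those 1s
    D(idx) = 1;
end

disp(D)
```

For this input, the second column's maximum is 3, so three columns are allocated for it even though level 2 never occurs. Column 4 of D is therefore all zeros, which is consistent with the rule that a numeric column's number of levels is its maximum value.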
dummyvar treats NaN values or undefined categorical levels in group as missing data and returns NaN values in D.
Dummy variables are used in regression analysis and ANOVA to indicate values of categorical predictors.
Note: If a column of 1s is added to D, the resulting matrix X = [ones(size(D,1),1) D] is rank deficient. D itself is rank deficient if group has more than one column. This happens because the dummy variables produced from any single column of group always sum to a column of 1s. Regression and ANOVA calculations often address this by eliminating one dummy variable from each group of dummy variables produced by a column of group. Dropping a column implicitly sets its coefficient to zero.
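The rank deficiency described in the note can be checked numerically. The sketch below hard-codes the D from the machine/operator example on this page so that it runs without the toolbox. It then drops the first dummy variable of each variable as one instance of the common remedy. Which column to drop is an arbitrary choice here, and the names `X` and `Xr` are illustrative.

```matlab
% D for the machine/operator example, hard-coded so no toolbox is needed.
% Columns: machine 1, machine 2, operator 1, operator 2, operator 3.
D = [1 0 1 0 0
     1 0 0 1 0
     1 0 0 0 1
     1 0 1 0 0
     0 1 0 1 0
     0 1 0 0 1
     0 1 1 0 0
     0 1 0 1 0];

X = [ones(size(D, 1), 1) D];   % add a constant term

rank(D)    % 4, not 5: columns 1+2 and columns 3+4+5 both sum to ones
rank(X)    % 4, not 6: the constant column lies in the span of D

% Drop one dummy per variable (machine 1 and operator 1 here).
Xr = [ones(size(D, 1), 1) D(:, [2 4 5])];
rank(Xr)   % 4: full column rank
```

After the drop, each remaining coefficient is measured relative to the dropped level (machine 1 or operator 1), which the constant term absorbs.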
Suppose you are studying the effects of two machines and three operators on a process. Use group to organize predictor data on machine-operator combinations:
```
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]

group =
     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2
```

Use dummyvar to create dummy variables for a regression or ANOVA calculation:
```
D = dummyvar(group)

D =
     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0
```

The first two columns of D represent observations of machine 1 and machine 2, respectively. The remaining three columns represent observations of operators 1, 2, and 3.
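Each column of D is a 0/1 indicator, so simple matrix operations on D summarize the design. The following sketch uses a hard-coded copy of the D above and hypothetical variable names. It counts observations per level, cross-tabulates the machine-operator combinations, and recovers the machine index from its dummy columns.

```matlab
% D from the example above, hard-coded so no toolbox is needed.
D = [1 0 1 0 0
     1 0 0 1 0
     1 0 0 0 1
     1 0 1 0 0
     0 1 0 1 0
     0 1 0 0 1
     0 1 1 0 0
     0 1 0 1 0];

counts = sum(D, 1)                 % observations per level: [4 4 3 3 2]

% Rows: machines 1-2; columns: operators 1-3.
combos = D(:, 1:2)' * D(:, 3:5)    % [2 1 1; 1 2 1]

% Each row has exactly one 1 among the machine columns, so the position
% of the maximum in those columns recovers the original machine value.
[~, machineIdx] = max(D(:, 1:2), [], 2);
disp(machineIdx')                  % 1 1 1 1 2 2 2 2
```

The cross-tabulation shows that every machine-operator combination occurs at least once in this design.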
© 1984-2008 The MathWorks, Inc.