## Dummy Indicator Variables

### What Are Dummy Variables?

When performing regression analysis, it is common to include both continuous and categorical (quantitative and qualitative) predictor variables. When including a categorical independent variable, do not input it as a numeric array. Numeric arrays have both order and magnitude. A categorical variable might have order (for example, an ordinal variable), but it does not have magnitude. Using a numeric array implies a known "distance" between the categories. For instance, coding `Small`, `Medium`, and `Large` as 1, 2, and 3 forces the model to treat the step from `Small` to `Medium` as having the same effect as the step from `Medium` to `Large`.

The appropriate way to include categorical predictors is as dummy indicator variables. An indicator variable takes only the values 0 and 1. A categorical variable with c categories can be represented by c – 1 indicator variables.

For example, suppose you have a categorical variable with levels `{Small,Medium,Large}`. You can represent this variable using two dummy variables, as shown in this figure:

| Level | X1 | X2 |
|---|---|---|
| `Small` | 0 | 0 |
| `Medium` | 1 | 0 |
| `Large` | 0 | 1 |

In this example, X1 is a dummy variable that has value 1 for the `Medium` group, and 0 otherwise. X2 is a dummy variable that has value 1 for the `Large` group, and 0 otherwise. Together, these two variables represent the three categories. Observations in the `Small` group have 0s for both dummy variables.
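A minimal MATLAB sketch of this coding, using a small hypothetical vector `sz` of sizes. It assumes MATLAB's `categorical` type rather than `nominal`, and builds `X1` and `X2` by comparing the array against each non-reference level, so `Small` rows get 0 in both columns.

```matlab
% Hypothetical sample data; the level order {Small, Medium, Large} is set explicitly
sz = categorical({'Small';'Large';'Medium';'Small';'Medium';'Large'}, ...
    {'Small','Medium','Large'});

% One indicator per non-reference level; Small is the reference group
X1 = double(sz == 'Medium');   % 1 for Medium, 0 otherwise
X2 = double(sz == 'Large');    % 1 for Large, 0 otherwise

disp([X1 X2])
```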

The category represented by all 0s is the reference group. When you include the dummy variables in a regression model, the coefficients of the dummy variables are interpreted with respect to the reference group.
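To make this interpretation concrete, consider a model whose only predictor is the size variable, written with the dummy variables $X_1$ and $X_2$ from the example above. The coefficient symbols $\beta_0, \beta_1, \beta_2$ are notation introduced here for illustration:

$$
\begin{aligned}
y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \\
E[y \mid \text{Small}] &= \beta_0 \\
E[y \mid \text{Medium}] &= \beta_0 + \beta_1 \\
E[y \mid \text{Large}] &= \beta_0 + \beta_2
\end{aligned}
$$

So $\beta_1$ is the expected difference between the `Medium` and `Small` groups, and $\beta_2$ is the expected difference between the `Large` and `Small` groups.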

### Creating Dummy Variables

#### Automatic Creation of Dummy Variables

The regression fitting functions, `fitlm`, `fitglm`, and `fitnlm`, recognize categorical array inputs as categorical predictors. That is, if you input your categorical predictor as a `nominal` or `ordinal` array, the fitting function automatically creates the required dummy variables. The first level returned by `getlevels` is the reference group. To use a different reference group, use `reorderlevels` to change the level order.

If there are c unique levels in the categorical array, then the fitting function estimates c – 1 regression coefficients for the categorical predictor.
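A minimal sketch of automatic dummy creation with `fitlm`, using hypothetical response values `y` and assuming a release in which `fitlm` accepts a `table` input. With three levels, each fit should report three coefficients: the intercept plus c – 1 = 2 dummy coefficients. The second fit uses `reorderlevels` so that `Small` becomes the reference group.

```matlab
% Hypothetical data: a size predictor and a continuous response
sz = nominal({'Small';'Medium';'Large';'Small';'Large';'Medium';'Small';'Large'});
y  = [2.1; 3.9; 6.2; 1.8; 5.7; 4.1; 2.3; 6.0];

% Default nominal levels are alphabetical, so Large is the reference group
getlevels(sz)

% fitlm creates the dummy variables for the nominal predictor automatically
mdl = fitlm(table(sz, y), 'y ~ sz');
mdl.CoefficientNames

% Make Small the reference group by moving it to the front of the level order
sz2 = reorderlevels(sz, {'Small','Medium','Large'});
mdl2 = fitlm(table(sz2, y), 'y ~ sz2');
mdl2.CoefficientNames
```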

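**Note:** The fitting functions use every level of the categorical array returned by `getlevels`, even if there are levels with no observations. A level with no observations produces a dummy variable that is 0 in every row, so its coefficient cannot be estimated. To remove unused levels from the categorical array, use `droplevels`.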
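A short sketch of removing an empty level before fitting. The data are hypothetical, and `addlevels` is used only to create a level (`Medium`) that has no observations.

```matlab
% Hypothetical data with no Medium observations
sz = nominal({'Small';'Large';'Small';'Large'});
sz = addlevels(sz, {'Medium'});    % Medium is now a level, but unused
nBefore = numel(getlevels(sz));    % counts the empty Medium level

% Remove levels that have no observations
sz = droplevels(sz);
nAfter = numel(getlevels(sz));     % only Large and Small remain
```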

#### Manual Creation of Dummy Variables

If you prefer to create your own dummy variable design matrix, use `dummyvar`. This function accepts a numeric or categorical column vector, and returns a matrix of indicator variables. The dummy variable design matrix has a column for every group, and a row for every observation.

For example,

```matlab
gender = nominal({'Male';'Female';'Female';'Male';'Female'});
dv = dummyvar(gender)
```

```
dv =

     0     1
     1     0
     1     0
     0     1
     1     0
```
There are five rows corresponding to the number of rows in `gender`, and two columns for the unique groups, `Female` and `Male`. Column order corresponds to the order of the levels in `gender`. For nominal arrays, the default order is ascending alphabetical.

To use these dummy variables in a regression model, you must either delete a column (to create a reference group), or fit a regression model with no intercept term. For the gender example, only one dummy variable is needed to represent two genders. Notice what happens if you add an intercept term to the complete design matrix, `dv`.

```matlab
X = [ones(5,1) dv]
```

```
X =

     1     0     1
     1     1     0
     1     1     0
     1     0     1
     1     1     0
```

```matlab
rank(X)
```

```
ans =

     2
```
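The design matrix with an intercept term has three columns but rank 2, so it is not of full column rank: the intercept column equals the sum of the two dummy columns. As a result, `X'*X` is singular (not invertible), and the least-squares coefficients are not uniquely determined. Because of this linear dependence, use only c – 1 indicator variables to represent a categorical variable with c categories in a regression model with an intercept term.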
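For comparison, a short sketch of the two remedies described earlier, reusing the `gender` example: dropping one dummy column before adding the intercept, or keeping both columns and omitting the intercept. The column index `2` assumes the default alphabetical level order (`Female`, `Male`).

```matlab
gender = nominal({'Male';'Female';'Female';'Male';'Female'});
dv = dummyvar(gender);            % columns: Female, Male (alphabetical level order)

% Remedy 1: drop the Female column so Female becomes the reference group
X = [ones(5,1) dv(:,2)];
rank(X)                           % equals size(X,2), so X has full column rank

% Remedy 2: keep both dummy columns and fit without an intercept
rank(dv)                          % equals size(dv,2)
```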