Grouping Variables

What Are Grouping Variables?

Grouping variables are utility variables used to group, or categorize, observations. Grouping variables are useful for summarizing or visualizing data by group. A grouping variable can be any of these data types:

  • Numeric vector

  • Logical vector

  • String array (also called a character array)

  • Cell array of strings

  • Categorical vector

A grouping variable must have the same number of observations (rows) as the table, dataset array, or numeric array you are grouping. Observations that have the same grouping variable value belong to the same group.

For example, the following grouping variables all define the same groups. Each one divides five observations into two groups. The first group contains the first and fourth observations. The other three observations are in the second group.

Data Type                 Grouping Variable
Numeric vector            [1 2 2 1 2]
Logical vector            [0 1 1 0 1]
Cell array of strings     {'Male','Female','Female','Male','Female'}
Categorical vector        Male Female Female Male Female

Grouping variables with string labels give each group a meaningful name. A categorical array is an efficient and flexible choice of grouping variable.
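
As a minimal sketch of the equivalence shown in the table, the following code builds the four grouping variables and converts each one to group indices with grp2idx. The variable names are illustrative, and the category order passed to categorical is an assumption chosen so that Male is the first level.

```matlab
% Four grouping variables that describe the same two groups
% (column vectors, one element per observation).
gNum  = [1 2 2 1 2]';
gLog  = logical([0 1 1 0 1])';
gCell = {'Male','Female','Female','Male','Female'}';
gCat  = categorical(gCell, {'Male','Female'});  % assumed level order

% Convert each grouping variable to a vector of group indices.
idxNum  = grp2idx(gNum);
idxLog  = grp2idx(gLog);
idxCell = grp2idx(gCell);
idxCat  = grp2idx(gCat);

% All four index vectors agree: observations 1 and 4 are in group 1.
sameGroups = isequal(idxNum, idxLog, idxCell, idxCat);
disp(sameGroups)
```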

Group Definition

Typically, there are as many groups as unique values in the grouping variable. However, categorical arrays can have levels that are not represented in the data. The groups and the order of the groups depend on the data type of the grouping variable. Suppose G is a grouping variable.

  • If G is a numeric or logical vector, then the groups correspond to the distinct values in G, in the sorted order of the unique values.

  • If G is a string array or cell array of strings, then the groups correspond to the distinct strings in G, in the order of their first appearance.

  • If G is a categorical vector, then the groups correspond to the unique category levels in G, in the order returned by getlevels.
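
The ordering rules above can be checked with the second output of grpidx's sibling function grp2idx, which returns the group names in group order. This is a sketch with made-up data; the categorical example assumes an explicit level order of c, b, a.

```matlab
% Numeric grouping variable: groups follow the sorted unique values.
[idxNum, namesNum] = grp2idx([3 1 3 2]');
% idxNum is [3;1;3;2], namesNum is {'1';'2';'3'}

% Cell array of strings: groups follow the order of first appearance.
[idxCell, namesCell] = grp2idx({'b','a','b','c'}');
% idxCell is [1;2;1;3], namesCell is {'b';'a';'c'}

% Categorical vector: groups follow the order of the category levels,
% here set explicitly (an assumption for this example) to c, b, a.
gCat = categorical({'b','a','b','c'}', {'c','b','a'});
[idxCat, namesCat] = grp2idx(gCat);
% idxCat is [2;3;2;1], namesCat is {'c';'b';'a'}
```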

Some functions, such as grpstats, accept multiple grouping variables specified as a cell array of grouping variables, for example, {G1,G2,G3}. In this case, the groups are defined by the unique combinations of values in the grouping variables. The order is decided first by the order of the first grouping variable, then by the order of the second grouping variable, and so on.
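
For instance, a call with two grouping variables might look like the following sketch. The data and the names x, g1, and g2 are hypothetical; the 'gname' statistic asks grpstats to return the group names alongside the means.

```matlab
% Hypothetical data: six observations and two grouping variables.
x  = [10 20 30 40 50 60]';
g1 = {'a','a','b','b','a','b'}';
g2 = [1 2 1 2 1 2]';

% Groups are the unique combinations of values in g1 and g2.
[means, names] = grpstats(x, {g1, g2}, {'mean','gname'});

% names has one row per group and one column per grouping variable.
disp(names)
disp(means)
```

Here the four combinations (a,1), (a,2), (b,1), and (b,2) each form a group, so means has four elements.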

Analysis Using Grouping Variables

This table lists common tasks you might want to perform using grouping variables.

Grouping Task                                                  Function Accepting Grouping Variable
Draw side-by-side boxplots for data in different groups.       boxplot
Draw a scatter plot with markers colored by group.             gscatter
Draw a scatter plot matrix with markers colored by group.      gplotmatrix
Compute summary statistics by group.                           grpstats
Test for differences between group means.                      anovan
Create an index vector from a grouping variable.               grp2idx
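
A few of these functions can be tried on simulated data, as in the sketch below. The data, the seed, and the names group, x, and y are all hypothetical and serve only to show how the same grouping variable is passed to each function.

```matlab
rng(0);                                  % fixed seed so the run is repeatable
group = repmat({'A';'B';'C'}, 10, 1);    % 30 labels, 10 per group
x = randn(30, 1);
y = x + randn(30, 1);

figure; boxplot(y, group);               % side-by-side boxplots by group
figure; gscatter(x, y, group);           % scatter plot colored by group

means = grpstats(y, group);              % mean of y within each group
p = anovan(y, {group}, 'display', 'off');% test for differences in group means
```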

Missing Group Values

Grouping variables can contain missing values, provided that each missing value is marked with the indicator that is valid for the data type of the grouping variable.

Grouping Variable Data Type     Missing Value Indicator
Numeric vector                  NaN
Logical vector                  (Cannot be missing)
String array                    Row of spaces
Cell array of strings           ''
Categorical vector              <undefined>
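
As a short illustration with made-up data, grp2idx returns NaN as the group index for observations whose grouping value is missing:

```matlab
% Numeric grouping variable with a missing value marked by NaN.
idxNum = grp2idx([1 NaN 2 1]');
% idxNum is [1; NaN; 2; 1]

% Cell array of strings with a missing value marked by ''.
idxCell = grp2idx({'Male','','Female','Male'}');
% idxCell is [1; NaN; 2; 1]
```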
