When working with categorical variables and their levels, you'll
encounter some typical challenges. This table summarizes the functions
you can use with categorical arrays to manipulate category levels.
For additional functions, type
methods nominal or
ordinal at the command line, or see the
|Add new category levels|
|Drop category levels|
|Combine category levels|
|Reorder category levels|
|Count the number of observations in each category|
|Change the label or name of category levels|
|Create an interaction factor|
|Find observations that are not in a defined category|
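
As a rough illustration, the tasks in the table might map onto nominal array methods as in the sketch below. The method names (addlevels, droplevels, mergelevels, reorderlevels, levelcounts, setlabels, times, isundefined) are the ones listed for nominal and ordinal arrays in Statistics and Machine Learning Toolbox. The variable names and the small color and size data are made up for this example and are not part of the original text.

```matlab
% Hypothetical sample data, invented for illustration only.
colors = nominal({'red';'green';'blue';'green';'red'});
sizes  = nominal({'S';'L';'S';'L';'S'});

% Count the number of observations in each category level.
counts = levelcounts(colors);

% Create an interaction factor from two nominal arrays.
colorBySize = colors .* sizes;

% Add a new (initially empty) level, then drop levels with no observations.
colors = addlevels(colors, {'yellow'});
colors = droplevels(colors);

% Combine two levels into a single level.
colors = mergelevels(colors, {'green','blue'}, 'cool');

% Reorder the levels (must be a reordering of the current labels).
colors = reorderlevels(colors, {'red','cool'});

% Change the label of one level.
colors = setlabels(colors, {'warm'}, {'red'});

% Dropping a level that is in use leaves its observations undefined;
% isundefined finds them.
colors = droplevels(colors, {'warm'});
notInLevel = isundefined(colors);
```

Here counts holds one count per level, and notInLevel is a logical vector marking the observations that no longer belong to a defined level.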
You can use categorical arrays in a variety of statistical analyses. For example, you might want to compute descriptive statistics for data grouped by the category levels, conduct statistical tests on differences between category means, or perform regression analysis using categorical predictors.
Statistics and Machine Learning Toolbox™ functions that accept a grouping variable as an input argument accept categorical arrays. This includes descriptive functions such as:
You can also use categorical arrays as input arguments to analysis functions and methods based on models, such as:
When you use a categorical array as a predictor in these functions,
the fitting function automatically recognizes the categorical predictor
and constructs appropriate dummy indicator variables for analysis.
Alternatively, you can construct your own dummy indicator variables.
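
The following sketch shows one way these uses could look in practice. It assumes the fisheriris sample data set that ships with the toolbox, and it picks grpstats, fitlm, and dummyvar as representative functions. These choices, the table variable names, and the model formula are assumptions made for illustration, not taken from the original text.

```matlab
% Assumed sample data: meas (150x4 numeric) and species (150x1 cell).
load fisheriris
species = nominal(species);

% Descriptive statistics grouped by category level:
% mean sepal length for each species.
groupMeans = grpstats(meas(:,1), species);

% Model fitting with a categorical predictor. The variable names
% SepalLength and Species are hypothetical labels chosen here.
tbl = table(meas(:,1), species, 'VariableNames', {'SepalLength','Species'});
mdl = fitlm(tbl, 'SepalLength ~ Species');

% Build dummy indicator variables by hand: one column per level.
D = dummyvar(species);
```

With three species levels, D has one indicator column per level, while the fitted model keeps one level as the reference and estimates coefficients for the others.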
The levels of categorical variables are often defined as text
strings, which can be costly to store and manipulate in a cell array
of strings or
char array. Categorical arrays separately
store category membership and category labels, greatly reducing the
amount of memory required to store the variable.
For example, load some sample data:
species is a cell array of strings requiring 19,300 bytes of memory.
Convert species to a nominal array:
species = nominal(species);
There is a 95% reduction in memory required to store the variable.
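
The comparison can be reproduced with a short script along the following lines. This is a sketch that assumes the sample data is the fisheriris data set shipped with the toolbox. The variable names before, after, and reduction are hypothetical, and the exact byte counts can differ between MATLAB releases.

```matlab
% Assumed sample data: species is a 150x1 cell array of strings.
load fisheriris
before = whos('species');      % struct describing the variable, including bytes

% Convert to a nominal array and measure again.
species = nominal(species);
after = whos('species');

% Percentage reduction in memory.
reduction = 100 * (1 - after.bytes / before.bytes);
fprintf('cell: %d bytes, nominal: %d bytes (%.0f%% reduction)\n', ...
    before.bytes, after.bytes, reduction);
```

Calling whos with an output argument returns a structure instead of printing a listing, which makes the byte counts available for arithmetic.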