## Natural Representation of Categorical Data

`categorical` is a data type to store data
with values from a finite set of discrete categories. One common alternative
to using categorical arrays is to use character arrays or cell arrays
of strings. To compare strings in character arrays and cell arrays
of strings, you must use `strcmp`, which can be
cumbersome. With categorical arrays, you can use the relational operator `eq` (`==`)
to compare strings in the same way that you compare numeric arrays.
The other common alternative to using categorical arrays is to store
categorical data using integers in numeric arrays. Using numeric arrays
loses the descriptive information carried by the category names,
and it also suggests that the integer values have their usual
numeric meaning (for example, that category 2 lies between categories 1 and 3),
which, for categorical data, they do not.
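A minimal sketch of the comparison described above, assuming a small hypothetical cell array `colors` that is not part of the original example. With the cell array of strings, `strcmp` builds the logical mask; after conversion to `categorical`, the `==` operator produces the same mask directly.

```matlab
% Hypothetical sample data: a small cell array of strings.
colors = {'red'; 'blue'; 'red'; 'green'};

% With a cell array of strings, strcmp is needed to test each element.
% (colors == 'red' would raise an error, because == is not defined for cell arrays.)
isRedCell = strcmp(colors, 'red');

% After conversion to categorical, == works as it does for numeric arrays.
colorsCat = categorical(colors);
isRedCat = (colorsCat == 'red');

% Both approaches produce the same logical mask.
disp(isequal(isRedCell, isRedCat))
```

Converting once and then using `==` keeps comparison code identical in form to numeric comparisons, which is the main convenience the paragraph above describes.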

Categorical arrays are convenient and memory-efficient containers
for nonnumeric data with values from a finite set of discrete categories.
They are especially useful when the categories have a meaningful mathematical
ordering, such as an array with entries from the discrete set of categories `{'small','medium','large'}`
where `small < medium < large`.

Character arrays and cell arrays of strings support no ordering other than alphabetical order. Thus, inequality comparisons, such as greater than and less than, are not possible with them. With categorical arrays, you can use relational operations to test for equality and, when the array is ordinal, perform element-wise comparisons of strings that have a meaningful mathematical ordering.
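A minimal sketch of an ordered comparison, assuming hypothetical sample data `sizes` drawn from the `{'small','medium','large'}` set used above. Passing the category list as the value set together with `'Ordinal', true` fixes the order `small < medium < large` instead of alphabetical order, after which `>` and `<=` compare elements against that order.

```matlab
% Hypothetical sample data drawn from {'small','medium','large'}.
sizes = {'medium'; 'large'; 'small'; 'large'};

% Specify the category order explicitly and mark the array as ordinal,
% so that small < medium < large rather than alphabetical order.
valueset = {'small', 'medium', 'large'};
sizesCat = categorical(sizes, valueset, 'Ordinal', true);

% Relational operators now follow the specified order.
isBig = sizesCat > 'medium';
isSmallish = sizesCat <= 'medium';

% The categories are listed in the specified order, not alphabetically.
disp(categories(sizesCat))
```

Without `'Ordinal', true`, the array would still support `==`, but inequality operators such as `>` would not be available.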

This example shows how to compare the memory required to store data as a cell array of strings versus a categorical array. The categories of a categorical array are defined as strings. Storing and manipulating such strings in a cell array of strings or a `char` array can be costly, because each element holds its own copy of the string. Categorical arrays store only one copy of each category name, often reducing the amount of memory required to store the array.

Create a sample cell array of strings.

```
% 150 state abbreviations: 50 each of 'MA', 'NY', and 'CA'
state = [repmat({'MA'},25,1);repmat({'NY'},25,1);...
         repmat({'CA'},50,1);...
         repmat({'MA'},25,1);repmat({'NY'},25,1)];
```

Display information about the variable `state`.

```
whos state
```

```
  Name         Size             Bytes  Class    Attributes

  state      150x1              17400  cell
```

The variable `state` is a cell array of strings requiring 17,400 bytes of memory.

Convert `state` to a categorical array.

```
state = categorical(state);
```

Display the discrete categories in the variable `state`.

```
categories(state)
```

```
ans = 

    'CA'
    'MA'
    'NY'
```

`state` contains 150 elements, but only three distinct categories.

Display information about the variable `state`.

```
whos state
```

```
  Name         Size            Bytes  Class          Attributes

  state      150x1               754  categorical
```

The memory required to store the variable drops from 17,400 bytes to 754 bytes, a significant reduction.

- Create Categorical Arrays
- Convert Table Variables Containing Strings to Categorical
- Compare Categorical Array Elements
- Access Data Using Categorical Arrays
