categorical

Array that contains values assigned to categories

Description

`categorical` is a data type that assigns values to a finite set of discrete categories, such as `High`, `Med`, and `Low`. These categories can have a mathematical ordering that you specify, such as `High > Med > Low`, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to specify groups of rows in a table.

Creation

Syntax

``B = categorical(A)``
``B = categorical(A,valueset)``
``B = categorical(A,valueset,catnames)``
``B = categorical(A,___,Name,Value)``

`B = categorical(A)` creates a categorical array from the array `A`. The categories of `B` are the sorted unique values from `A`.

`B = categorical(A,valueset)` creates one category for each value in `valueset`. The categories of `B` are in the same order as the values of `valueset`. You can use `valueset` to include categories for values not present in `A`. Conversely, if `A` contains any values not present in `valueset`, then the corresponding elements of `B` are undefined.

`B = categorical(A,valueset,catnames)` names the categories in `B` by matching the category values in `valueset` with the names in `catnames`.

`B = categorical(A,___,Name,Value)` creates a categorical array with additional options specified by one or more `Name,Value` pair arguments. You can include any of the input arguments in previous syntaxes. For example, to indicate that the categories have a mathematical ordering, specify `'Ordinal',true`.

Input Arguments

`A`: Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

`categorical` removes leading and trailing spaces from input values that are strings or character vectors.

If `A` contains missing values, then the corresponding elements of `B` are undefined and display as `<undefined>`. The `categorical` function converts the following values to undefined categorical values:

• `NaN` in numeric and duration arrays

• The missing string (`<missing>`) or the empty string (`""`) in string arrays

• The empty character vector (`''`) in cell arrays of character vectors

• `NaT` in datetime arrays

• Undefined values (`<undefined>`) in categorical arrays

`B` does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in `catnames`, and a missing value as the corresponding value in `valueset`.
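As a minimal sketch of the missing-value behavior described above, the following converts a numeric array that contains `NaN` and locates the resulting undefined element with `isundefined`. The variable names and sample values are illustrative assumptions, not taken from the original page.

```matlab
% Illustrative data: one NaN among the numeric values
A = [1 NaN 2 1];
B = categorical(A);

% Logical mask marking the undefined elements of B
tf = isundefined(B);

% The missing value does not get a category of its own
cats = categories(B);
```

Here `tf` is true only at the second position, and `cats` lists only the categories `1` and `2`.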

`A` also can be an array of objects with the following class methods:

• `unique`

• `eq`

`valueset`: Categories, specified as a vector of unique values. The data type of `valueset` and the data type of `A` must be the same, except when `A` is a string array. In that case, `valueset` can be either a string array or a cell array of character vectors.

`categorical` removes leading and trailing spaces from elements of `valueset` that are strings or character vectors.
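The following sketch shows what happens when `A` contains a value that is absent from `valueset`. The data and variable names are hypothetical and chosen only for illustration.

```matlab
% 'huge' is not listed in valueset, so it has no category
A = {'small' 'medium' 'large' 'huge'};
valueset = {'small' 'medium' 'large'};
B = categorical(A,valueset);

% The element that came from 'huge' is undefined
tf = isundefined(B);

% Categories follow the order of valueset, not sorted order
cats = categories(B);
```

Only the last element of `B` is undefined, and `cats` keeps the order `small`, `medium`, `large` given in `valueset`.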

`catnames`: Category names, specified as a cell array of character vectors. If you do not specify the `catnames` input argument, then `categorical` uses the values in `valueset` as category names.

To merge multiple distinct values in `A` into a single category in `B`, include duplicate names corresponding to those values.
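A small sketch of merging values through duplicate names, assuming illustrative data: the values `1` and `2` both map to a category named `low`, and `3` maps to `high`.

```matlab
A = [1 2 3 2 1];
valueset = [1 2 3];
catnames = {'low' 'low' 'high'};   % duplicate name merges the values 1 and 2

B = categorical(A,valueset,catnames);

% Only two categories remain after the merge
cats = categories(B);

% Count the elements that fall in the merged category
nLow = nnz(B == 'low');
```

With this input, `B` has the two categories `low` and `high`, and four of its five elements belong to `low`.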

Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Ordinal',true` specifies that the categories have a mathematical ordering

Sort order indicator, specified as the comma-separated pair consisting of `'Ordinal'` and either `false` (`0`) or `true` (`1`).

• `false` (`0`): `categorical` creates a categorical array that is not ordinal, which is the default behavior. The categories of `B` have no mathematical ordering. Therefore, you can compare the values in `B` only for equality.

• `true` (`1`): `categorical` creates an ordinal categorical array. The categories of `B` have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in `B` using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the `min` and `max` functions on an ordinal categorical array.
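To illustrate the ordinal case, the sketch below builds an ordinal array from hypothetical rating labels and applies a relational operator together with `min` and `max`. The data and variable names are assumptions made for the example.

```matlab
A = {'Low' 'High' 'Med' 'Low'};
valueset = {'Low' 'Med' 'High'};   % order defines Low < Med < High

B = categorical(A,valueset,'Ordinal',true);

% Relational comparison against a category name
belowHigh = B < 'High';

% Smallest and largest values according to the category order
smallest = min(B);
largest = max(B);
```

Because the category order comes from `valueset`, `belowHigh` is false only for the element `High`, `smallest` is `Low`, and `largest` is `High`.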

Category protection indicator, specified as the comma-separated pair consisting of `'Protected'` and either `false` (`0`) or `true` (`1`). The categories of ordinal categorical arrays are always protected. The default value is `true` when you specify `'Ordinal',true`. Otherwise, the value is `false`.

• `false` (`0`): When you assign new values to `B`, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories update to include the categories from both arrays.

• `true` (`1`): When you assign new values to `B`, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to `B`, you must use the function `addcats`.
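The difference between unprotected and protected categories can be sketched as follows. The arrays `U` and `P` and the flag `failed` are hypothetical names used only for this illustration.

```matlab
% Unprotected (the default for nonordinal arrays):
% assigning a new value adds a category automatically
U = categorical({'a' 'b'});
U(3) = 'c';
catsU = categories(U);

% Protected: assigning a value outside the existing categories errors
P = categorical({'a' 'b'},'Protected',true);
failed = false;
try
    P(3) = 'c';
catch
    failed = true;   % the assignment was rejected and P is unchanged
end

% Add the category explicitly, after which the assignment succeeds
P = addcats(P,'c');
P(3) = 'c';
catsP = categories(P);
```

After this code runs, both arrays have the categories `a`, `b`, and `c`, but only `P` required an explicit call to `addcats`.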

Examples

Create a categorical array that has weather station labels. Add it to a table of temperature readings. Then use the categories to select temperature readings by station.

First, create arrays containing temperature readings, dates, and station labels.

```
Temps = [58; 72; 56; 90; 76];
Dates = {'2017-04-17';'2017-04-18';'2017-04-30';'2017-05-01';'2017-04-27'};
Stations = {'S1';'S2';'S1';'S3';'S2'};
```

Convert `Stations` to a categorical array.

`Stations = categorical(Stations)`
```
Stations = 5x1 categorical array
     S1
     S2
     S1
     S3
     S2
```

Display the categories. The three station labels are the categories.

`categories(Stations)`
```
ans = 3x1 cell array
    {'S1'}
    {'S2'}
    {'S3'}
```

Create a table that contains the temperatures, dates, and station labels.

`T = table(Temps,Dates,Stations)`
```
T=5x3 table
    Temps       Dates        Stations
    _____    ____________    ________
     58      '2017-04-17'       S1
     72      '2017-04-18'       S2
     56      '2017-04-30'       S1
     90      '2017-05-01'       S3
     76      '2017-04-27'       S2
```

Display the readings taken from station `S2`. You can use the `==` operator to find the values of `Stations` that equal `S2`. Then use logical indexing to select the table rows that have data from station `S2`.

```
TF = (T.Stations == 'S2');
T(TF,:)
```
```
ans=2x3 table
    Temps       Dates        Stations
    _____    ____________    ________
     72      '2017-04-18'       S2
     76      '2017-04-27'       S2
```
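Beyond selecting rows, a categorical variable can also define groups for a calculation. The sketch below computes the mean temperature per station with `findgroups` and `splitapply`. These two functions are not mentioned in the original text and are used here as one possible approach. The data is repeated so that the block runs on its own.

```matlab
Temps = [58; 72; 56; 90; 76];
Stations = categorical({'S1';'S2';'S1';'S3';'S2'});

% Group numbers follow the category order: S1, S2, S3
[G,stationNames] = findgroups(Stations);

% Apply mean to the temperatures within each group
meanTemps = splitapply(@mean,Temps,G);
```

For this data, `meanTemps` holds one value per station: 57 for `S1`, 74 for `S2`, and 90 for `S3`.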

Convert the cell array of character vectors `A` to a categorical array. Specify a list of categories that includes values that are not present in `A`.

Create a cell array of character vectors.

`A = {'republican' 'democrat'; 'democrat' 'democrat'; 'democrat' 'republican'};`

Convert `A` to a categorical array. Add a category for `independent`.

```
valueset = {'democrat' 'republican' 'independent'};
B = categorical(A,valueset)
```
```
B = 3x2 categorical array
     republican      democrat
     democrat        democrat
     democrat        republican
```

Display the categories of `B`.

`categories(B)`
```
ans = 3x1 cell array
    {'democrat'   }
    {'republican' }
    {'independent'}
```

Create a numeric array.

`A = [1 3 2; 2 1 3; 3 1 2]`
```
A =
     1     3     2
     2     1     3
     3     1     2
```

Convert `A` to categorical array `B` and specify category names.

`B = categorical(A,[1 2 3],{'red' 'green' 'blue'})`
```
B = 3x3 categorical array
     red        blue      green
     green      red       blue
     blue       red       green
```

Display the categories of `B`.

`categories(B)`
```
ans = 3x1 cell array
    {'red'  }
    {'green'}
    {'blue' }
```

`B` is not an ordinal categorical array. Therefore, you can compare the values in `B` only using the equality operators, `==` and `~=`.

Find the elements that belong to the category `'red'`. Access those elements using logical indexing.

```
TF = (B == 'red');
B(TF)
```
```
ans = 3x1 categorical array
     red
     red
     red
```

Create a 5-by-2 numeric array.

`A = [3 2;3 3;3 2;2 1;3 2]`
```
A =
     3     2
     3     3
     3     2
     2     1
     3     2
```

Convert `A` to an ordinal categorical array where `1`, `2`, and `3` represent the categories `child`, `adult`, and `senior`, respectively.

```
valueset = [1:3];
catnames = {'child' 'adult' 'senior'};
B = categorical(A,valueset,catnames,'Ordinal',true)
```
```
B = 5x2 categorical array
     senior      adult
     senior      senior
     senior      adult
     adult       child
     senior      adult
```

Since `B` is ordinal, the categories of `B` have a mathematical ordering, `child < adult < senior`.

Starting in R2017a, you can create string arrays using double quotes. Also, a string array can have missing values, displayed as `<missing>`, without quotation marks.

`str = ["plane","jet","plane","helicopter",missing,"jet"]`
```
str = 1x6 string array
    "plane"    "jet"    "plane"    "helicopter"    <missing>    "jet"
```

Convert string array `str` to a categorical array. The `categorical` function converts missing strings to undefined categorical values, displayed as `<undefined>`.

`C = categorical(str)`
```
C = 1x6 categorical array
     plane      jet      plane      helicopter      <undefined>      jet
```

Use the `discretize` function (instead of `categorical`) to bin 100 random numbers into three categories.

```
x = rand(100,1);
y = discretize(x,[0 .25 .75 1],'categorical',{'small','medium','large'});
summary(y)
```
```
     small       22
     medium      46
     large       32
```

Alternatives

You also can group numeric data into categories using `discretize`.