categorical

Array that contains values assigned to categories

Description

`categorical` is a data type that assigns values to a finite set of discrete categories, such as `High`, `Med`, and `Low`. These categories can have a mathematical ordering that you specify, such as `High > Med > Low`, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to specify groups of rows in a table.

Creation

Syntax

``B = categorical(A)``
``B = categorical(A,valueset)``
``B = categorical(A,valueset,catnames)``
``B = categorical(A,___,Name,Value)``

Description

`B = categorical(A)` creates a categorical array from the array `A`. The categories of `B` are the sorted unique values from `A`.

`B = categorical(A,valueset)` creates one category for each value in `valueset`. The categories of `B` are in the same order as the values of `valueset`. You can use `valueset` to include categories for values not present in `A`. Conversely, if `A` contains any values not present in `valueset`, then the corresponding elements of `B` are undefined.

`B = categorical(A,valueset,catnames)` names the categories in `B` by matching the category values in `valueset` with the names in `catnames`.

`B = categorical(A,___,Name,Value)` creates a categorical array with additional options specified by one or more `Name,Value` pair arguments. You can include any of the input arguments in previous syntaxes. For example, to indicate that the categories have a mathematical ordering, specify `'Ordinal',true`.

Input Arguments

`A`: Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

`categorical` removes leading and trailing spaces from input values that are strings or character vectors.
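As a minimal sketch of the trimming behavior described above (the labels are illustrative, not taken from any real data set), values that differ only in leading or trailing spaces should end up in the same category:

```matlab
% Illustrative labels: the first two differ only by surrounding spaces.
A = {' S1' 'S1 ' 'S2'};

% categorical trims the spaces before building the categories.
B = categorical(A);

% List the resulting category names.
cats = categories(B);
```

Here `cats` is expected to contain only `S1` and `S2`, and `B(1)` and `B(2)` compare as equal.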

If `A` contains missing values, then the corresponding elements of `B` are undefined and display as `<undefined>`. The `categorical` function converts the following values to undefined categorical values:

• `NaN` in numeric and duration arrays

• The missing string (`<missing>`) or the empty string (`""`) in string arrays

• The empty character vector (`''`) in cell arrays of character vectors

• `NaT` in datetime arrays

• Undefined values (`<undefined>`) in categorical arrays

`B` does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in `catnames`, and a missing value as the corresponding value in `valueset`.
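The following sketch contrasts the default handling of a missing value with an explicit category for it. The input values and the category name `unknown` are assumptions chosen for illustration; the pairing of `NaN` in `valueset` with a name in `catnames` follows the description above.

```matlab
A = [1 NaN 2 1];

% Default behavior: NaN gets no category, so that element is undefined.
B1 = categorical(A);
undefinedMask = isundefined(B1);

% Explicit category: pair NaN in valueset with a name in catnames.
valueset = [1 2 NaN];
catnames = {'one' 'two' 'unknown'};   % 'unknown' is a hypothetical name
B2 = categorical(A,valueset,catnames);
stillUndefined = isundefined(B2);
```

With the explicit category in place, the `NaN` element of `A` should map to `unknown` instead of displaying as `<undefined>`.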

`A` also can be an array of objects with the following class methods:

• `unique`

• `eq`

`valueset`: Categories, specified as a vector of unique values. The data type of `valueset` and the data type of `A` must be the same, except when `A` is a string array. In that case, `valueset` can be either a string array or a cell array of character vectors.

`categorical` removes leading and trailing spaces from elements of `valueset` that are strings or character vectors.

`catnames`: Category names, specified as a cell array of character vectors. If you do not specify the `catnames` input argument, then `categorical` uses the values in `valueset` as category names.

To merge multiple distinct values in `A` into a single category in `B`, include duplicate names corresponding to those values.
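As a small sketch of merging (the values and the names `low` and `high` are illustrative assumptions), repeating a name in `catnames` folds the corresponding values into one category:

```matlab
A = [1 2 3 4 2];
valueset = [1 2 3 4];

% Values 1 and 2 share the name 'low'; values 3 and 4 share 'high'.
catnames = {'low' 'low' 'high' 'high'};
B = categorical(A,valueset,catnames);

% Only the distinct names remain as categories.
cats = categories(B);
isLow = (B == 'low');
```

In this sketch `B` should have two categories, and every element of `A` equal to `1` or `2` belongs to `low`.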

Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Ordinal',true` specifies that the categories have a mathematical ordering

Sort order indicator, specified as the comma-separated pair consisting of `'Ordinal'` and either `false` (`0`) or `true` (`1`).

• `false` (`0`): `categorical` creates a categorical array that is not ordinal, which is the default behavior. The categories of `B` have no mathematical ordering. Therefore, you can compare the values in `B` only for equality.

• `true` (`1`): `categorical` creates an ordinal categorical array. The categories of `B` have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in `B` using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the `min` and `max` functions on an ordinal categorical array.
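A short sketch of what the ordinal setting enables, using made-up size labels `S`, `M`, and `L` as the categories (smallest first):

```matlab
% The order of valueset defines the ordering: S < M < L.
sizes = categorical({'M' 'S' 'L' 'S'},{'S' 'M' 'L'},'Ordinal',true);

% Relational operators work because the array is ordinal.
isLarger = (sizes > 'S');

% min and max respect the category order, not alphabetical order.
smallest = min(sizes);
largest = max(sizes);
```

Without `'Ordinal',true`, the `>` comparison and the calls to `min` and `max` would not be allowed; only `==` and `~=` would apply.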

Category protection indicator, specified as the comma-separated pair consisting of `'Protected'` and either `false` (`0`) or `true` (`1`). The categories of ordinal categorical arrays are always protected. The default value is `true` when you specify `'Ordinal',true`. Otherwise, the value is `false`.

• `false` (`0`): When you assign new values to `B`, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories update to include the categories from both arrays.

• `true` (`1`): When you assign new values to `B`, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to `B`, you must use the function `addcats`.
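The difference between the two settings can be sketched as follows. The category names `a`, `b`, and `c` and the flag `assignmentFailed` are illustrative; the sketch assumes only that assigning an unknown value to a protected array raises an error, without relying on a specific error message.

```matlab
% Unprotected (default): assigning a new value adds a category.
C = categorical({'a' 'b'});
C(3) = 'c';
catsC = categories(C);

% Protected: the same assignment is rejected.
P = categorical({'a' 'b'},{'a' 'b'},'Protected',true);
assignmentFailed = false;
try
    P(3) = 'c';
catch
    assignmentFailed = true;
end

% Add the category explicitly, then the assignment is allowed.
P = addcats(P,'c');
P(3) = 'c';
catsP = categories(P);
```

Protection is useful when the set of categories is fixed in advance and a stray value should be caught rather than silently becoming a new category.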

Examples

Create a categorical array that has weather station labels. Add it to a table of temperature readings. Then use the categories to select temperature readings by station.

First, create arrays containing temperature readings, dates, and station labels.

```
Temps = [58; 72; 56; 90; 76];
Dates = {'2017-04-17';'2017-04-18';'2017-04-30';'2017-05-01';'2017-04-27'};
Stations = {'S1';'S2';'S1';'S3';'S2'};
```

Convert `Stations` to a categorical array.

`Stations = categorical(Stations)`
```
Stations = 5x1 categorical array
     S1
     S2
     S1
     S3
     S2
```

Display the categories. The three station labels are the categories.

`categories(Stations)`
```
ans = 3x1 cell array
    {'S1'}
    {'S2'}
    {'S3'}
```

Create a table that contains the temperatures, dates, and station labels.

`T = table(Temps,Dates,Stations)`
```
T=5x3 table
    Temps       Dates        Stations
    _____    ____________    ________
     58      '2017-04-17'       S1
     72      '2017-04-18'       S2
     56      '2017-04-30'       S1
     90      '2017-05-01'       S3
     76      '2017-04-27'       S2
```

Display the readings taken from station `S2`. You can use the `==` operator to find the values of `Stations` that equal `S2`. Then use logical indexing to select the table rows that have data from station `S2`.

```
TF = (T.Stations == 'S2');
T(TF,:)
```
```
ans=2x3 table
    Temps       Dates        Stations
    _____    ____________    ________
     72      '2017-04-18'       S2
     76      '2017-04-27'       S2
```

Convert the cell array of character vectors `A` to a categorical array. Specify a list of categories that includes values that are not present in `A`.

Create a cell array of character vectors.

`A = {'republican' 'democrat'; 'democrat' 'democrat'; 'democrat' 'republican'};`

Convert `A` to a categorical array. Add a category for `independent`.

```
valueset = {'democrat' 'republican' 'independent'};
B = categorical(A,valueset)
```
```
B = 3x2 categorical array
     republican      democrat
     democrat        democrat
     democrat        republican
```

Display the categories of `B`.

`categories(B)`
```
ans = 3x1 cell array
    {'democrat'   }
    {'republican' }
    {'independent'}
```
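The converse case, where the input contains a value that is missing from `valueset`, can be sketched like this. The value `green` is a made-up label added for illustration and is not part of the example above.

```matlab
% 'green' does not appear in valueset.
A = {'democrat' 'green' 'republican'};
valueset = {'democrat' 'republican'};
B = categorical(A,valueset);

% Elements without a matching category are undefined.
TF = isundefined(B);
```

The second element of `B` displays as `<undefined>`, and `isundefined` flags it so that such values can be found or filtered out.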

Create a numeric array.

`A = [1 3 2; 2 1 3; 3 1 2]`
```
A =
     1     3     2
     2     1     3
     3     1     2
```

Convert `A` to categorical array `B` and specify category names.

`B = categorical(A,[1 2 3],{'red' 'green' 'blue'})`
```
B = 3x3 categorical array
     red        blue      green
     green      red       blue
     blue       red       green
```

Display the categories of `B`.

`categories(B)`
```
ans = 3x1 cell array
    {'red'  }
    {'green'}
    {'blue' }
```

`B` is not an ordinal categorical array. Therefore, you can compare the values in `B` only using the equality operators, `==` and `~=`.

Find the elements that belong to the category `'red'`. Access those elements using logical indexing.

```
TF = (B == 'red');
B(TF)
```
```
ans = 3x1 categorical array
     red
     red
     red
```

Create a 5-by-2 numeric array.

`A = [3 2;3 3;3 2;2 1;3 2]`
```
A =
     3     2
     3     3
     3     2
     2     1
     3     2
```

Convert `A` to an ordinal categorical array where `1`, `2`, and `3` represent categories `child`, `adult`, and `senior` respectively.

```
valueset = [1:3];
catnames = {'child' 'adult' 'senior'};
B = categorical(A,valueset,catnames,'Ordinal',true)
```
```
B = 5x2 categorical array
     senior      adult
     senior      senior
     senior      adult
     adult       child
     senior      adult
```

Since `B` is ordinal, the categories of `B` have a mathematical ordering, `child < adult < senior`.

Starting in R2017a, you can create string arrays using double quotes. Also, a string array can have missing values, displayed as `<missing>`, without quotation marks.

`str = ["plane","jet","plane","helicopter",missing,"jet"]`
```
str = 1x6 string array
    "plane"    "jet"    "plane"    "helicopter"    <missing>    "jet"
```

Convert string array `str` to a categorical array. The `categorical` function converts missing strings to undefined categorical values, displayed as `<undefined>`.

`C = categorical(str)`
```
C = 1x6 categorical array
     plane      jet      plane      helicopter      <undefined>      jet
```

Use the `discretize` function (instead of `categorical`) to bin 100 random numbers into three categories.

```
x = rand(100,1);
y = discretize(x,[0 .25 .75 1],'categorical',{'small','medium','large'});
summary(y)
```
```
     small       22
     medium      46
     large       32
```

Alternatives

You also can group numeric data into categories using `discretize`.