categorical

Array that contains values assigned to categories

Description

categorical is a data type that assigns values to a finite set of discrete categories, such as High, Med, and Low. These categories can have a mathematical ordering that you specify, such as High > Med > Low, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to specify groups of rows in a table.

Creation

Syntax

B = categorical(A)
B = categorical(A,valueset)
B = categorical(A,valueset,catnames)
B = categorical(A,___,Name,Value)

Description

B = categorical(A) creates a categorical array from the array A. The categories of B are the sorted unique values from A.
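
As a minimal sketch of this default behavior, the following converts a small, made-up cell array of character vectors (the values are illustrative only) and inspects the resulting categories. The categories come out as the sorted unique values, not in order of first appearance.

```matlab
% Hypothetical input: a cell array of character vectors with repeated values.
A = {'b','a','c','a'};

% Convert without a valueset, so the categories are derived from A itself.
B = categorical(A);

% categories returns the category names as a column cell array.
cats = categories(B);
disp(cats)
```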

B = categorical(A,valueset) creates one category for each value in valueset. The categories of B are in the same order as the values of valueset.

You can use valueset to include categories for values not present in A. Conversely, if A contains any values not present in valueset, then the corresponding elements of B are undefined.
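
The two cases above can be sketched with made-up station labels (the labels are assumptions chosen for illustration). The valueset adds a category that A never uses, and A contains one value that the valueset omits.

```matlab
% Hypothetical data: 'S4' is not listed in valueset, and 'S3' never occurs in A.
A = {'S1','S2','S4'};
valueset = {'S1','S2','S3'};

B = categorical(A,valueset);

% 'S3' is a category even though no element of B belongs to it.
cats = categories(B);

% The element created from 'S4' is undefined.
tf = isundefined(B);
```

isundefined returns a logical array of the same size as B, which is a convenient way to locate values that did not match any entry in valueset.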

B = categorical(A,valueset,catnames) names the categories in B by matching the category values in valueset with the names in catnames.

B = categorical(A,___,Name,Value) creates a categorical array with additional options specified by one or more Name,Value pair arguments. You can include any of the input arguments in previous syntaxes.

For example, to indicate that the categories have a mathematical ordering, specify 'Ordinal',true.

Input Arguments

A: Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

categorical removes leading and trailing spaces from input values that are strings or character vectors.
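
A short sketch of the trimming behavior, using made-up color names that differ only in surrounding whitespace. Because the spaces are removed before the categories are formed, the padded variants should collapse into a single category.

```matlab
% Hypothetical input: three spellings of 'red' that differ only in padding.
A = {' red','red ','  red  ','blue'};

B = categorical(A);

% Only the trimmed names remain as categories.
cats = categories(B);
disp(cats)
```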

If A contains missing values, then the corresponding elements of B are undefined and display as <undefined>. The categorical function converts the following values to undefined categorical values:

  • NaN in numeric and duration arrays

  • The missing string (<missing>) or the empty string ("") in string arrays

  • The empty character vector ('') in cell arrays of character vectors

  • NaT in datetime arrays

  • Undefined values (<undefined>) in categorical arrays

B does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in catnames, and a missing value as the corresponding value in valueset.
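
The conversion of missing values can be checked with isundefined. This is a minimal sketch using made-up numeric and text inputs; it covers only the NaN and empty-character-vector cases from the list above.

```matlab
% Hypothetical numeric input with a NaN.
N = categorical([1 NaN 2]);
tfN = isundefined(N);

% Hypothetical cell array with an empty character vector.
C = categorical({'a','','b'});
tfC = isundefined(C);

% Undefined elements do not contribute a category.
catsN = categories(N);
catsC = categories(C);
```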

A also can be an array of objects with the following class methods:

  • unique

  • eq

valueset: Categories, specified as a vector of unique values. The data type of valueset and the data type of A must be the same, except when A is a string array. In that case, valueset can be either a string array or a cell array of character vectors.

categorical removes leading and trailing spaces from elements of valueset that are strings or character vectors.

catnames: Category names, specified as a cell array of character vectors. If you do not specify the catnames input argument, then categorical uses the values in valueset as category names.

To merge multiple distinct values in A into a single category in B, include duplicate names corresponding to those values.
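
As an illustration of merging, the sketch below maps four made-up numeric codes onto two names by repeating each name in catnames. The codes and names are assumptions chosen for the example.

```matlab
% Hypothetical codes 1-4; codes 1 and 2 share a name, as do codes 3 and 4.
A = [1 2 3 4 2];
valueset = 1:4;
catnames = {'low','low','high','high'};

B = categorical(A,valueset,catnames);

% Duplicate names produce a single category each.
cats = categories(B);

% Elements that came from different codes now compare as equal.
sameCategory = (B(1) == B(2));
```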

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Ordinal',true specifies that the categories have a mathematical ordering

Sort order indicator, specified as the comma-separated pair consisting of 'Ordinal' and either false (0) or true (1).

false (0)

categorical creates a categorical array that is not ordinal, which is the default behavior.

The categories of B have no mathematical ordering. Therefore, you can compare the values in B only for equality.

true (1)

categorical creates an ordinal categorical array.

The categories of B have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in B using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the min and max functions on an ordinal categorical array.

For more information, see Ordinal Categorical Arrays.
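
A minimal sketch of what the ordinal setting enables, using made-up size labels. The valueset fixes the order Low < Med < High, so relational operators and max become meaningful.

```matlab
% Hypothetical labels; the valueset order defines the mathematical ordering.
sizes = {'Med','Low','High','Low'};
B = categorical(sizes,{'Low','Med','High'},'Ordinal',true);

% Relational comparison against a category name.
tf = B > 'Low';

% Largest value according to the category order, not alphabetical order.
largest = max(B);

% Confirm that B is ordinal.
isord = isordinal(B);
```

Listing the categories explicitly in valueset matters here: without it, the categories would be sorted alphabetically (High, Low, Med), which is not the intended ordering.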

Category protection indicator, specified as the comma-separated pair consisting of 'Protected' and either false (0) or true (1). The categories of ordinal categorical arrays are always protected. The default value is true when you specify 'Ordinal',true. Otherwise, the value is false.

false (0)

When you assign new values to B, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories can update accordingly to include the categories from both arrays.

true (1)

When you assign new values to B, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to B, you must use the function addcats.
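
The difference between the two settings can be sketched as follows, with made-up single-letter categories. The try/catch block is only there to show that the protected assignment fails; the variable names are illustrative.

```matlab
% Unprotected (default for nonordinal arrays): assignment adds a category.
U = categorical({'a','b'});
U(3) = 'c';

% Protected: assigning a value outside the existing categories is an error.
P = categorical({'a','b'},'Protected',true);
try
    P(3) = 'c';
    assigned = true;
catch
    assigned = false;
end

% Add the category explicitly with addcats, then the assignment succeeds.
P = addcats(P,'c');
P(3) = 'c';
```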

Examples

Create a categorical array that has weather station labels. Add it to a table of temperature readings. Then use the categories to select temperature readings by station.

First, create arrays containing temperature readings, dates, and station labels.

Temps = [58; 72; 56; 90; 76];
Dates = {'2017-04-17';'2017-04-18';'2017-04-30';'2017-05-01';'2017-04-27'};
Stations = {'S1';'S2';'S1';'S3';'S2'};

Convert Stations to a categorical array.

Stations = categorical(Stations)
Stations = 5x1 categorical array
     S1 
     S2 
     S1 
     S3 
     S2 

Display the categories. The three station labels are the categories.

categories(Stations)
ans = 3x1 cell array
    {'S1'}
    {'S2'}
    {'S3'}

Create a table that contains the temperatures, dates, and station labels.

T = table(Temps,Dates,Stations)
T=5x3 table
    Temps       Dates        Stations
    _____    ____________    ________

    58       '2017-04-17'    S1      
    72       '2017-04-18'    S2      
    56       '2017-04-30'    S1      
    90       '2017-05-01'    S3      
    76       '2017-04-27'    S2      

Display the readings taken from station S2. You can use the == operator to find the values of Stations that equal S2. Then use logical indexing to select the table rows that have data from station S2.

TF = (T.Stations == 'S2');
T(TF,:)
ans=2x3 table
    Temps       Dates        Stations
    _____    ____________    ________

    72       '2017-04-18'    S2      
    76       '2017-04-27'    S2      
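
To select rows for more than one station at a time, one option is ismember, which accepts a categorical array and a list of category names. The following is a self-contained sketch that rebuilds a reduced version of the table (the Dates variable is omitted for brevity).

```matlab
% Rebuild the example data without the Dates variable.
Temps = [58; 72; 56; 90; 76];
Stations = categorical({'S1';'S2';'S1';'S3';'S2'});
T = table(Temps,Stations);

% Logical index of rows whose station is S1 or S3.
TF = ismember(T.Stations,{'S1','S3'});

% Select the matching rows.
selected = T(TF,:);
disp(selected)
```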

Convert the cell array of character vectors A to a categorical array. Specify a list of categories that includes values that are not present in A.

Create a cell array of character vectors.

A = {'republican' 'democrat'; 'democrat' 'democrat'; 'democrat' 'republican'};

Convert A to a categorical array. Add a category for independent.

valueset = {'democrat' 'republican' 'independent'};
B = categorical(A,valueset)
B = 3x2 categorical array
     republican      democrat   
     democrat        democrat   
     democrat        republican 

Display the categories of B.

categories(B)
ans = 3x1 cell array
    {'democrat'   }
    {'republican' }
    {'independent'}

Create a numeric array.

A = [1 3 2; 2 1 3; 3 1 2]
A = 

     1     3     2
     2     1     3
     3     1     2

Convert A to a categorical array B and specify category names.

B = categorical(A,[1 2 3],{'red' 'green' 'blue'})
B = 3x3 categorical array
     red        blue      green 
     green      red       blue  
     blue       red       green 

Display the categories of B.

categories(B)
ans = 3x1 cell array
    {'red'  }
    {'green'}
    {'blue' }

B is not an ordinal categorical array. Therefore, you can compare the values in B only using the equality operators, == and ~=.

Find the elements that belong to the category 'red'. Access those elements using logical indexing.

TF = (B == 'red');
B(TF)
ans = 3x1 categorical array
     red 
     red 
     red 

Create a 5-by-2 numeric array.

A = [3 2;3 3;3 2;2 1;3 2]
A = 

     3     2
     3     3
     3     2
     2     1
     3     2

Convert A to an ordinal categorical array where 1, 2, and 3 represent the categories child, adult, and senior, respectively.

valueset = 1:3;
catnames = {'child' 'adult' 'senior'};

B = categorical(A,valueset,catnames,'Ordinal',true)
B = 5x2 categorical array
     senior      adult  
     senior      senior 
     senior      adult  
     adult       child  
     senior      adult  

Since B is ordinal, the categories of B have a mathematical ordering, child < adult < senior.

Starting in R2017a, you can create string arrays using double quotes. Also, a string array can have missing values, displayed as <missing>, without quotation marks.

str = ["plane","jet","plane","helicopter",missing,"jet"]
str = 1x6 string array
    "plane"    "jet"    "plane"    "helicopter"    <missing>    "jet"

Convert string array str to a categorical array. The categorical function converts missing strings to undefined categorical values, displayed as <undefined>.

C = categorical(str)
C = 1x6 categorical array
     plane      jet      plane      helicopter      <undefined>      jet 

Use the discretize function (instead of categorical) to bin 100 random numbers into three categories. Because x is random, the counts that summary displays vary from run to run.

x = rand(100,1);
y = discretize(x,[0 .25 .75 1],'categorical',{'small','medium','large'});
summary(y)
     small       22 
     medium      46 
     large       32 

Alternatives

You also can group numeric data into categories using discretize.

Introduced in R2013b