Create Categorical Arrays
This example shows how to create a categorical array. categorical is a data type for storing data with values from a finite set of discrete categories. These categories can have a natural order, but it is not required. A categorical array provides efficient storage and convenient manipulation of data, while also maintaining meaningful names for the values. You can use categorical arrays in a table to define groups of rows.
By default, categorical arrays contain categories that have no mathematical ordering. For example, the discrete set of pet categories ["dog","cat","bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird","cat","dog"]. Ordinal categorical arrays contain categories that have a meaningful mathematical ordering. For example, the discrete set of size categories ["small","medium","large"] has the mathematical ordering small < medium < large.
When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed. For example, if you specify the text [" cat","dog"] as input, then the resulting categories are ["cat","dog"].
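The space-trimming behavior described above can be checked with a short sketch. The variable names pets and petNames are illustrative and not part of the original example.

```matlab
% Text values that differ only by leading or trailing spaces.
pets = categorical([" cat","dog ","cat"]);

% categories returns the category names as a cell array of character vectors.
% The spaces are removed, so " cat" and "cat" fall into the same category.
petNames = categories(pets);
disp(petNames)
```

Because the trimmed values coincide, the array has two categories rather than three.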
Create Categorical Array from String Array
You can use the categorical function to create a categorical array from a numeric array, logical array, string array, cell array of character vectors, or an existing categorical array.
Create a 1-by-11 string array containing state names from New England.
state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"]
state = 1x11 string
"MA" "ME" "CT" "VT" "ME" "NH" "VT" "MA" "NH" "CT" "RI"
Convert the string array, state, to a categorical array that has no mathematical order.
state = categorical(state)
state = 1x11 categorical
MA ME CT VT ME NH VT MA NH CT RI
List the discrete categories in the variable state. Although state has 11 elements, it contains only six unique states, so there are six categories. The categories are listed in alphabetical order.
categories(state)
ans = 6x1 cell
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}
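The categorical function also accepts numeric input, as noted at the start of this section. The following is a minimal sketch, assuming hypothetical numeric codes 1, 2, and 3 and illustrative category names. It uses the categorical(A,valueset,catnames) syntax to give each code a readable name.

```matlab
% Hypothetical numeric codes (illustrative values only).
codes = [1 3 2 1 3];

% Map each value in the value set to a category name, in the same order.
level = categorical(codes,[1 2 3],["low","mid","high"]);

disp(level)
disp(categories(level))
```

Here the categories follow the order of the value set rather than alphabetical order, but the array is still not ordinal.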
Add New and Missing Elements
Add elements to the original string array. One of the elements is the missing string, displayed as <missing>. Just as NaN can indicate missing values in a numeric array, <missing> indicates missing values in a string array.
state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"];
state = [string(missing) state];
state(13) = "ME"
state = 1x13 string
<missing> "MA" "ME" "CT" "VT" "ME" "NH" "VT" "MA" "NH" "CT" "RI" "ME"
Convert the string array to a categorical array. The missing string becomes an undefined element, displayed as <undefined>. It indicates an element of the categorical array that does not belong to any category.
state = categorical(state)
state = 1x13 categorical
<undefined> MA ME CT VT ME NH VT MA NH CT RI ME
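To locate undefined elements programmatically, one option is the isundefined function. The sketch below uses a shortened, illustrative version of the state array. The names tf, numUndefined, and cats are assumptions introduced for this example.

```matlab
% A shortened version of the array, with one missing string.
state = categorical([string(missing) "MA" "ME" "CT"]);

% isundefined returns a logical mask that is true for <undefined> elements.
tf = isundefined(state);
numUndefined = nnz(tf);

% <undefined> is not a category, so it does not appear in the category list.
cats = categories(state);
disp(cats)
```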
Create Ordinal Categorical Array from String Array
Create a 1-by-8 string array containing the sizes of eight objects.
AllSizes = ["medium","large","small","small","medium",...
            "large","medium","small"];
The string array, AllSizes, has three distinct values: "large", "medium", and "small". When using a string array, there is no convenient way to indicate that small < medium < large.
Convert the string array, AllSizes, to an ordinal categorical array. Use valueset to specify the values small, medium, and large, which define the categories. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.
valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true)
sizeOrd = 1x8 categorical
medium large small small medium large medium small
The order of the values in the categorical array, sizeOrd, remains unchanged.
List the discrete categories in the categorical variable, sizeOrd.
categories(sizeOrd)
ans = 3x1 cell
{'small' }
{'medium'}
{'large' }
The categories are listed in the specified order to match the mathematical ordering small < medium < large.
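Because the categories are ordered, relational operators and functions such as min and max become meaningful. The following sketch rebuilds the ordinal array and applies them. The names atLeastMedium, smallest, and largest are illustrative.

```matlab
% Rebuild the ordinal categorical array of sizes.
AllSizes = ["medium","large","small","small","medium","large","medium","small"];
sizeOrd = categorical(AllSizes,["small","medium","large"],'Ordinal',true);

% Confirm that the array is ordinal.
tfOrdinal = isordinal(sizeOrd);

% Relational comparison against a category name uses the category order.
atLeastMedium = sizeOrd >= "medium";

% min and max return the smallest and largest categories present.
smallest = min(sizeOrd);
largest = max(sizeOrd);
```

These comparisons would produce an error for a categorical array that is not ordinal, since its categories have no defined order.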
Create Ordinal Categorical Array by Binning Numeric Data
Create a vector of 100 random numbers between zero and 50.
x = rand(100,1)*50;
Use the discretize function to create a categorical array by binning the values of x. Put all values between zero and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes its left endpoint but not its right endpoint, except the last bin, which includes both endpoints.
catnames = ["small","medium","large"];
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);
binnedData is a 100-by-1 ordinal categorical array with three categories, such that small < medium < large.
Use the summary function to print the number of elements in each category.
summary(binnedData)
small      30
medium     35
large      35
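To see how values that fall exactly on a bin edge are assigned, the sketch below bins a few hand-picked probe values instead of random data. The names edges, probe, binned, and counts are illustrative. The countcats function is used here as an alternative to summary that returns the counts as a numeric array.

```matlab
catnames = ["small","medium","large"];
edges = [0 15 35 50];

% Probe values chosen to sit on or near the bin edges.
probe = [0; 15; 34.9; 35; 50];
binned = discretize(probe,edges,'categorical',catnames);

% 15 and 35 fall on left edges, so they go to the higher bin;
% 50 is the right edge of the last bin, which is still included.
disp(binned)

% Count the elements in each category, in category order.
counts = countcats(binned);
```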
You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.
pie(binnedData)
See Also
categorical | categories | summary | discretize
Related Examples
- Convert Text in Table Variables to Categorical
- Access Data Using Categorical Arrays
- Compare Categorical Array Elements