
Access Data Using Categorical Arrays

Select Data By Category

Selecting data based on its values is often useful. This type of data selection can involve creating a logical vector based on values in one variable, and then using that logical vector to select a subset of values in other variables. You can create a logical vector for selecting data by finding values in a numeric array that fall within a certain range. Additionally, you can create the logical vector by finding specific discrete values. When using categorical arrays, you can easily:

  • Select elements from particular categories. For categorical arrays, use the relational operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function.

    For ordinal categorical arrays, use inequalities >, >=, <, or <= to find data in categories above or below a particular category.

  • Delete data that is in a particular category. Use relational operators to create a logical index that includes or excludes data from particular categories.

  • Find elements that are not in a defined category. Categorical arrays display <undefined> for elements that do not belong to any defined category. Use the isundefined function to find observations without a defined value.
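The ordinal case is the one technique in the list above that the rest of this example does not demonstrate. The following is a minimal sketch using a small hypothetical array named health (not part of the patients data), assuming the category order Poor < Fair < Good < Excellent. Passing the value set with 'Ordinal',true defines that order, and the inequalities compare positions in it rather than alphabetical order.

```matlab
% Hypothetical ordinal data; the value set lists categories from lowest to highest.
health = categorical({'Good';'Poor';'Excellent';'Fair';'Good'}, ...
    {'Poor','Fair','Good','Excellent'}, 'Ordinal', true);

% Inequalities compare category positions, not alphabetical order.
atLeastGood = (health >= 'Good');   % true for 'Good' and 'Excellent'
belowGood   = (health <  'Good');   % true for 'Poor' and 'Fair'

% Count the elements on each side of the threshold.
nAtLeastGood = nnz(atLeastGood);
nBelowGood   = nnz(belowGood);
```

The same comparisons on a categorical array created without 'Ordinal',true are not allowed, which is why the value set and the ordinal flag are given together.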

Common Ways to Access Data Using Categorical Arrays

This example shows how to index and search using categorical arrays. You can access data using categorical arrays stored within a table in a similar manner.
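As a minimal sketch of the table case, assume a hypothetical table T with a categorical variable Group and a numeric variable Value (neither comes from the patients data). Comparing the categorical table variable produces a logical row index, exactly as with a standalone categorical array.

```matlab
% Hypothetical table: Group is categorical, Value is numeric.
T = table(categorical({'A';'B';'A';'C'}), [10; 20; 30; 40], ...
    'VariableNames', {'Group','Value'});

% Select every variable for the rows in category A.
rowsA = T(T.Group == 'A', :);

% Select one variable for the rows in a group of categories.
valuesBorC = T.Value(ismember(T.Group, {'B','C'}));   % [20; 40]
```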

Load Sample Data

Load sample data gathered from 100 patients.

load patients
whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double               
  Diastolic                     100x1               800  double               
  Gender                        100x1             11412  cell                 
  Height                        100x1               800  double               
  LastName                      100x1             11616  cell                 
  Location                      100x1             14208  cell                 
  SelfAssessedHealthStatus      100x1             11540  cell                 
  Smoker                        100x1               100  logical              
  Systolic                      100x1               800  double               
  Weight                        100x1               800  double               

Create Categorical Arrays from Cell Arrays of Character Vectors

The cell arrays Location and SelfAssessedHealthStatus contain data that belong in categories. Each cell array contains character vectors drawn from a small set of unique values (three locations and four health statuses, respectively). To convert Location and SelfAssessedHealthStatus to categorical arrays, use the categorical function.

Location = categorical(Location);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
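To see what the conversion produces, here is a minimal sketch on a hypothetical cell array named site. By default, categorical uses the sorted unique values of its input as the categories, and it keeps one element per input cell. To control the category order instead, pass a value set as the second argument to categorical.

```matlab
% Hypothetical cell array of character vectors with repeated values.
site = {'VA';'County';'VA';'County';'VA'};
siteCat = categorical(site);

cats = categories(siteCat);   % sorted unique values: {'County';'VA'}
n = numel(siteCat);           % still 5 elements, one per original cell
```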

Search for Members of a Single Category

For categorical arrays, you can use the relational operators == and ~= to find the data that is in, or not in, a particular category.

Determine whether any patients were observed at Rampart General Hospital.

any(Location == "Rampart General Hospital")
ans = logical
   0

There are no patients observed at Rampart General Hospital.
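The comparison returns a logical vector, so any, nnz, and ~= combine with it directly. The sketch below uses a small hypothetical array named loc. As in the Rampart General Hospital case, comparing against text that is not a category returns false for every element rather than raising an error.

```matlab
% Hypothetical locations.
loc = categorical({'North';'South';'North';'East'});

atNorth  = (loc == 'North');   % logical mask for one category
notNorth = (loc ~= 'North');   % the complementary mask
nNorth   = nnz(atNorth);       % number of matching elements

% 'West' is not a category of loc, so every comparison is false.
anyWest = any(loc == 'West');
```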

Search for Members of a Group of Categories

You can use ismember to find data in a particular group of categories. Create a logical vector for the patients observed at either County General Hospital or VA Hospital.

VA_CountyGenIndex = ...
    ismember(Location,{'County General Hospital','VA Hospital'});

VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in the categorical array Location that is a member of the category County General Hospital or VA Hospital. The output VA_CountyGenIndex contains 76 nonzero elements.

Use the logical vector VA_CountyGenIndex to select the LastName values of the patients observed at either County General Hospital or VA Hospital.

VA_CountyGenPatients = LastName(VA_CountyGenIndex);

VA_CountyGenPatients is a 76-by-1 cell array of character vectors.
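The same pattern of building the index from one variable and applying it to another works for any parallel arrays. In this minimal sketch, loc and patientNames are hypothetical stand-ins for Location and LastName, with one row per patient.

```matlab
% Hypothetical locations and matching names, one row per patient.
loc = categorical({'North';'South';'North';'East'});
patientNames = {'Ada';'Ben';'Cy';'Dee'};

inGroup = ismember(loc, {'North','East'});   % member of either category
selectedNames = patientNames(inGroup);       % {'Ada';'Cy';'Dee'}
nSelected = nnz(inGroup);                    % 3
```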

Select Elements in a Particular Category to Plot

Use the summary function to print a summary containing the category names and the number of elements in each category.

summary(Location)
     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                   37 

Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.
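summary prints the counts but does not return them. When the counts are needed in a variable, countcats returns them in the same order as categories. The sketch below uses a hypothetical array named loc and collects both outputs in a table.

```matlab
% Hypothetical locations.
loc = categorical({'North';'South';'North';'East'});

cats   = categories(loc);   % {'East';'North';'South'}
counts = countcats(loc);    % [1; 2; 1], same order as cats

% Pair each category name with its count, similar to the summary display.
countTable = table(cats, counts, 'VariableNames', {'Category','Count'});
```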

Use the summary function to print a summary of SelfAssessedHealthStatus.

summary(SelfAssessedHealthStatus)
     Excellent      34 
     Fair           15 
     Good           40 
     Poor           11 

SelfAssessedHealthStatus is a 100-by-1 categorical array with four categories.

Use the relational operator == to access the ages of patients who assess their own health status as Good. Then plot a histogram of this data.

figure()
histogram(Age(SelfAssessedHealthStatus == 'Good'))
title('Ages of Patients with Good Health Status')

[Figure: histogram titled Ages of Patients with Good Health Status]

histogram(Age(SelfAssessedHealthStatus == 'Good')) plots the age data for the 40 patients who reported Good as their health status.
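The indexed result is an ordinary numeric vector, so it can also feed functions other than histogram. This minimal sketch uses hypothetical arrays named health and age and computes the mean age for one category.

```matlab
% Hypothetical health statuses and ages, one row per patient.
health = categorical({'Good';'Poor';'Good';'Fair'});
age = [38; 52; 44; 61];

goodAges = age(health == 'Good');   % [38; 44]
meanGoodAge = mean(goodAges);       % 41
```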

Delete Data from a Particular Category

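You can use relational operators to create logical indices that include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables Age and Location. Filter Age first. The second statement overwrites Location, so the condition Location ~= "VA Hospital" must be applied to Age while Location still has all 100 elements.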

Age = Age(Location ~= "VA Hospital");
Location = Location(Location ~= "VA Hospital");

Now, Age is a 63-by-1 numeric array, and Location is a 63-by-1 categorical array.
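One way to avoid depending on the statement order is to compute the logical index once and apply it to every variable. The following is a minimal sketch with hypothetical arrays named loc and age.

```matlab
% Hypothetical locations and ages, one row per patient.
loc = categorical({'North';'South';'North';'East'});
age = [38; 52; 44; 61];

keep = (loc ~= 'South');   % compute the mask once, before changing anything
age = age(keep);           % [38; 44; 61]
loc = loc(keep);           % 'South' elements removed, but 'South' is still a category
```

Because keep is stored before either variable changes, the two assignments can appear in any order.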

List the categories of Location, as well as the number of elements in each category.

summary(Location)
     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                    0 

The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a category.

Use the removecats function to remove VA Hospital from the categories of Location.

Location = removecats(Location,"VA Hospital");

Verify that the category VA Hospital was removed.

categories(Location)
ans = 2x1 cell
    {'County General Hospital'  }
    {'St. Mary's Medical Center'}

Location is a 63-by-1 categorical array that has two categories.
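When you call removecats with only the categorical array as input, it removes every category that has no elements, so you do not need to name the empty categories. The sketch below demonstrates this on a hypothetical array named loc.

```matlab
% Hypothetical locations.
loc = categorical({'North';'South';'North'});
loc = loc(loc ~= 'South');     % no 'South' elements remain

catsBefore = categories(loc);  % {'North';'South'}
loc = removecats(loc);         % drop every category with zero elements
catsAfter = categories(loc);   % {'North'}
```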

Delete Element

You can delete elements by indexing. For example, you can remove the first element of Location by selecting the rest of the elements with Location(2:end). However, an easier way to delete elements is to assign the empty array [] to them.

Location(1) = [];
summary(Location)
     County General Hospital       38 
     St. Mary's Medical Center      24 

Location is a 62-by-1 categorical array that has two categories. Deleting the first element has no effect on other elements from the same category and does not delete the category itself.
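Assigning [] also deletes several elements at once, and the categories are kept even when every element of a category is deleted. The sketch below uses a hypothetical array named loc.

```matlab
% Hypothetical array with three categories.
loc = categorical({'A';'B';'A';'C'});

loc([1 3]) = [];                 % delete elements 1 and 3 by position
remaining = cellstr(loc);        % {'B';'C'}
nCats = numel(categories(loc));  % still 3: 'A' remains a category
```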

Test for Undefined Elements

Remove the category County General Hospital from Location.

Location = removecats(Location,"County General Hospital");

Display the first eight elements of the categorical array Location.

Location(1:8)
ans = 8x1 categorical
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 

After removing the category, County General Hospital, elements that previously belonged to that category no longer belong to any category defined for Location. The categorical elements that do not belong to any category are undefined, and display <undefined> as their values.

Use the function isundefined to find elements of a categorical array that do not belong to any category.

undefinedIndex = isundefined(Location);

undefinedIndex is a 62-by-1 logical array containing logical true (1) for each undefined element in Location.
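Because isundefined returns a logical array, you can count undefined elements with nnz or negate the result to keep only defined elements. The following minimal sketch reproduces the remove-then-test sequence on a hypothetical array named loc.

```matlab
% Hypothetical locations.
loc = categorical({'North';'South';'North'});
loc = removecats(loc, 'North');      % 'North' elements become <undefined>

undefinedIndex = isundefined(loc);   % [true; false; true]
nUndefined = nnz(undefinedIndex);    % 2
definedLoc = loc(~undefinedIndex);   % keep only defined elements
```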

Set Undefined Elements

Use the summary function to print the number of undefined elements in Location.

summary(Location)
     St. Mary's Medical Center      24 
     <undefined>                   38 

The first element of Location belongs to the category St. Mary's Medical Center. Set the first element to an undefined value so that it no longer belongs to any category. You can create undefined elements in a categorical array by assigning '', "", '<undefined>', string(nan), or missing to them. When you assign any of these values to elements of a categorical array, MATLAB converts them to undefined values.

Location(1) = "";
summary(Location)
     St. Mary's Medical Center      23 
     <undefined>                   39 

You can make selected elements undefined without removing a category or changing the categories of other elements. Use undefined elements to mark values that are unknown.
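The following minimal sketch shows two of the values listed earlier, "" and missing, applied to a hypothetical array named loc. It assumes a MATLAB release that supports missing, as the text above does. Both assignments produce undefined elements, and the category list stays the same even though South no longer has any elements.

```matlab
% Hypothetical locations.
loc = categorical({'North';'South';'North'});

loc(1) = "";        % empty string -> <undefined>
loc(2) = missing;   % missing value -> <undefined>

nUndefined = nnz(isundefined(loc));   % 2
cats = categories(loc);               % {'North';'South'}, unchanged
```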

Preallocate Categorical Arrays with Undefined Elements

You can use undefined elements to preallocate the size of a categorical array for better performance. Create a categorical array that has elements with known locations only.

definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)
     St. Mary's Medical Center      23 

Expand newLocation to a 200-by-1 categorical array by assigning an undefined value to element 200. MATLAB also assigns undefined values to the other new elements (24 through 199), and the 23 original elements keep their values.

newLocation(200) = "";
summary(newLocation)
     St. Mary's Medical Center       23 
     <undefined>                   177 

newLocation has room for values you plan to store in the array later.
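Another way to preallocate, sketched here as one possible approach rather than the only one, is to create the array at its final size directly from empty character vectors and a value set. The names valueset and newLoc are hypothetical. Every element starts out undefined, and the categories are fixed up front, so values can be filled in later by assignment.

```matlab
% Hypothetical list of allowed categories.
valueset = {'North','South'};

% Five undefined elements that already have their two categories.
newLoc = categorical(repmat({''}, 5, 1), valueset);

newLoc(1) = 'North';   % fill in values as they become known
newLoc(2) = 'South';
nStillUndefined = nnz(isundefined(newLoc));   % 3
```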

See Also


Related Examples

More About