Access Data Using Categorical Arrays

Select Data By Category

Selecting data based on its values is often useful. This type of data selection can involve creating a logical vector based on values in one variable, and then using that logical vector to select a subset of values in other variables. You can create a logical vector for selecting data by finding values in a numeric array that fall within a certain range. Additionally, you can create the logical vector by finding specific discrete values. When using categorical arrays, you can easily:

  • Select elements from particular categories. For categorical arrays, use the relational operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function.

    For ordinal categorical arrays, use inequalities >, >=, <, or <= to find data in categories above or below a particular category.

  • Delete data that is in a particular category. Use logical operators to include or exclude data from particular categories.

  • Find elements that are not in a defined category. Categorical arrays indicate which elements do not belong to a defined category by <undefined>. Use the isundefined function to find observations without a defined value.
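The inequality operators mentioned above apply only to ordinal categorical arrays. The following is a minimal sketch using hypothetical data (the variable sizes and its category names are illustrative and are not part of the patients data set). It assumes the category order is given explicitly when the array is created.

```matlab
% Hypothetical data for illustration only.
% The second input fixes the category order: small < medium < large.
sizes = categorical({'small','large','medium','small','large'}, ...
    {'small','medium','large'},'Ordinal',true);

% Logical vector for elements in categories above 'small'.
aboveSmall = sizes > 'small';

% Logical vector for elements in 'medium' or any lower category.
atMostMedium = sizes <= 'medium';
```

Both results are logical vectors the same size as sizes, so they can index any other variable of that size.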

Common Ways to Access Data Using Categorical Arrays

This example shows how to index and search using categorical arrays. You can access data using categorical arrays stored within a table in a similar manner.
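As a rough sketch of the table case, the same kind of comparison can build a row index for a table. The table T and the variables demoAge and demoSite below are hypothetical and are not part of the patients data set.

```matlab
% Hypothetical data for illustration only.
demoAge  = [38; 43; 40; 49];
demoSite = categorical({'A'; 'B'; 'A'; 'C'});
T = table(demoAge, demoSite, 'VariableNames', {'Age','Site'});

% Logical row index built from the categorical table variable.
rowsAtA = T.Site == 'A';

% Select whole rows, or one variable restricted to those rows.
subT   = T(rowsAtA, :);
ageAtA = T.Age(rowsAtA);
```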

Load Sample Data

Load sample data gathered from 100 patients.

load patients

whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double               
  Diastolic                     100x1               800  double               
  Gender                        100x1             12212  cell                 
  Height                        100x1               800  double               
  LastName                      100x1             12416  cell                 
  Location                      100x1             15008  cell                 
  SelfAssessedHealthStatus      100x1             12340  cell                 
  Smoker                        100x1               100  logical              
  Systolic                      100x1               800  double               
  Weight                        100x1               800  double               

Create Categorical Arrays from Cell Arrays of Strings

To compare strings in character arrays and cell arrays of strings, you must use strcmp, which can be cumbersome. To access and compare data more easily, convert Gender and Location to categorical arrays.

Gender = categorical(Gender);
Location = categorical(Location);
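To illustrate the difference, the sketch below builds the same logical vector twice: once with strcmp on a cell array of strings and once with == on the converted categorical array. The variable demoGender is hypothetical data, kept separate from the workspace variable Gender.

```matlab
% Hypothetical data for illustration only.
demoGender = {'Male'; 'Female'; 'Female'; 'Male'};

% Cell array of strings: element-wise comparison requires strcmp.
maskCell = strcmp(demoGender, 'Female');

% Categorical array: the == operator does the same comparison.
demoGenderCat = categorical(demoGender);
maskCat = demoGenderCat == 'Female';

% Both approaches produce the same logical vector.
sameResult = isequal(maskCell, maskCat);
```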

Search for Members of a Single Category

For categorical arrays, you can use the relational operators == and ~= to find the data that is in, or not in, a particular category.

Determine if there are any patients observed at the location, 'Rampart General Hospital'.

any(Location=='Rampart General Hospital')
ans =

     0

There are no patients observed at Rampart General Hospital.

Search for Members of a Group of Categories

You can use ismember to find data in a particular group of categories. Create a logical vector for the patients observed at County General Hospital or VA Hospital.

VA_CountyGenIndex = ...
    ismember(Location,{'County General Hospital','VA Hospital'});

VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in the categorical array Location that is a member of the category County General Hospital or VA Hospital. The output, VA_CountyGenIndex, contains 76 nonzero elements.

Use the logical vector, VA_CountyGenIndex, to select the LastName of the patients observed at either County General Hospital or VA Hospital.

VA_CountyGenPatients = LastName(VA_CountyGenIndex);

VA_CountyGenPatients is a 76-by-1 cell array of strings.
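For a small group of categories, ismember gives the same result as combining individual == comparisons with the | operator. The sketch below checks this on hypothetical data (demoLoc is illustrative and separate from the workspace variable Location) and counts the selected elements with nnz.

```matlab
% Hypothetical data for illustration only.
demoLoc = categorical({'VA Hospital'; 'St. Mary''s Medical Center'; ...
    'County General Hospital'; 'VA Hospital'});

% Membership in a group of categories, tested in one call.
viaIsmember = ismember(demoLoc, {'County General Hospital','VA Hospital'});

% The same test written as two comparisons joined with OR.
viaOr = demoLoc == 'County General Hospital' | demoLoc == 'VA Hospital';

% Number of selected elements.
nSelected = nnz(viaIsmember);
```

ismember tends to stay readable as the group grows, whereas the chained form needs one comparison per category.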

Select Elements in a Particular Category to Plot

Use the summary function to print a summary containing the category names and the number of elements in each category.

summary(Location)
     County General Hospital        39 
     St. Mary's Medical Center      24 
     VA Hospital                    37 

Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.

Use the summary function to print a summary of Gender.

summary(Gender)
     Female      53 
     Male        47 

Gender is a 100-by-1 categorical array with two categories. Female occurs in 53 elements and Male occurs in 47 elements.

Use the relational operator == to select the ages of only the female patients. Then plot a histogram of this data.

figure()
hist(Age(Gender=='Female'))
title('Age of Female Patients')

hist(Age(Gender=='Female')) plots the age data for the 53 female patients.

Delete Data from a Particular Category

You can use logical operators to include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables, Location and Age.

% Index Age first, while Location still has all 100 elements.
Age = Age(Location~='VA Hospital');
Location = Location(Location~='VA Hospital');

Now, Location is a 63-by-1 categorical array, and Age is a 63-by-1 numeric array.
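When several variables must stay aligned, one option is to compute the logical vector once and apply it to each variable, so that the result does not depend on the order of the statements. The following is a minimal sketch on hypothetical data (demoLoc, demoAge, and keep are illustrative names, separate from the workspace variables in this example).

```matlab
% Hypothetical data for illustration only.
demoLoc = categorical({'VA Hospital'; 'County General Hospital'; ...
    'VA Hospital'; 'St. Mary''s Medical Center'});
demoAge = [38; 43; 40; 49];

% Compute the selection once, before changing either variable.
keep = demoLoc ~= 'VA Hospital';

% Apply the same logical vector to every aligned variable.
demoLoc = demoLoc(keep);
demoAge = demoAge(keep);
```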

List the categories of Location, as well as the number of elements in each category.

summary(Location)
     County General Hospital        39 
     St. Mary's Medical Center      24 
     VA Hospital                     0 

The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a category.

Use the removecats function to remove VA Hospital from the categories of Location.

Location = removecats(Location,'VA Hospital');

Verify that the category, VA Hospital, was removed.

categories(Location)
ans = 

    'County General Hospital'
    'St. Mary's Medical Center'

Location is a 63-by-1 categorical array that has two categories.
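If the goal is to drop every category that no longer has any elements, removecats can also be called with a single input, in which case it removes all unused categories. The sketch below uses hypothetical data (demoLoc is illustrative and separate from the workspace variable Location).

```matlab
% Hypothetical data for illustration only.
demoLoc = categorical({'A'; 'B'; 'C'; 'A'});

% Delete the elements in category 'C'; the category itself remains.
demoLoc = demoLoc(demoLoc ~= 'C');
beforeCleanup = categories(demoLoc);

% With one input, removecats removes every unused category.
demoLoc = removecats(demoLoc);
remaining = categories(demoLoc);
```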

Check for Undefined Data

Remove the category County General Hospital from Location.

Location = removecats(Location,'County General Hospital');

Display the first eight elements of the categorical array, Location.

Location(1:8)
ans = 

     <undefined> 
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 

After removing the category, County General Hospital, elements that previously belonged to that category no longer belong to any category defined for Location. Categorical arrays denote these elements as undefined.

Use the function isundefined to find observations that do not belong to any category.

undefinedIndex = isundefined(Location);

undefinedIndex is a 63-by-1 logical array containing logical true (1) for all undefined elements in Location.
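A logical vector returned by isundefined can be used like any other index, for example to count the undefined elements or to drop them from aligned variables. The following is a minimal sketch on hypothetical data (demoLoc, demoAge, and undef are illustrative names, separate from the workspace variables in this example).

```matlab
% Hypothetical data for illustration only.
demoLoc = categorical({'A'; 'B'; 'A'; 'B'});
demoAge = [38; 43; 40; 49];

% Removing category 'A' leaves its elements undefined.
demoLoc = removecats(demoLoc, 'A');

% Find and count the undefined elements.
undef = isundefined(demoLoc);
nUndefined = nnz(undef);

% Keep only the elements that still belong to a category.
demoLoc = demoLoc(~undef);
demoAge = demoAge(~undef);
```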
