Index and Search Using Categorical Arrays

    Note:   The nominal and ordinal array data types might be removed in a future release. To represent ordered and unordered discrete, nonnumeric data, use the MATLAB® categorical data type instead.

Index By Category

It is often useful to index and search data by its category, or group. If you store categories as string labels inside a cell array of strings or char array, it can be difficult to index and search the categories. When using categorical arrays, you can easily:

  • Index elements from particular categories. For both nominal and ordinal arrays, you can use the relational operators == and ~= to index the observations that are in, or not in, a particular category. For ordinal arrays, which have an encoded order, you can also use the inequalities >, >=, <, and <= to find observations in categories above or below a particular category.

  • Search for members of a category. In addition to the relational operator ==, you can use ismember to find observations in a particular group.

  • Find elements that are not in a defined category. Categorical arrays indicate which elements do not belong to a defined category by <undefined>. You can use isundefined to find observations missing a category.

  • Delete observations that are in a particular category. You can use logical indexing to keep or exclude observations from particular categories. Even if you remove all observations from a category, the category level remains defined unless you remove it using droplevels.
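The worked example later in this topic uses only a nominal array, so the following minimal sketch illustrates the ordinal inequalities mentioned in the list above. The shirt-size codes and prices are hypothetical data invented for illustration; they are not part of carsmall.

```matlab
% Hypothetical data: shirt sizes coded 1 (small) to 3 (large).
sizeCode = [1 2 3 2 1 3];
shirt = ordinal(sizeCode, {'small','medium','large'});

% Inequalities respect the level order small < medium < large.
atLeastMedium = shirt >= 'medium';   % logical mask, one element per observation
belowLarge    = shirt <  'large';

% Use a mask to index any variable that has one element per observation.
price = [10 12 15 12 10 15];         % hypothetical prices
meanPriceAtLeastMedium = mean(price(atLeastMedium));
```

The same comparisons applied to a nominal array would not be meaningful, because nominal levels carry no order.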

Common Indexing and Searching Methods

This example shows several common indexing and searching methods.

Load the sample data.

load carsmall;

Convert the char array, Origin, to a nominal array. This variable contains the country of origin, or manufacture, for each sample car.

Origin = nominal(Origin);

Search for observations in a category. Determine if there are any cars in the sample that were manufactured in Canada.

any(Origin=='Canada')
ans =

     0

There are no sample cars manufactured in Canada.

List the countries that are levels of Origin.

getlevels(Origin)
ans = 

     France      Germany      Italy      Japan      Sweden      USA 

Index elements that are in a particular category. Plot a histogram of the acceleration measurements for cars made in the U.S.

figure();
hist(Acceleration(Origin=='USA'))
title('Acceleration of Cars Made in the USA')
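To select observations from more than one category at once, ismember can take a cell array of level names. The following is a minimal, self-contained sketch; the choice of Japan and Germany and the variable names isJapanOrGermany, sameMask, and meanAccel are illustrative assumptions.

```matlab
load carsmall                      % sample data used throughout this example
Origin = nominal(Origin);

% True for every car whose origin is either Japan or Germany.
isJapanOrGermany = ismember(Origin, {'Japan','Germany'});

% The same mask built from two == comparisons.
sameMask = (Origin == 'Japan') | (Origin == 'Germany');

% Mean acceleration of the selected cars.
meanAccel = mean(Acceleration(isJapanOrGermany));
```

With many categories, the ismember form is usually shorter than chaining == comparisons with |.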

Delete observations that are in a particular category. Delete all cars made in Sweden from Origin.

Origin = Origin(Origin~='Sweden');
any(ismember(Origin,'Sweden'))
ans =

     0

The cars made in Sweden are deleted from Origin, but Sweden is still a level of Origin.

getlevels(Origin)
ans = 

     France      Germany      Italy      Japan      Sweden      USA 

Remove Sweden from the levels of Origin.

Origin = droplevels(Origin,'Sweden');
getlevels(Origin)
ans = 

     France      Germany      Italy      Japan      USA 
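If you do not want to name each empty level, droplevels called with a single input removes every level that has no observations. The following sketch uses a small hypothetical color array rather than Origin.

```matlab
% Hypothetical data for illustration.
colors = nominal({'red'; 'blue'; 'red'; 'green'});

% Delete the observations in category green; the level itself remains.
colors = colors(colors ~= 'green');
levelsBefore = getlevels(colors);

% Remove all levels that no longer contain any observations.
colors = droplevels(colors);
levelsAfter = getlevels(colors);
```

Because only unused levels are removed, this form does not turn any remaining observation into <undefined>.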

Check for observations not in a defined category. Get the indices for the cars made in France.

ix = find(Origin=='France')
ix =

    11
    27
    39
    61

There are four cars from France. Remove France from the levels of Origin.

Origin = droplevels(Origin,'France');

This command issues a warning that you are dropping a category level that still contains elements. Those observations no longer belong to a defined category, which is indicated by <undefined>.

Origin(ix)
ans = 

     <undefined> 
     <undefined> 
     <undefined> 
     <undefined> 

You can use isundefined to search for observations with an undefined category.

find(isundefined(Origin))
ans =

    11
    27
    39
    61

These indices correspond to the observations that were in category France, before that category was dropped from Origin.
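Once the undefined observations are located, a common next step is to count them or exclude them. The following minimal sketch assumes a small hypothetical array, grp, and reuses the droplevels behavior described above to create an undefined element.

```matlab
% Hypothetical data for illustration.
grp = nominal({'a'; 'b'; 'a'; 'c'});

% Dropping a level that is still in use issues a warning and
% leaves the affected observation undefined.
grp = droplevels(grp, 'b');

isMissing = isundefined(grp);   % logical mask of undefined elements
nMissing  = sum(isMissing);     % number of undefined observations

% Keep only the observations that belong to a defined category.
grpClean = grp(~isMissing);
```

If other variables are stored alongside the categorical array, apply the same mask to them so that the observations stay aligned.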
