Convert Table Variables Containing Strings to Categorical

This example shows how to convert a variable in a table from a cell array of strings to a categorical array.

Load Sample Data and Create a Table

Load sample data gathered from 100 patients.

load patients

whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double               
  Diastolic                     100x1               800  double               
  Gender                        100x1             12212  cell                 
  Height                        100x1               800  double               
  LastName                      100x1             12416  cell                 
  Location                      100x1             15008  cell                 
  SelfAssessedHealthStatus      100x1             12340  cell                 
  Smoker                        100x1               100  logical              
  Systolic                      100x1               800  double               
  Weight                        100x1               800  double               

Store the patient data from Age, Gender, Height, Weight, SelfAssessedHealthStatus, and Location in a table. Use the unique identifiers in the variable LastName as row names.

T = table(Age,Gender,Height,Weight,...
    SelfAssessedHealthStatus,Location,...
    'RowNames',LastName);
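
As a minimal sketch of what row names make possible, the following code builds a two-row table and looks up a row by name. The table name P and its values are made up for illustration and are not taken from the patients data.

```matlab
% Illustrative data only; not taken from the patients data set.
Age = [38; 43];
Height = [71; 69];
LastName = {'Smith'; 'Johnson'};

% Build a table whose rows are labeled by LastName.
P = table(Age, Height, 'RowNames', LastName);

% Parentheses with a row name return a one-row table.
smithRow = P('Smith', :);

% Braces extract the underlying numeric value instead.
smithAge = P{'Smith', 'Age'};
```

Row names must be unique, which is why an identifier such as LastName is a reasonable choice here.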

Convert Table Variables from Cell Arrays of Strings to Categorical Arrays

The cell arrays of strings, Gender and Location, each contain a small, discrete set of unique values.

Convert Gender and Location to categorical arrays.

T.Gender = categorical(T.Gender);
T.Location = categorical(T.Location);
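
The effect of the conversion can be sketched on a small, made-up cell array of strings. The variable names gender and g are hypothetical. When you do not specify categories, categorical uses the sorted unique values of the input.

```matlab
% Illustrative cell array of strings with two distinct values.
gender = {'Male'; 'Female'; 'Female'; 'Male'};

% Convert to categorical; the categories are the sorted unique values.
g = categorical(gender);

catNames = categories(g);     % category names as a cell array of strings
isCat = iscategorical(g);     % true for a categorical array
isFemale = (g == 'Female');   % element-wise comparison against a string
```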

The variable SelfAssessedHealthStatus contains four unique values: Excellent, Fair, Good, and Poor.

Convert SelfAssessedHealthStatus to an ordinal categorical array, such that the categories have the mathematical ordering Poor < Fair < Good < Excellent.

T.SelfAssessedHealthStatus = categorical(T.SelfAssessedHealthStatus,...
    {'Poor','Fair','Good','Excellent'},'Ordinal',true);
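
To check that the category order is the one you intended, you can inspect an ordinal array as in the sketch below. It uses a short, made-up list of status values, and the names status, valueset, and h are hypothetical.

```matlab
% Illustrative status values; not taken from the patients data set.
status = {'Good'; 'Poor'; 'Excellent'; 'Fair'; 'Good'};

% The order of valueset defines the order of the categories.
valueset = {'Poor', 'Fair', 'Good', 'Excellent'};
h = categorical(status, valueset, 'Ordinal', true);

ord = isordinal(h);          % true because 'Ordinal' was set to true
catOrder = categories(h);    % categories in the order given by valueset
lowest = min(h);             % smallest element under that ordering
```

Without the 'Ordinal' option, the categories are still stored in the order given by valueset, but relational comparisons such as less than are not defined.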

Print a Summary

Use summary to view the data type, description, units, and descriptive statistics for each variable in the table.

format compact

summary(T)
Variables:
    Age: 100x1 double
        Values:
            min       25   
            median    39   
            max       50   
    Gender: 100x1 categorical
        Values:
            Female    53      
            Male      47      
    Height: 100x1 double
        Values:
            min       60      
            median    67      
            max       72      
    Weight: 100x1 double
        Values:
            min         111   
            median    142.5   
            max         202   
    SelfAssessedHealthStatus: 100x1 ordinal categorical
        Values:
            Poor         11                        
            Fair         15                        
            Good         40                        
            Excellent    34                        
    Location: 100x1 categorical
        Values:
            County General Hospital      39        
            St. Mary's Medical Center    24        
            VA Hospital                  37        

The table variables Gender, SelfAssessedHealthStatus, and Location are categorical arrays. For each of them, the summary lists the number of elements in each category. For example, the summary indicates that 53 of the 100 patients are female and 47 are male.
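
If you need the per-category counts as numbers rather than as displayed text, one option is countcats. The sketch below uses a small, made-up categorical array, and the variable names are hypothetical.

```matlab
% Illustrative locations; not taken from the patients data set.
loc = categorical({'VA Hospital'; 'County General Hospital'; ...
    'VA Hospital'; 'VA Hospital'});

names = categories(loc);   % category names, in category order
counts = countcats(loc);   % number of elements in each category
```

The elements of counts are in the same order as the names returned by categories.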

Select Data Based on Categories

Create a subtable, T1, containing the age, height, and weight of all female patients who were observed at County General Hospital. You can easily create a logical vector based on the values in the categorical arrays Gender and Location.

rows = T.Location=='County General Hospital' & T.Gender=='Female';

rows is a 100-by-1 logical vector with logical true (1) for the table rows where the gender is female and the location is County General Hospital.
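
The same pattern can be sketched on a small, made-up pair of categorical arrays. The names mask, numMatches, and matchIdx are hypothetical. Here, nnz counts the rows that satisfy both conditions and find returns their positions.

```matlab
% Illustrative data only; not taken from the patients data set.
Gender = categorical({'Female'; 'Male'; 'Female'; 'Female'});
Location = categorical({'VA Hospital'; 'VA Hospital'; ...
    'County General Hospital'; 'VA Hospital'});

% Combine two element-wise comparisons with a logical AND.
mask = Location == 'VA Hospital' & Gender == 'Female';

numMatches = nnz(mask);   % number of rows where both conditions hold
matchIdx = find(mask);    % positions of those rows
```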

Define the subset of variables.

vars = {'Age','Height','Weight'};

Use parentheses to create the subtable, T1.

T1 = T(rows,vars)
T1 = 
                  Age    Height    Weight
                  ___    ______    ______
    Brown         49     64        119   
    Taylor        31     66        132   
    Anderson      45     68        128   
    Lee           44     66        146   
    Walker        28     65        123   
    Young         25     63        114   
    Campbell      37     65        135   
    Evans         39     62        121   
    Morris        43     64        135   
    Rivera        29     63        130   
    Richardson    30     67        141   
    Cox           28     66        111   
    Torres        45     70        137   
    Peterson      32     60        136   
    Ramirez       48     64        137   
    Bennett       35     64        131   
    Patterson     37     65        120   
    Hughes        49     63        123   
    Bryant        48     66        134   

T1 is a 19-by-3 table.

Because ordinal categorical arrays have a mathematical ordering for their categories, you can compare their elements with category names by using relational operators, such as greater than and less than.

Create a subtable, T2, of the gender, age, height, and weight of all patients who assessed their health status as poor or fair.

First, define the subset of rows to include in table T2.

rows = T.SelfAssessedHealthStatus<='Fair';

Then, define the subset of variables to include in table T2.

vars = {'Gender','Age','Height','Weight'};

Use parentheses to create the subtable T2.

T2 = T(rows,vars)
T2 = 
                 Gender    Age    Height    Weight
                 ______    ___    ______    ______
    Johnson      Male      43     69        163   
    Jones        Female    40     67        133   
    Thomas       Female    42     66        137   
    Jackson      Male      25     71        174   
    Garcia       Female    27     69        131   
    Rodriguez    Female    39     64        117   
    Lewis        Female    41     62        137   
    Lee          Female    44     66        146   
    Hall         Male      25     70        189   
    Hernandez    Male      36     68        166   
    Lopez        Female    40     66        137   
    Gonzalez     Female    35     66        118   
    Mitchell     Male      39     71        164   
    Campbell     Female    37     65        135   
    Parker       Male      30     68        182   
    Stewart      Male      49     68        170   
    Morris       Female    43     64        135   
    Watson       Female    40     64        127   
    Kelly        Female    41     65        127   
    Price        Male      31     72        178   
    Bennett      Female    35     64        131   
    Wood         Male      32     68        183   
    Patterson    Female    37     65        120   
    Foster       Female    30     70        124   
    Griffin      Male      49     70        186   
    Hayes        Male      48     66        177   

T2 is a 26-by-4 table.
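
For comparison, the sketch below selects the same categories in two ways on a small, made-up ordinal array: with a relational operator, and by listing the categories explicitly with ismember. The variable names are hypothetical.

```matlab
% Illustrative status values; not taken from the patients data set.
status = {'Good'; 'Poor'; 'Excellent'; 'Fair'};
h = categorical(status, {'Poor', 'Fair', 'Good', 'Excellent'}, ...
    'Ordinal', true);

% Relational comparison relies on the category ordering.
byOrder = (h <= 'Fair');

% Membership test lists the wanted categories explicitly.
byMembership = ismember(h, {'Poor', 'Fair'});

same = isequal(byOrder, byMembership);
```

The relational form is shorter when the wanted categories are contiguous in the ordering. The membership form also works for categorical arrays that are not ordinal.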
