MATLAB Examples

Summary Statistics Grouped by Category

This example shows how to compute summary statistics grouped by levels of a categorical variable. You can compute group summary statistics for a numeric array or a dataset array using grpstats.


Load sample data.

load hospital

The dataset array, hospital, has 7 variables (columns) and 100 observations (rows).
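Before grouping, it can help to confirm the dimensions and the type of the grouping variable. The following is a minimal sketch that assumes the Statistics and Machine Learning Toolbox sample file hospital.mat is on the path; the names nObs, nVars, varNames, and sexLabels are illustrative and do not appear in the original example.

```matlab
load hospital                              % loads the dataset array "hospital"

[nObs,nVars] = size(hospital)              % observations (rows) and variables (columns)
varNames = hospital.Properties.VarNames    % names of the variables in the dataset array
sexClass = class(hospital.Sex)             % class of the grouping variable
sexLabels = getlabels(hospital.Sex)        % labels of the levels of Sex
```

The semicolons are left off deliberately so that each value prints, matching the style of the rest of this example.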

Compute summary statistics by category.

The variable Sex is a nominal array with two levels, Male and Female. Compute the minimum and maximum weights for each gender.

stats = grpstats(hospital,'Sex',{'min','max'},'DataVars','Weight')
stats = 

              Sex       GroupCount    min_Weight    max_Weight
    Female    Female    53            111           147       
    Male      Male      47            158           202       

The dataset array, stats, has one observation (row) for each level of the variable Sex. The variable GroupCount contains the number of observations in each group, the variable min_Weight contains the minimum weight for each group, and the variable max_Weight contains the maximum weight for each group.
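The same statistics can also be requested with the numeric-array form of grpstats, which returns one output per requested statistic instead of a dataset array. This is a sketch of that form, not part of the original example; the output names minW, maxW, and groupNames are illustrative.

```matlab
load hospital

% Numeric-array form: pass the data vector and the grouping variable separately.
% Each requested statistic comes back as its own output, with one row per group.
% 'gname' asks for the group names so the rows can be labeled.
[minW,maxW,groupNames] = grpstats(hospital.Weight,hospital.Sex, ...
                                  {'min','max','gname'})
```

The values should agree with the min_Weight and max_Weight columns of the dataset-array result. The dataset-array form is usually more convenient when the group labels and counts need to stay attached to the statistics.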

Compute summary statistics by multiple categories.

The variable Smoker is a logical array with value 1 for smokers and value 0 for nonsmokers. Compute the minimum and maximum weights for each gender and smoking combination.

stats = grpstats(hospital,{'Sex','Smoker'},{'min','max'},...
                 'DataVars','Weight')
stats = 

                Sex       Smoker    GroupCount    min_Weight    max_Weight
    Female_0    Female    false     40            111           147       
    Female_1    Female    true      13            115           146       
    Male_0      Male      false     26            158           194       
    Male_1      Male      true      21            164           202       

The dataset array, stats, has one observation (row) for each combination of levels of Sex and Smoker that occurs in the original data.
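Because the result is itself a dataset array, individual group statistics can be pulled out with ordinary indexing. The sketch below assumes the grouped call is restricted to Weight with 'DataVars', as the output table suggests; the names isMaleSmoker, maxMaleSmoker, and groupLabels are illustrative.

```matlab
load hospital
stats = grpstats(hospital,{'Sex','Smoker'},{'min','max'}, ...
                 'DataVars','Weight');

% Select a group by logical indexing on the grouping variables in the result.
isMaleSmoker = stats.Sex == 'Male' & stats.Smoker;
maxMaleSmoker = stats.max_Weight(isMaleSmoker)   % maximum weight among male smokers

% The group labels shown at the left of the display are the observation names.
groupLabels = stats.Properties.ObsNames
```

Indexing on Sex and Smoker avoids depending on the exact form of the generated row labels such as Male_1.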