Control Categorical Histogram Display


This example shows how to use the histogram function to view categorical data effectively. You can use the name-value pair arguments 'NumDisplayBins', 'DisplayOrder', and 'ShowOthers' to control how a categorical histogram is displayed. These options let you sort the categories, limit how many categories are shown, and group the remaining categories into a single bar, which reduces clutter in the plot.
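As a minimal sketch, the three options can be combined in a single call. The categorical array causes below is made-up sample data for illustration only, not taken from outages.csv.

```matlab
% Hypothetical sample data for illustration only (not from outages.csv).
causes = categorical({'storm','fire','storm','attack','storm','fire','fault'});

% Sort the bars from tallest to shortest, display only the 2 most
% frequent categories, and collect the remaining categories into a
% single 'Others' bar.
h = histogram(causes, ...
    'DisplayOrder','descend', ...
    'NumDisplayBins',2, ...
    'ShowOthers','on');
```

Because the options are also properties of the returned object, you can set them after plotting instead, for example h.NumDisplayBins = 3.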

Create Categorical Histogram

The sample file outages.csv contains data representing electric utility outages in the United States. The file contains six columns: Region, OutageTime, Loss, Customers, RestorationTime, and Cause.

Read the outages.csv file as a table. Use the 'Format' option to specify the kind of data each column contains: categorical ('%C'), floating-point numeric ('%f'), or datetime ('%D'). Display the first 10 rows to see the variables.

data_formats = '%C%D%f%f%D%C';   % Region, OutageTime, Loss, Customers, RestorationTime, Cause
C = readtable('outages.csv','Format',data_formats);
first_few_rows = C(1:10,:)        % display the first 10 rows

first_few_rows=10×6 table
     Region         OutageTime        Loss     Customers     RestorationTime          Cause     
    _________    ________________    ______    __________    ________________    _______________

    SouthWest    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    winter storm   
    SouthEast    2003-01-23 00:49    530.14    2.1204e+05                 NaT    winter storm   
    SouthEast    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    winter storm   
    West         2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    equipment fault
    MidWest      2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    severe storm   
    West         2003-06-18 02:49         0             0    2003-06-18 10:54    attack         
    West         2004-06-20 14:39    231.29           NaN    2004-06-20 19:16    equipment fault
    West         2002-06-06 19:28    311.86           NaN    2002-06-07 00:51    equipment fault
    NorthEast    2003-07-16 16:23    239.93         49434    2003-07-17 01:12    fire           
    MidWest      2004-09-27 11:09    286.72         66104    2004-09-27 16:37    equipment fault
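Each distinct value of the Cause variable becomes one bar of the categorical histogram, so it can help to list the categories and their counts before plotting. A short sketch, assuming outages.csv is on the MATLAB path as in the code above:

```matlab
% Assumes the MATLAB sample file outages.csv is on the path.
C = readtable('outages.csv','Format','%C%D%f%f%D%C');

causeNames  = categories(C.Cause);   % distinct causes, one per histogram bar
causeCounts = countcats(C.Cause);    % number of outages for each cause

% Collect the names and counts side by side for inspection.
summaryTable = table(causeNames, causeCounts, ...
    'VariableNames', {'Cause','Count'})
```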

Plot a categorical histogram of the Cause variable. Specify an output argument, h, to return the histogram object so that you can change its properties later.

h = histogram(C.Cause);
xlabel('Cause of Outage')
ylabel('Frequency')
title('Most Common Power Outage Causes')

Figure: categorical histogram titled "Most Common Power Outage Causes", with x-axis "Cause of Outage" and y-axis "Frequency".

Set the Normalization property of the histogram to 'probability' so that the bar heights show the relative frequency of each outage cause instead of the raw count.

h.Normalization = 'probability';
ylabel('Relative Frequency')
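With 'probability' normalization, each bar height is the fraction of observations that fall in that category. The sketch below uses made-up data (not from outages.csv, and with no undefined elements) to compute the same relative frequencies by hand with countcats and compare them with the Values property of the histogram.

```matlab
% Hypothetical sample data for illustration only (not from outages.csv).
causes = categorical({'storm','fire','storm','attack','storm','fire'});

counts  = countcats(causes);       % counts per category, in categories() order
relFreq = counts / sum(counts);    % relative frequencies; they sum to 1

% Plot with probability normalization; bars use the same category order.
h = histogram(causes,'Normalization','probability');

% Compare the hand-computed values with the plotted bar heights.
maxDiff = max(abs(h.Values(:) - relFreq(:)))
```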