Control Categorical Histogram Display
This example shows how to use histogram to effectively view categorical data. You can use the name-value pairs 'NumDisplayBins', 'DisplayOrder', and 'ShowOthers' to change the display of a categorical histogram. These options help you to better organize the data and reduce noise in the plot.
Create Categorical Histogram
The sample file outages.csv contains data representing electric utility outages in the United States. The file contains six columns: Region, OutageTime, Loss, Customers, RestorationTime, and Cause.
Read the outages.csv file as a table. Use the 'Format' option to specify the kind of data each column contains: categorical ('%C'), floating-point numeric ('%f'), or datetime ('%D'). Index into the first few rows of data to see the variables.
data_formats = '%C%D%f%f%D%C'; C = readtable('outages.csv','Format',data_formats); first_few_rows = C(1:10,:)
first_few_rows = 10x6 table Region OutageTime Loss Customers RestorationTime Cause _________ ________________ ______ __________ ________________ _______________ SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm West 2003-06-18 02:49 0 0 2003-06-18 10:54 attack West 2004-06-20 14:39 231.29 NaN 2004-06-20 19:16 equipment fault West 2002-06-06 19:28 311.86 NaN 2002-06-07 00:51 equipment fault NorthEast 2003-07-16 16:23 239.93 49434 2003-07-17 01:12 fire MidWest 2004-09-27 11:09 286.72 66104 2004-09-27 16:37 equipment fault
Plot a categorical histogram of the Cause variable. Specify an output argument to return a handle to the histogram object.
h = histogram(C.Cause); xlabel('Cause of Outage') ylabel('Frequency') title('Most Common Power Outage Causes')
Change the normalization of the histogram to use the 'probability' normalization, which displays the relative frequency of each outage cause.
h.Normalization = 'probability'; ylabel('Relative Frequency')
Change Display Order
Use the 'DisplayOrder' option to sort the bins from largest to smallest.
h.DisplayOrder = 'descend';
Truncate Number of Bars Displayed
Use the 'NumDisplayBins' option to display only three bars in the plot. The displayed probabilities no longer add to 1 since the undisplayed data is still taken into account for normalization.
h.NumDisplayBins = 3;
Summarize Excluded Data
Use the 'ShowOthers' option to summarize all of the excluded bars, so that the displayed probabilities again add to 1.
h.ShowOthers = 'on';
Limit Normalization to Display Data
Prior to R2017a, the histogram and histcounts functions used only binned data to calculate normalizations. This behavior meant that if some of the data ended up outside the bins, it was ignored for the purposes of normalization. However, in MATLAB® R2017a, the behavior changed to always normalize using the total number of elements in the input data. The new behavior is more intuitive, but if you prefer the old behavior, then you need to take a few special steps to limit the normalization only to the binned data.
Instead of normalizing over all of the input data, you can limit the probability normalization to the data that is displayed in the histogram. Simply update the Data property of the histogram object to remove the other categories. The Categories property reflects the categories displayed in the histogram. Use setdiff to compare the two property values and remove any category from Data that is not in Categories. Then remove all of the resulting undefined categorical elements from the data, leaving only elements in the displayed categories.
h.ShowOthers = 'off'; cats_to_remove = setdiff(categories(h.Data),h.Categories); h.Data = removecats(h.Data,cats_to_remove); h.Data = rmmissing(h.Data);
The normalization is now based only on the three remaining categories, so the three bars add to 1.