Preprocess and Explore Time-stamped Data Using timetable

This example shows how to analyze bicycle traffic patterns from sensor data using the timetable data container to organize and preprocess time-stamped data. The data come from sensors on Broadway Street in Cambridge, MA. The City of Cambridge provides public access to the full data set at the Cambridge Open Data site.

This example shows how to perform a variety of data cleaning, munging, and preprocessing tasks, such as removing missing values and synchronizing time-stamped data with different time steps. It also explores the data with visualizations and grouped calculations, using the timetable data container to:

  • Explore daily bicycle traffic

  • Compare bicycle traffic to local weather conditions

  • Analyze bicycle traffic on various days of the week and times of day

Import Bicycle Traffic Data into Timetable

Import a sample of the bicycle traffic data from a comma-separated text file. The readtable function returns the data in a table. Display the first eight rows using the head function.

bikeTbl = readtable(fullfile(matlabroot,'examples','matlab_featured','BicycleCounts.csv'));
head(bikeTbl)
ans = 8×5 table
         Timestamp             Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-06-24 00:00:00    'Wednesday'     13       9             4      
    2015-06-24 01:00:00    'Wednesday'      3       3             0      
    2015-06-24 02:00:00    'Wednesday'      1       1             0      
    2015-06-24 03:00:00    'Wednesday'      1       1             0      
    2015-06-24 04:00:00    'Wednesday'      1       1             0      
    2015-06-24 05:00:00    'Wednesday'      7       3             4      
    2015-06-24 06:00:00    'Wednesday'     36       6            30      
    2015-06-24 07:00:00    'Wednesday'    141      13           128      

The data have timestamps, so it is convenient to use a timetable to store and analyze the data. A timetable is similar to a table, but includes timestamps that are associated with the rows of data. The timestamps, or row times, are represented by datetime or duration values. datetime and duration are the recommended data types for representing points in time or elapsed times, respectively.

Convert bikeTbl into a timetable using the table2timetable function. You must use a conversion function because readtable returns a table. table2timetable converts the first datetime or duration variable in the table into the row times of a timetable. The row times are metadata that label the rows. However, when you display a timetable, the row times and timetable variables are displayed in a similar fashion. Note that the table has five variables whereas the timetable has four.

bikeData = table2timetable(bikeTbl);
head(bikeData)
ans = 8×4 timetable
         Timestamp             Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-06-24 00:00:00    'Wednesday'     13       9             4      
    2015-06-24 01:00:00    'Wednesday'      3       3             0      
    2015-06-24 02:00:00    'Wednesday'      1       1             0      
    2015-06-24 03:00:00    'Wednesday'      1       1             0      
    2015-06-24 04:00:00    'Wednesday'      1       1             0      
    2015-06-24 05:00:00    'Wednesday'      7       3             4      
    2015-06-24 06:00:00    'Wednesday'     36       6            30      
    2015-06-24 07:00:00    'Wednesday'    141      13           128      

whos bikeTbl bikeData
  Name             Size              Bytes  Class        Attributes

  bikeData      9387x4             1487449  timetable              
  bikeTbl       9387x5             1562775  table                  
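As a minimal sketch of the conversion on a small scale, the following builds a three-row table and converts it. The values and the names tbl and tt are illustrative assumptions, not part of BicycleCounts.csv.

```matlab
% Toy data (hypothetical values, not read from BicycleCounts.csv)
Timestamp = datetime(2015,6,24,0,0,0) + hours(0:2)';
Total     = [13; 3; 1];
Westbound = [9; 3; 1];
tbl = table(Timestamp,Total,Westbound);   % 3-by-3 table

tt = table2timetable(tbl);                % Timestamp becomes the row times

width(tbl)                                % 3: Timestamp counts as a table variable
width(tt)                                 % 2: row times are metadata, not a variable
tt.Properties.DimensionNames{1}           % 'Timestamp', taken from the table variable name
```

The drop in width from 3 to 2 mirrors the change from five variables to four in bikeData.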

Access Times and Data

Convert the Day variable to categorical. The categorical data type is designed for data that consists of a finite set of discrete values, such as the names of the days of the week. List the categories so they display in day order. Use dot subscripting to access variables by name.

bikeData.Day = categorical(bikeData.Day,{'Sunday','Monday','Tuesday',...
                       'Wednesday','Thursday','Friday','Saturday'});  

In a timetable, the times are treated separately from the data variables. Access the Properties of the timetable to show that the row times are the first dimension of the timetable, and the variables are the second dimension. The DimensionNames property shows the names of the two dimensions, while the VariableNames property shows the names of the variables along the second dimension.

bikeData.Properties
ans = struct with fields:
             Description: ''
                UserData: []
          DimensionNames: {'Timestamp'  'Variables'}
           VariableNames: {'Day'  'Total'  'Westbound'  'Eastbound'}
    VariableDescriptions: {}
           VariableUnits: {}
                RowTimes: [9387×1 datetime]

By default, table2timetable assigned Timestamp as the first dimension name when it converted the table to a timetable, since this was the variable name from the original table. You can change the names of the dimensions, and other timetable metadata, through the Properties.

Change the names of the dimensions to Time and Data.

bikeData.Properties.DimensionNames = {'Time' 'Data'};
bikeData.Properties
ans = struct with fields:
             Description: ''
                UserData: []
          DimensionNames: {'Time'  'Data'}
           VariableNames: {'Day'  'Total'  'Westbound'  'Eastbound'}
    VariableDescriptions: {}
           VariableUnits: {}
                RowTimes: [9387×1 datetime]

Display the first eight rows of the timetable.

head(bikeData)
ans = 8×4 timetable
            Time              Day       Total    Westbound    Eastbound
    ___________________    _________    _____    _________    _________

    2015-06-24 00:00:00    Wednesday     13       9             4      
    2015-06-24 01:00:00    Wednesday      3       3             0      
    2015-06-24 02:00:00    Wednesday      1       1             0      
    2015-06-24 03:00:00    Wednesday      1       1             0      
    2015-06-24 04:00:00    Wednesday      1       1             0      
    2015-06-24 05:00:00    Wednesday      7       3             4      
    2015-06-24 06:00:00    Wednesday     36       6            30      
    2015-06-24 07:00:00    Wednesday    141      13           128      

Determine the number of days that elapsed between the latest and earliest row times. The variables can be accessed by dot notation when referencing variables one at a time.

elapsedTime = max(bikeData.Time) - min(bikeData.Time)
elapsedTime = duration
   9383:30:00

elapsedTime.Format = 'd'
elapsedTime = duration
   390.98 days
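Changing the Format property affects only how a duration is displayed. As a small sketch with hypothetical times (the variable names t, elapsed, and nDays are assumptions for illustration), the days function returns the same elapsed time as a plain number:

```matlab
% Hypothetical row times spanning 60.5 hours
t = datetime(2015,6,24,0,0,0) + hours([0 36 60.5]);

elapsed = max(t) - min(t);     % duration, displayed as hh:mm:ss by default
elapsed.Format = 'd';          % same value, now displayed in days
nDays = days(elapsed)          % numeric value, 60.5/24 (about 2.52)
```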

To examine the typical bicycle counts on a given day, calculate the means for the total number of bikes, and the numbers travelling westbound and eastbound.

Return the numeric data as a matrix by indexing into the contents of bikeData using curly braces. Display the first eight rows. Use standard table subscripting to access multiple variables.

counts = bikeData{:,2:end};
counts(1:8,:)
ans = 

    13     9     4
     3     3     0
     1     1     0
     1     1     0
     1     1     0
     7     3     4
    36     6    30
   141    13   128

Since the mean is appropriate for only the numeric data, you can use the vartype function to select the numeric variables. vartype can be more convenient than manually indexing into a table or timetable to select variables. Calculate the means and omit NaN values.

counts = bikeData{:,vartype('numeric')};
mean(counts,'omitnan')
ans = 

   49.8860   24.2002   25.6857
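The same selection can be sketched on a toy timetable that mixes a categorical variable with numeric ones. The data and the variable name East are assumptions for illustration.

```matlab
% Toy timetable (hypothetical values) with one categorical and two numeric variables
Time  = datetime(2015,6,24) + hours(0:3)';
Day   = categorical({'Wednesday';'Wednesday';'Wednesday';'Wednesday'});
Total = [13; 3; NaN; 1];
East  = [4; 0; NaN; 0];
tt = timetable(Time,Day,Total,East);

numOnly = tt(:,vartype('numeric'));   % keeps Total and East, drops Day
m = mean(numOnly{:,:},'omitnan')      % column means, skipping the NaN row
```

Because vartype selects by data type, the subscript keeps working if numeric variables are later added or reordered, which positional indexing such as 2:end does not guarantee.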

Select Data by Date and Time of Day

To determine how many people bicycle during a holiday, examine the data on the 4th of July holiday. Index into the timetable by row times for July 4, 2015. When you index on row times, you must match times exactly. You can specify the time indices as datetime or duration values, or as character vectors that can be converted to dates and times. You can specify multiple times as an array.

Index into bikeData with specific dates and times to extract data for July 4, 2015. If you specify the date only, then the time is assumed to be midnight, or 00:00:00.

bikeData('2015-07-04',:)
ans = 1×4 timetable
            Time             Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-07-04 00:00:00    Saturday    8        7            1        

d = {'2015-07-04 08:00:00','2015-07-04 09:00:00'};
bikeData(d,:)
ans = 2×4 timetable
            Time             Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-07-04 08:00:00    Saturday    15       3            12       
    2015-07-04 09:00:00    Saturday    21       4            17       

It would be tedious to use this strategy to extract the entire day. You can also specify ranges of time without indexing on specific times. To create a time range subscript as a helper, use the timerange function.

Subscript into the timetable using a time range for the entire day of July 4, 2015. Specify the start time as midnight on July 4, and the end time as midnight on July 5. By default, timerange covers all times starting with the start time and up to, but not including, the end time. Plot the bicycle counts over the course of the day.

tr = timerange('2015-07-04','2015-07-05');
jul4 = bikeData(tr,'Total');
head(jul4)
ans = 8×1 timetable
            Time           Total
    ___________________    _____

    2015-07-04 00:00:00     8   
    2015-07-04 01:00:00    13   
    2015-07-04 02:00:00     4   
    2015-07-04 03:00:00     1   
    2015-07-04 04:00:00     0   
    2015-07-04 05:00:00     1   
    2015-07-04 06:00:00     8   
    2015-07-04 07:00:00    16   
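The half-open behavior of timerange can be checked on a small synthetic timetable. This is a sketch with made-up hourly counts; the 'closed' interval type is shown only for contrast.

```matlab
% Hypothetical hourly data from 22:00 on July 3 through 01:00 on July 5 (28 rows)
Time  = datetime(2015,7,3,22,0,0) + hours(0:27)';
Total = (1:28)';
tt = timetable(Time,Total);

% Default interval includes the start time and excludes the end time
tr = timerange(datetime(2015,7,4),datetime(2015,7,5));
oneDay = tt(tr,:);
height(oneDay)                 % 24 rows: 00:00 through 23:00 on July 4

% A closed interval also includes midnight on July 5
trClosed = timerange(datetime(2015,7,4),datetime(2015,7,5),'closed');
height(tt(trClosed,:))         % 25 rows
```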

bar(jul4.Time,jul4.Total)
ylabel('Bicycle Counts')
title('Bicycle Counts on July 4, 2015')

From the plot, there is more volume throughout the day, leveling off in the afternoon. Because many businesses are closed, the plot does not show typical traffic during commute hours. Spikes later in the evening can be attributed to celebrations with fireworks, which occur after dark. To examine these trends more closely, the data should be compared to data for typical days.

Compare the data for July 4 to data for the rest of the month of July.

jul = bikeData(timerange('2015-07-01','2015-08-01'),:);
plot(jul.Time,jul.Total)
hold on
plot(jul4.Time,jul4.Total)
ylabel('Total Count')
title('Bicycle Counts in July')
hold off
legend('Bicycle Count','July 4 Bicycle Count')

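The plot shows variations that can be attributed to traffic differences between weekdays and weekends. July 4 and 5, 2015 fall on a Saturday and a Sunday, so their traffic patterns are consistent with the pattern for weekend traffic. July 3 is a Friday, but the holiday is often observed on the preceding Friday when July 4 falls on a Saturday, so traffic on that day may also be lighter than on a typical weekday. These trends can be examined more closely with further preprocessing and analysis.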

Preprocess Times and Data Using timetable

Time-stamped data sets are often messy and may contain anomalies or errors. Timetables are well suited for resolving anomalies and errors.

A timetable does not have to have its row times in any particular order. It can contain rows that are not sorted by their row times. A timetable can also contain multiple rows with the same row time, though the rows can have different data values. Even when row times are sorted and unique, they can differ by time steps of different sizes. A timetable can even contain NaT or NaN values to indicate missing row times.

The timetable data type provides a number of different ways to resolve missing, duplicate, or nonuniform times. You can also resample or aggregate data to create a regular timetable. When a timetable is regular, it has row times that are sorted and unique, and have a uniform or evenly spaced time step between them.

  • To find missing row times, use ismissing.

  • To remove missing times and data, use rmmissing.

  • To sort a timetable by its row times, use sortrows.

  • To make a timetable with unique and sorted row times, use unique and retime.

  • To make a regular timetable, specify a uniformly spaced time vector and use retime.
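The functions listed above can be chained. The following is a minimal sketch on a hypothetical five-row timetable that is unsorted and contains a missing row time, a missing value, and an exact duplicate row. The order of the steps is one reasonable choice, not the only one.

```matlab
% Hypothetical messy timetable: unsorted, one NaT row time, one NaN value,
% and one exact duplicate row
Time  = [datetime(2015,6,24,2,0,0); datetime(2015,6,24,0,0,0); NaT; ...
         datetime(2015,6,24,0,0,0); datetime(2015,6,24,3,0,0)];
Total = [1; 13; 5; 13; NaN];
tt = timetable(Time,Total);

tt = sortrows(tt);      % order rows by row time
tt = rmmissing(tt);     % drop rows with missing data or missing row times
tt = unique(tt);        % drop exact duplicate rows

% Put the remaining rows on an hourly grid; hours with no data become NaN
tt = retime(tt,'hourly','fillwithmissing');
isregular(tt)
```

After these steps the sketch leaves rows for 00:00, 01:00, and 02:00, with NaN at 01:00 because no measurement exists for that hour.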

Sort in Time Order

Determine if the timetable is sorted. A timetable is sorted if its row times are listed in ascending order.

issorted(bikeData)
ans = logical
   0

Sort the timetable. The sortrows function sorts the rows by their row times, from earliest to latest time. If there are rows with duplicate row times, then sortrows copies all the duplicates to the output.

bikeData = sortrows(bikeData);
issorted(bikeData)
ans = logical
   1

Identify and Remove Missing Times and Data

A timetable can have missing data indicators in its variables or its row times. For example, you can indicate missing numeric values as NaNs, and missing datetime values as NaTs. You can assign, find, remove, and fill missing values with the standardizeMissing, ismissing, rmmissing, and fillmissing functions, respectively.

Find and count the missing values in the timetable variables. In this example, missing values indicate circumstances when no data were collected.

missData = ismissing(bikeData);
sum(missData)
ans = 

     1     3     3     3

The output from ismissing is a logical matrix, the same size as the timetable, identifying missing data values as true. Display any rows that have missing data indicators.

idx = any(missData,2);
bikeData(idx,:)
ans = 3×4 timetable
            Time               Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-08-03 00:00:00    Monday         NaN      NaN          NaN      
    2015-08-03 01:00:00    Monday         NaN      NaN          NaN      
    NaT                    <undefined>    NaN      NaN          NaN      

ismissing(bikeData) finds missing data in the timetable variables only, not the times. To find missing row times, call ismissing on the row times.

missTimes = ismissing(bikeData.Time);
bikeData(missTimes,:)
ans = 2×4 timetable
    Time        Day        Total    Westbound    Eastbound
    ____    ___________    _____    _________    _________

    NaT     <undefined>    NaN      NaN          NaN      
    NaT     Friday           6        3            3      

In this example, missing times or data values indicate measurement errors and can be excluded. Remove rows of the timetable containing missing data values and missing row times using rmmissing.

bikeData = rmmissing(bikeData);
sum(ismissing(bikeData))
ans = 

     0     0     0     0

sum(ismissing(bikeData.Time))
ans = 0

Remove Duplicate Times and Data

Determine if there are duplicate times and/or duplicate rows of data. You might want to exclude exact duplicates, as these can also be considered measurement errors. Identify duplicate times by finding where the difference between the sorted times is exactly zero.

idx = diff(bikeData.Time) == 0;
dup = bikeData.Time(idx)
dup = 3×1 datetime array
   2015-08-21 00:00:00
   2015-11-19 23:00:00
   2015-11-19 23:00:00

The output lists three repeated entries: one for August 21, 2015 and two for 23:00 on November 19, 2015, which means that three rows share the November 19 row time. Examine the data associated with the repeated times.

bikeData(dup(1),:)
ans = 2×4 timetable
            Time            Day      Total    Westbound    Eastbound
    ___________________    ______    _____    _________    _________

    2015-08-21 00:00:00    Friday    14       9            5        
    2015-08-21 00:00:00    Friday    11       7            4        

bikeData(dup(2),:)
ans = 3×4 timetable
            Time             Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-11-19 23:00:00    Thursday    17       15           2        
    2015-11-19 23:00:00    Thursday    17       15           2        
    2015-11-19 23:00:00    Thursday    17       15           2        

The first has duplicated times but non-duplicate data, whereas the others are entirely duplicated. Timetable rows are considered duplicates when they contain identical row times and identical data values across the rows. You can use unique to remove duplicate rows in the timetable. The unique function also sorts the rows by their row times.

bikeData = unique(bikeData); 

The rows with duplicate times but non-duplicate data require some interpretation. Examine the data around those times.

d = dup(1) + hours(-2:2);
bikeData(d,:)
ans = 5×4 timetable
            Time             Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-08-20 22:00:00    Thursday    40       30           10       
    2015-08-20 23:00:00    Thursday    25       18            7       
    2015-08-21 00:00:00    Friday      11        7            4       
    2015-08-21 00:00:00    Friday      14        9            5       
    2015-08-21 02:00:00    Friday       6        5            1       

In this case, one of the duplicate row times may be a recording error, since the data values are consistent with the surrounding hours and there is no row for 01:00:00. Though the second entry appears to represent 01:00:00, it is uncertain what time this should have been. The data can be accumulated to account for the data at both time points.

sum(bikeData{dup(1),2:end})
ans = 

    25    16     9

This one case can be handled manually. For many rows, however, the retime function can perform the same calculation. Accumulate the data for the unique times using the sum function to aggregate. The sum is appropriate for the numeric data but not for the categorical data in the timetable. Use vartype to identify the numeric variables.

vt = vartype('numeric');
t = unique(bikeData.Time);
numData = retime(bikeData(:,vt),t,'sum');
head(numData)
ans = 8×3 timetable
            Time           Total    Westbound    Eastbound
    ___________________    _____    _________    _________

    2015-06-24 00:00:00     13       9             4      
    2015-06-24 01:00:00      3       3             0      
    2015-06-24 02:00:00      1       1             0      
    2015-06-24 03:00:00      1       1             0      
    2015-06-24 04:00:00      1       1             0      
    2015-06-24 05:00:00      7       3             4      
    2015-06-24 06:00:00     36       6            30      
    2015-06-24 07:00:00    141      13           128      

You cannot sum the categorical data, but since one label represents the whole day, take the first value at each row time. You can perform the retime operation again with the same time vector and concatenate the timetables together.

vc = vartype('categorical');
catData = retime(bikeData(:,vc),t,'firstvalue');
bikeData = [catData,numData];
bikeData(d,:)
ans = 4×4 timetable
            Time             Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-08-20 22:00:00    Thursday    40       30           10       
    2015-08-20 23:00:00    Thursday    25       18            7       
    2015-08-21 00:00:00    Friday      25       16            9       
    2015-08-21 02:00:00    Friday       6        5            1       
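The split-aggregate-concatenate pattern can be sketched in isolation. The four rows below reuse the counts shown above for August 20 and 21; the names tt, numPart, catPart, and merged are assumptions for illustration.

```matlab
% Four rows around midnight on August 21; the 00:00 row time appears twice
Time  = datetime(2015,8,21,0,0,0) + hours([-1; 0; 0; 2]);
Day   = categorical({'Thursday';'Friday';'Friday';'Friday'});
Total = [25; 11; 14; 6];
tt = timetable(Time,Day,Total);

t = unique(tt.Time);                                    % target times, no repeats

% Numeric variables: add the counts that share a row time
numPart = retime(tt(:,vartype('numeric')),t,'sum');

% Categorical variables: keep the first label at each row time
catPart = retime(tt(:,vartype('categorical')),t,'firstvalue');

merged = [catPart,numPart];     % both parts have the same row times
merged.Total'                   % 25 25 6
```

Horizontal concatenation works here because both retime calls use the same time vector t, so the two results line up row for row.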

Examine Uniformity of Time Interval

The data appear to have a uniform time step of one hour. To determine if this is true for all the row times in the timetable, use the isregular function. isregular returns true for sorted, evenly-spaced times (monotonically increasing), with no duplicate or missing times (NaT or NaN).

isregular(bikeData)
ans = logical
   0

The output of 0, or false, indicates that the times in the timetable are not evenly spaced. Explore the time interval in more detail.

dt = diff(bikeData.Time);
[min(dt); max(dt)]
ans = 2×1 duration array
   00:30:00
   03:00:00

To put the timetable on a regular time interval, use retime or synchronize and specify the time interval of interest.
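As a sketch of that step, the following builds a hypothetical timetable whose time steps range from 30 minutes to 3 hours and moves it onto an hourly grid. Summing within each hour is an assumption that suits count data; other methods may fit other measurements better.

```matlab
% Hypothetical irregular row times: steps of 60, 30, 30, and 180 minutes
Time  = datetime(2015,6,24,0,0,0) + minutes([0 60 90 120 300])';
Total = [13; 3; 2; 1; 7];
tt = timetable(Time,Total);

isregular(tt)                 % false: the steps differ
dt = diff(tt.Time);
[min(dt); max(dt)]            % 00:30:00 and 03:00:00

% One row per hour; counts that fall in the same hour are added together
hourlyTT = retime(tt,'hourly','sum');
isregular(hourlyTT)           % true
```

In this sketch the 01:00 and 01:30 rows are combined into a single 01:00 row with a count of 5.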

Determine Daily Bicycle Volume

Determine the counts per day using the retime function. Accumulate the count data for each day using the sum method. This is appropriate for numeric data but not the categorical data in the timetable. Use vartype to identify the variables by data type.

dayCountNum = retime(bikeData(:,vt),'daily','sum');
head(dayCountNum)
ans = 8×3 timetable
            Time           Total    Westbound    Eastbound
    ___________________    _____    _________    _________

    2015-06-24 00:00:00    2141     1141         1000     
    2015-06-25 00:00:00    2106     1123          983     
    2015-06-26 00:00:00    1748      970          778     
    2015-06-27 00:00:00     695      346          349     
    2015-06-28 00:00:00     153       83           70     
    2015-06-29 00:00:00    1841      978          863     
    2015-06-30 00:00:00    2170     1145         1025     
    2015-07-01 00:00:00     997      544          453     

As above, you can perform the retime operation again to represent the categorical data using an appropriate method and concatenate the timetables together.

dayCountCat = retime(bikeData(:,vc),'daily','firstvalue');
dayCount = [dayCountCat,dayCountNum];
head(dayCount)
ans = 8×4 timetable
            Time              Day       Total    Westbound    Eastbound
    ___________________    _________    _____    _________    _________

    2015-06-24 00:00:00    Wednesday    2141     1141         1000     
    2015-06-25 00:00:00    Thursday     2106     1123          983     
    2015-06-26 00:00:00    Friday       1748      970          778     
    2015-06-27 00:00:00    Saturday      695      346          349     
    2015-06-28 00:00:00    Sunday        153       83           70     
    2015-06-29 00:00:00    Monday       1841      978          863     
    2015-06-30 00:00:00    Tuesday      2170     1145         1025     
    2015-07-01 00:00:00    Wednesday     997      544          453     

Synchronize Bicycle Count and Weather Data

Examine the effect of weather on cycling behavior by comparing the bicycle count with weather data. Load the weather timetable which includes historical weather data from Boston, MA including storm events.

load(fullfile(matlabroot,'examples','matlab_featured','BostonWeatherData'));
head(weatherData)
ans = 8×3 timetable
        Time       TemperatureF    Humidity       Events   
    ___________    ____________    ________    ____________

    01-Jul-2015    72              78          Thunderstorm
    02-Jul-2015    72              60          None        
    03-Jul-2015    70              56          None        
    04-Jul-2015    67              75          None        
    05-Jul-2015    72              67          None        
    06-Jul-2015    74              69          None        
    07-Jul-2015    75              77          Rain        
    08-Jul-2015    79              68          Rain        

To summarize the times and variables in the timetable, use the summary function.

summary(weatherData)
RowTimes:

    Time: 383×1 datetime
        Values:

            Min          01-Jul-2015 
            Median       08-Jan-2016 
            Max          17-Jul-2016 
            TimeStep     24:00:00    

Variables:

    TemperatureF: 383×1 double

        Values:

            Min        2            
            Median    55            
            Max       85            

    Humidity: 383×1 double

        Values:

            Min       29        
            Median    64        
            Max       97        

    Events: 383×1 categorical

        Values:

            Fog               7     
            Hail              1     
            Rain            108     
            Rain-Snow         4     
            Snow             18     
            Thunderstorm     12     
            None            233     

Combine the bicycle data with the weather data to a common time vector using synchronize. You can resample or aggregate timetable data using any of the methods documented on the reference page for the synchronize function.

Synchronize the data from both timetables to a common time vector, constructed from the intersection of their individual daily time vectors.

data = synchronize(dayCount,weatherData,'intersection');
head(data)
ans = 8×7 timetable
            Time              Day       Total    Westbound    Eastbound    TemperatureF    Humidity       Events   
    ___________________    _________    _____    _________    _________    ____________    ________    ____________

    2015-07-01 00:00:00    Wednesday     997      544         453          72              78          Thunderstorm
    2015-07-02 00:00:00    Thursday     1943     1033         910          72              60          None        
    2015-07-03 00:00:00    Friday        870      454         416          70              56          None        
    2015-07-04 00:00:00    Saturday      669      328         341          67              75          None        
    2015-07-05 00:00:00    Sunday        702      407         295          72              67          None        
    2015-07-06 00:00:00    Monday       1900     1029         871          74              69          None        
    2015-07-07 00:00:00    Tuesday      2106     1140         966          75              77          Rain        
    2015-07-08 00:00:00    Wednesday    1855      984         871          79              68          Rain        
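The effect of the 'intersection' option is easier to see on two short timetables whose dates only partly overlap. This sketch reuses a few of the daily values shown above; the names counts, wx, and both are assumptions for illustration.

```matlab
% Daily counts for June 29 through July 2
tCount = datetime(2015,6,29) + days(0:3)';
counts = timetable([1841; 2170; 997; 1943], ...
    'RowTimes',tCount,'VariableNames',{'Total'});

% Daily temperatures for July 1 through July 3
tWx = datetime(2015,7,1) + days(0:2)';
wx  = timetable([72; 72; 70], ...
    'RowTimes',tWx,'VariableNames',{'TemperatureF'});

% Keep only the row times that appear in both timetables
both = synchronize(counts,wx,'intersection')
```

Only July 1 and July 2 appear in both inputs, so the result has two rows and no values need to be filled or interpolated.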

Compare bicycle traffic counts and outdoor temperature on separate y axes to examine the trends. Remove the weekends from the data for visualization.

idx = ~isweekend(data.Time);  
weekdayData = data(idx,{'TemperatureF','Total'});
figure
yyaxis left
plot(weekdayData.Time, weekdayData.Total) 
ylabel('Bicycle Count')
yyaxis right
plot(weekdayData.Time,weekdayData.TemperatureF) 
ylabel('Temperature (\circ F)')
title('Bicycle Counts and Temperature Over Time')
xlim([min(data.Time) max(data.Time)])

The plot shows that the traffic and weather data might follow similar trends. Zoom in on the plot.

xlim([datetime('2015-11-01'),datetime('2016-05-01')])

The trends are similar, indicating that fewer people cycle on colder days.

Analyze by Day of Week and Time of Day

Examine the data based on different intervals such as day of the week and time of day. Determine the total counts for each day of the week using varfun to perform grouped calculations on variables. Specify the sum function with a function handle, and specify the grouping variable and preferred output type using name-value pairs.

byDay = varfun(@sum,bikeData,'GroupingVariables','Day',...
             'OutputFormat','table')
byDay = 7×5 table
       Day       GroupCount    sum_Total    sum_Westbound    sum_Eastbound
    _________    __________    _________    _____________    _____________

    Sunday       1344          25315        12471            12844        
    Monday       1343          79991        39219            40772        
    Tuesday      1320          81480        39695            41785        
    Wednesday    1344          86853        41726            45127        
    Thursday     1344          87516        42682            44834        
    Friday       1342          76643        36926            39717        
    Saturday     1343          30292        14343            15949        
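A grouped sum of this kind can be sketched on five hypothetical daily rows covering only Mondays and Tuesdays. The counts are made up for illustration.

```matlab
% Hypothetical daily totals on three Mondays and two Tuesdays
Time  = datetime(2015,6,22) + days([0; 1; 7; 8; 14]);
Day   = categorical({'Monday';'Tuesday';'Monday';'Tuesday';'Monday'}, ...
                    {'Monday','Tuesday'});
Total = [10; 20; 30; 40; 50];
tt = timetable(Time,Day,Total);

% One output row per category of Day; GroupCount reports the rows in each group
byDay = varfun(@sum,tt,'GroupingVariables','Day','OutputFormat','table')
```

The output variable is named sum_Total because varfun prefixes each variable name with the name of the applied function.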

figure
bar(byDay{:,{'sum_Westbound','sum_Eastbound'}})
legend({'Westbound','Eastbound'},'Location','eastoutside')
xticklabels({'Sun','Mon','Tue','Wed','Thu','Fri','Sat'})
title('Bicycle Count by Day of Week')

The bar plot indicates that traffic is heavier on weekdays. Also, there is a difference in the Eastbound and Westbound directions. This might indicate that people tend to take different routes when entering and leaving the city. Another possibility is that some people enter on one day and return on another day.

Determine the hour of day and use varfun for calculations by group.

bikeData.HrOfDay = hour(bikeData.Time);
byHr = varfun(@mean,bikeData(:,{'Westbound','Eastbound','HrOfDay'}),...
    'GroupingVariables','HrOfDay','OutputFormat','table');
head(byHr)
ans = 8×4 table
    HrOfDay    GroupCount    mean_Westbound    mean_Eastbound
    _______    __________    ______________    ______________

    0          389            5.4396            1.7686       
    1          389            2.7712           0.87147       
    2          391            1.8696           0.58312       
    3          391            0.7468             0.289       
    4          391           0.52685            1.0026       
    5          391           0.70588            4.7494       
    6          391            3.1228            22.097       
    7          391            9.1176             63.54       

bar(byHr{:,{'mean_Westbound','mean_Eastbound'}})
legend('Westbound','Eastbound','Location','eastoutside')
xlabel('Hour of Day')
ylabel('Bicycle Count')
title('Mean Bicycle Count by Hour of Day')

There are traffic spikes at the typical commute hours, around 9:00 a.m. and 5:00 p.m. Also, the trends between the Eastbound and Westbound directions are different. In general, the Westbound direction is toward residential areas surrounding the Cambridge area and toward the universities. The Eastbound direction is toward Boston.

The traffic is heavier later in the day in the Westbound direction compared to the Eastbound direction. This might indicate university schedules and traffic due to restaurants in the area. Examine the trend by day of week as well as hour of day.

byHrDay = varfun(@sum,bikeData,'GroupingVariables',{'HrOfDay','Day'},...
    'OutputFormat','table');
head(byHrDay)
ans = 8×6 table
    HrOfDay       Day       GroupCount    sum_Total    sum_Westbound    sum_Eastbound
    _______    _________    __________    _________    _____________    _____________

    0          Sunday       56            473          345              128          
    0          Monday       55            202          145               57          
    0          Tuesday      55            297          213               84          
    0          Wednesday    56            374          286               88          
    0          Thursday     56            436          324              112          
    0          Friday       55            442          348               94          
    0          Saturday     56            580          455              125          
    1          Sunday       56            333          259               74          

To arrange the table so that the days of the week are the variables, use the unstack function.

hrAndDayWeek = unstack(byHrDay(:,{'HrOfDay','Day','sum_Total'}),'sum_Total','Day'); 
head(hrAndDayWeek)
ans = 8×8 table
    HrOfDay    Sunday    Monday    Tuesday    Wednesday    Thursday    Friday    Saturday
    _______    ______    ______    _______    _________    ________    ______    ________

    0          473        202       297        374          436         442      580     
    1          333         81       147        168          173         183      332     
    2          198         77        68         93          128         141      254     
    3           86         41        43         44           50          61       80     
    4           51         81       117        101          108          80       60     
    5          105        353       407        419          381         340      128     
    6          275       1750      1867       2066         1927        1625      351     
    7          553       5355      5515       5818         5731        4733      704     
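The reshaping that unstack performs can be sketched with four rows taken from the grouped output above (hours 0 and 1, Sunday and Monday). The names long and wide are assumptions for illustration.

```matlab
% Long format: one row per combination of hour and day
HrOfDay   = [0; 0; 1; 1];
Day       = categorical({'Sunday';'Monday';'Sunday';'Monday'}, ...
                        {'Sunday','Monday'});
sum_Total = [473; 202; 333; 81];
long = table(HrOfDay,Day,sum_Total);

% Wide format: one row per hour, one variable per category of Day
wide = unstack(long,'sum_Total','Day')
```

The second argument names the variable whose values are spread out, and the third names the variable whose categories become the new variable names.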

ribbon(hrAndDayWeek.HrOfDay,hrAndDayWeek{:,2:end})
ylim([0 24])
xlim([0 8])
xticks(1:7)
xticklabels({'Sun','Mon','Tue','Wed','Thu','Fri','Sat'})
ylabel('Hour')
title('Bicycle Count by Hour and Day of Week')

There are similar trends for the regular work days of Monday through Friday, with peaks at rush hour and traffic tapering off in the evening. Friday has less volume, though the overall trend is similar to the other work days. The trends for Saturday and Sunday are similar to each other, without rush hour peaks and with more volume later in the day. The late evening trends are also similar for Monday through Friday, with less volume on Friday.

Analyze Traffic During Rush Hour

To examine the overall time-of-day trends, split up the data by rush hour times. You can bin the data by different times of day or units of time using the discretize function. For example, separate the data into groups for AM, RushAM, Day, RushPM, and PM. Then use varfun to calculate the mean by group.

bikeData.HrLabel = discretize(bikeData.HrOfDay,[0,6,10,15,19,24],'categorical',...
    {'AM','RushAM','Day','RushPM','PM'});
byHrBin = varfun(@mean,bikeData(:,{'Total','HrLabel'}),'GroupingVariables','HrLabel',...
    'OutputFormat','table')
byHrBin = 5×3 table
    HrLabel    GroupCount    mean_Total
    _______    __________    __________

    AM         2342          3.5508    
    RushAM     1564          94.893    
    Day        1955          45.612    
    RushPM     1564          98.066    
    PM         1955          35.198    
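To see which hours fall into which bin, the same edges and labels can be applied to a handful of sample hours. This is a sketch; the sample hours are chosen to sit on either side of each edge.

```matlab
% Sample hours on both sides of each bin edge
hr = [0 5 6 9 10 14 15 18 19 23]';

% Each bin includes its left edge and excludes its right edge,
% except the last bin, which includes both edges
labels = discretize(hr,[0,6,10,15,19,24],'categorical', ...
    {'AM','RushAM','Day','RushPM','PM'});

table(hr,labels)
```

With these edges, RushAM covers hours 6 through 9 and RushPM covers hours 15 through 18, which accounts for the four-hour groups having fewer rows than the five-hour and six-hour groups.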

bar(byHrBin.mean_Total)
cats = categories(byHrBin.HrLabel);
xticklabels(cats)
title('Mean Bicycle Count During Rush Hours')

In general, there is about twice as much traffic in this area during the evening and morning rush hours compared to other times of the day. There is very little traffic in this area in the early morning, but there is still significant traffic in the evening and late evening, comparable to the day outside of the morning and evening rush hours.
