This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English version of the page.

Note: This page has been translated by MathWorks. Click here to see
To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

Preprocess and Explore Time-stamped Data Using timetable

This example shows how to analyze bicycle traffic patterns from sensor data using the timetable data container to organize and preprocess time-stamped data. The data come from sensors on Broadway Street in Cambridge, MA. The City of Cambridge provides public access to the full data set at the Cambridge Open Data site.

This example shows how to perform a variety of data cleaning, munging, and preprocessing tasks such as removing missing values and synchronizing time-stamped data with different timesteps. In addition, data exploration is highlighted including visualizations and grouped calculations using the timetable data container to:

  • Explore daily bicycle traffic

  • Compare bicycle traffic to local weather conditions

  • Analyze bicycle traffic on various days of the week and times of day

Import Bicycle Traffic Data into Timetable

Import a sample of the bicycle traffic data from a comma-separated text file. The readtable function returns the data in a table. Display the first eight rows using the head function.

bikeTbl = readtable('BicycleCounts.csv');
head(bikeTbl)
ans=8×5 table
         Timestamp             Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-06-24 00:00:00    'Wednesday'      13         9             4   
    2015-06-24 01:00:00    'Wednesday'       3         3             0   
    2015-06-24 02:00:00    'Wednesday'       1         1             0   
    2015-06-24 03:00:00    'Wednesday'       1         1             0   
    2015-06-24 04:00:00    'Wednesday'       1         1             0   
    2015-06-24 05:00:00    'Wednesday'       7         3             4   
    2015-06-24 06:00:00    'Wednesday'      36         6            30   
    2015-06-24 07:00:00    'Wednesday'     141        13           128   

The data have timestamps, so it is convenient to use a timetable to store and analyze the data. A timetable is similar to a table, but includes timestamps that are associated with the rows of data. The timestamps, or row times, are represented by datetime or duration values. datetime and duration are the recommended data types for representing points in time or elapsed times, respectively.

Convert bikeTbl into a timetable using the table2timetable function. You must use a conversion function because readtable returns a table. table2timetable converts the first datetime or duration variable in the table into the row times of a timetable. The row times are metadata that label the rows. However, when you display a timetable, the row times and timetable variables are displayed in a similar fashion. Note that the table has five variables whereas the timetable has four.

bikeData = table2timetable(bikeTbl);
head(bikeData)
ans=8×4 timetable
         Timestamp             Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-06-24 00:00:00    'Wednesday'      13         9             4   
    2015-06-24 01:00:00    'Wednesday'       3         3             0   
    2015-06-24 02:00:00    'Wednesday'       1         1             0   
    2015-06-24 03:00:00    'Wednesday'       1         1             0   
    2015-06-24 04:00:00    'Wednesday'       1         1             0   
    2015-06-24 05:00:00    'Wednesday'       7         3             4   
    2015-06-24 06:00:00    'Wednesday'      36         6            30   
    2015-06-24 07:00:00    'Wednesday'     141        13           128   

whos bikeTbl bikeData
  Name             Size              Bytes  Class        Attributes

  bikeData      9387x4             1487627  timetable              
  bikeTbl       9387x5             1562953  table                  

Access Times and Data

Convert the Day variable to categorical. The categorical data type is designed for data that consists of a finite set of discrete values, such as the names of the days of the week. List the categories so they display in day order. Use dot subscripting to access variables by name.

bikeData.Day = categorical(bikeData.Day,{'Sunday','Monday','Tuesday',...
                       'Wednesday','Thursday','Friday','Saturday'});  

In a timetable, the times are treated separately from the data variables. Access the Properties of the timetable to show that the row times are the first dimension of the timetable, and the variables are the second dimension. The DimensionNames property shows the names of the two dimensions, while the VariableNames property shows the names of the variables along the second dimension.

bikeData.Properties
ans = 
  TimetableProperties with properties:

             Description: ''
                UserData: []
          DimensionNames: {'Timestamp'  'Variables'}
           VariableNames: {'Day'  'Total'  'Westbound'  'Eastbound'}
    VariableDescriptions: {}
           VariableUnits: {}
      VariableContinuity: []
                RowTimes: [9387x1 datetime]
               StartTime: 2015-06-24 00:00:00
              SampleRate: NaN
                TimeStep: NaN
        CustomProperties: No custom properties are set.
      Use addprop and rmprop to modify CustomProperties.

By default, table2timetable assigned Timestamp as the first dimension name when it converted the table to a timetable, since this was the variable name from the original table. You can change the names of the dimensions, and other timetable metadata, through the Properties.

Change the names of the dimensions to Time and Data.

bikeData.Properties.DimensionNames = {'Time' 'Data'};
bikeData.Properties
ans = 
  TimetableProperties with properties:

             Description: ''
                UserData: []
          DimensionNames: {'Time'  'Data'}
           VariableNames: {'Day'  'Total'  'Westbound'  'Eastbound'}
    VariableDescriptions: {}
           VariableUnits: {}
      VariableContinuity: []
                RowTimes: [9387x1 datetime]
               StartTime: 2015-06-24 00:00:00
              SampleRate: NaN
                TimeStep: NaN
        CustomProperties: No custom properties are set.
      Use addprop and rmprop to modify CustomProperties.

Display the first eight rows of the timetable.

head(bikeData)
ans=8×4 timetable
           Time               Day       Total    Westbound    Eastbound
    ___________________    _________    _____    _________    _________

    2015-06-24 00:00:00    Wednesday      13         9             4   
    2015-06-24 01:00:00    Wednesday       3         3             0   
    2015-06-24 02:00:00    Wednesday       1         1             0   
    2015-06-24 03:00:00    Wednesday       1         1             0   
    2015-06-24 04:00:00    Wednesday       1         1             0   
    2015-06-24 05:00:00    Wednesday       7         3             4   
    2015-06-24 06:00:00    Wednesday      36         6            30   
    2015-06-24 07:00:00    Wednesday     141        13           128   

Determine the number of days that elapsed between the latest and earliest row times. The variables can be accessed by dot notation when referencing variables one at a time.

elapsedTime = max(bikeData.Time) - min(bikeData.Time)
elapsedTime = duration
   9383:30:00

elapsedTime.Format = 'd'
elapsedTime = duration
   390.98 days

To examine the typical bicycle counts on a given day, calculate the means for the total number of bikes, and the numbers travelling westbound and eastbound.

Return the numeric data as a matrix by indexing into the contents of bikeData using curly braces. Display the first eight rows. Use standard table subscripting to access multiple variables.

counts = bikeData{:,2:end};
counts(1:8,:)
ans = 8×3

    13     9     4
     3     3     0
     1     1     0
     1     1     0
     1     1     0
     7     3     4
    36     6    30
   141    13   128

Since the mean is appropriate for only the numeric data, you can use the vartype function to select the numeric variables. vartype can be more convenient than manually indexing into a table or timetable to select variables. Calculate the means and omit NaN values.

counts = bikeData{:,vartype('numeric')};
mean(counts,'omitnan')
ans = 1×3

   49.8860   24.2002   25.6857

Select Data by Date and Time of Day

To determine how many people bicycle during a holiday, examine the data on the 4th of July holiday. Index into the timetable by row times for July 4, 2015. When you index on row times, you must match times exactly. You can specify the time indices as datetime or duration values, or as character vectors that can be converted to dates and times. You can specify multiple times as an array.

Index into bikeData with specific dates and times to extract data for July 4, 2015. If you specify the date only, then the time is assumed to be midnight, or 00:00:00.

bikeData('2015-07-04',:)
ans=1×4 timetable
           Time              Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-07-04 00:00:00    Saturday      8          7            1    

d = {'2015-07-04 08:00:00','2015-07-04 09:00:00'};
bikeData(d,:)
ans=2×4 timetable
           Time              Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-07-04 08:00:00    Saturday     15          3           12    
    2015-07-04 09:00:00    Saturday     21          4           17    

It would be tedious to use this strategy to extract the entire day. You can also specify ranges of time without indexing on specific times. To create a time range subscript as a helper, use the timerange function.

Subscript into the timetable using a time range for the entire day of July 4, 2015. Specify the start time as midnight on July 4, and the end time as midnight on July 5. By default, timerange covers all times starting with the start time and up to, but not including, the end time. Plot the bicycle counts over the course of the day.

tr = timerange('2015-07-04','2015-07-05');
jul4 = bikeData(tr,'Total');
head(jul4)
ans=8×1 timetable
           Time            Total
    ___________________    _____

    2015-07-04 00:00:00      8  
    2015-07-04 01:00:00     13  
    2015-07-04 02:00:00      4  
    2015-07-04 03:00:00      1  
    2015-07-04 04:00:00      0  
    2015-07-04 05:00:00      1  
    2015-07-04 06:00:00      8  
    2015-07-04 07:00:00     16  

bar(jul4.Time,jul4.Total)
ylabel('Bicycle Counts')
title('Bicycle Counts on July 4, 2015')

From the plot, there is more volume throughout the day, leveling off in the afternoon. Because many businesses are closed, the plot does not show typical traffic during commute hours. Spikes later in the evening can be attributed to celebrations with fireworks, which occur after dark. To examine these trends more closely, the data should be compared to data for typical days.

Compare the data for July 4 to data for the rest of the month of July.

jul = bikeData(timerange('2015-07-01','2015-08-01'),:);
plot(jul.Time,jul.Total)
hold on
plot(jul4.Time,jul4.Total)
ylabel('Total Count')
title('Bicycle Counts in July')
hold off
legend('Bicycle Count','July 4 Bicycle Count')

The plot shows variations that can be attributed to traffic differences between weekdays and weekends. The traffic patterns for July 4 and 5 are consistent with the pattern for weekend traffic. July 5 is a Monday but is often observed as a holiday. These trends can be examined more closely with further preprocessing and analysis.

Preprocess Times and Data Using timetable

Time-stamped data sets are often messy and may contain anomalies or errors. Timetables are well suited for resolving anomalies and errors.

A timetable does not have to have its row times in any particular order. It can contain rows that are not sorted by their row times. A timetable can also contain multiple rows with the same row time, though the rows can have different data values. Even when row times are sorted and unique, they can differ by time steps of different sizes. A timetable can even contain NaT or NaN values to indicate missing row times.

The timetable data type provides a number of different ways to resolve missing, duplicate, or nonuniform times. You can also resample or aggregate data to create a regular timetable. When a timetable is regular, it has row times that are sorted and unique, and have a uniform or evenly spaced time step between them.

  • To find missing row times, use ismissing.

  • To remove missing times and data, use rmmissing.

  • To sort a timetable by its row times, use sortrows.

  • To make a timetable with unique and sorted row times, use unique and retime.

  • To make a regular timetable, specify a uniformly spaced time vector and use retime.

Sort in Time Order

Determine if the timetable is sorted. A timetable is sorted if its row times are listed in ascending order.

issorted(bikeData)
ans = logical
   0

Sort the timetable. The sortrows function sorts the rows by their row times, from earliest to latest time. If there are rows with duplicate row times, then sortrows copies all the duplicates to the output.

bikeData = sortrows(bikeData);
issorted(bikeData)
ans = logical
   1

Identify and Remove Missing Times and Data

A timetable can have missing data indicators in its variables or its row times. For example, you can indicate missing numeric values as NaNs, and missing datetime values as NaTs. You can assign, find, remove, and fill missing values with the standardizeMissing, ismissing, rmmissing, and fillmissing functions, respectively.

Find and count the missing values in the timetable variables. In this example, missing values indicate circumstances when no data were collected.

missData = ismissing(bikeData);
sum(missData)
ans = 1×4

     1     3     3     3

The output from ismissing is a logical matrix, the same size as the table, identifying missing data values as true. Display any rows which have missing data indicators.

idx = any(missData,2);
bikeData(idx,:)
ans=3×4 timetable
           Time                Day        Total    Westbound    Eastbound
    ___________________    ___________    _____    _________    _________

    2015-08-03 00:00:00    Monday          NaN        NaN          NaN   
    2015-08-03 01:00:00    Monday          NaN        NaN          NaN   
    NaT                    <undefined>     NaN        NaN          NaN   

ismissing(bikeData) finds missing data in the timetable variables only, not the times. To find missing row times, call ismissing on the row times.

missTimes = ismissing(bikeData.Time);
bikeData(missTimes,:)
ans=2×4 timetable
    Time        Day        Total    Westbound    Eastbound
    ____    ___________    _____    _________    _________

    NaT     <undefined>     NaN        NaN          NaN   
    NaT     Friday            6          3            3   

In this example, missing times or data values indicate measurement errors and can be excluded. Remove rows of the table containing missing data values and missing row times using rmmissing.

bikeData = rmmissing(bikeData);
sum(ismissing(bikeData))
ans = 1×4

     0     0     0     0

sum(ismissing(bikeData.Time))
ans = 0

Remove Duplicate Times and Data

Determine if there are duplicate times and/or duplicate rows of data. You might want to exclude exact duplicates, as these can also be considered measurement errors. Identify duplicate times by finding where the difference between the sorted times is exactly zero.

idx = diff(bikeData.Time) == 0;
dup = bikeData.Time(idx)
dup = 3x1 datetime array
   2015-08-21 00:00:00
   2015-11-19 23:00:00
   2015-11-19 23:00:00

Three times are repeated and November 19, 2015 is repeated twice. Examine the data associated with the repeated times.

bikeData(dup(1),:)
ans=2×4 timetable
           Time             Day      Total    Westbound    Eastbound
    ___________________    ______    _____    _________    _________

    2015-08-21 00:00:00    Friday     14          9            5    
    2015-08-21 00:00:00    Friday     11          7            4    

bikeData(dup(2),:)
ans=3×4 timetable
           Time              Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-11-19 23:00:00    Thursday     17         15            2    
    2015-11-19 23:00:00    Thursday     17         15            2    
    2015-11-19 23:00:00    Thursday     17         15            2    

The first has duplicated times but non-duplicate data, whereas the others are entirely duplicated. Timetable rows are considered duplicates when they contain identical row times and identical data values across the rows. You can use unique to remove duplicate rows in the timetable. The unique function also sorts the rows by their row times.

bikeData = unique(bikeData); 

The rows with duplicate times but non-duplicate data require some interpretation. Examine the data around those times.

d = dup(1) + hours(-2:2);
bikeData(d,:)
ans=5×4 timetable
           Time              Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-08-20 22:00:00    Thursday     40         30           10    
    2015-08-20 23:00:00    Thursday     25         18            7    
    2015-08-21 00:00:00    Friday       11          7            4    
    2015-08-21 00:00:00    Friday       14          9            5    
    2015-08-21 02:00:00    Friday        6          5            1    

In this case, the duplicate time may have been mistaken since the data and surrounding times are consistent. Though it appears to represent 01:00:00, it is uncertain what time this should have been. The data can be accumulated to account for the data at both time points.

sum(bikeData{dup(1),2:end})
ans = 1×3

    25    16     9

This is only one case which can be done manually. However, for many rows, the retime function can perform this calculation. Accumulate the data for the unique times using the sum function to aggregate. The sum is appropriate for numeric data but not the categorical data in the timetable. Use vartype to identify the numeric variables.

vt = vartype('numeric');
t = unique(bikeData.Time);
numData = retime(bikeData(:,vt),t,'sum');
head(numData)
ans=8×3 timetable
           Time            Total    Westbound    Eastbound
    ___________________    _____    _________    _________

    2015-06-24 00:00:00      13         9             4   
    2015-06-24 01:00:00       3         3             0   
    2015-06-24 02:00:00       1         1             0   
    2015-06-24 03:00:00       1         1             0   
    2015-06-24 04:00:00       1         1             0   
    2015-06-24 05:00:00       7         3             4   
    2015-06-24 06:00:00      36         6            30   
    2015-06-24 07:00:00     141        13           128   

You cannot sum the categorical data, but since one label represents the whole day, take the first value on each day. You can perform the retime operation again with the same time vector and concatenate the timetables together.

vc = vartype('categorical');
catData = retime(bikeData(:,vc),t,'firstvalue');
bikeData = [catData,numData];
bikeData(d,:)
ans=4×4 timetable
           Time              Day       Total    Westbound    Eastbound
    ___________________    ________    _____    _________    _________

    2015-08-20 22:00:00    Thursday     40         30           10    
    2015-08-20 23:00:00    Thursday     25         18            7    
    2015-08-21 00:00:00    Friday       25         16            9    
    2015-08-21 02:00:00    Friday        6          5            1    

Examine Uniformity of Time Interval

The data appear to have a uniform time step of one hour. To determine if this is true for all the row times in the timetable, use the isregular function. isregular returns true for sorted, evenly-spaced times (monotonically increasing), with no duplicate or missing times (NaT or NaN).

isregular(bikeData)
ans = logical
   0

The output of 0, or false, indicates that the times in the timetable are not evenly spaced. Explore the time interval in more detail.

dt = diff(bikeData.Time);
[min(dt); max(dt)]
ans = 2x1 duration array
   00:30:00
   03:00:00

To put the timetable on a regular time interval, use retime or synchronize and specify the time interval of interest.

Determine Daily Bicycle Volume

Determine the counts per day using the retime function. Accumulate the count data for each day using the sum method. This is appropriate for numeric data but not the categorical data in the timetable. Use vartype to identify the variables by data type.

dayCountNum = retime(bikeData(:,vt),'daily','sum');
head(dayCountNum)
ans=8×3 timetable
           Time            Total    Westbound    Eastbound
    ___________________    _____    _________    _________

    2015-06-24 00:00:00    2141       1141         1000   
    2015-06-25 00:00:00    2106       1123          983   
    2015-06-26 00:00:00    1748        970          778   
    2015-06-27 00:00:00     695        346          349   
    2015-06-28 00:00:00     153         83           70   
    2015-06-29 00:00:00    1841        978          863   
    2015-06-30 00:00:00    2170       1145         1025   
    2015-07-01 00:00:00     997        544          453   

As above, you can perform the retime operation again to represent the categorical data using an appropriate method and concatenate the timetables together.

dayCountCat = retime(bikeData(:,vc),'daily','firstvalue');
dayCount = [dayCountCat,dayCountNum];
head(dayCount)
ans=8×4 timetable
           Time               Day       Total    Westbound    Eastbound
    ___________________    _________    _____    _________    _________

    2015-06-24 00:00:00    Wednesday    2141       1141         1000   
    2015-06-25 00:00:00    Thursday     2106       1123          983   
    2015-06-26 00:00:00    Friday       1748        970          778   
    2015-06-27 00:00:00    Saturday      695        346          349   
    2015-06-28 00:00:00    Sunday        153         83           70   
    2015-06-29 00:00:00    Monday       1841        978          863   
    2015-06-30 00:00:00    Tuesday      2170       1145         1025   
    2015-07-01 00:00:00    Wednesday     997        544          453   

Synchronize Bicycle Count and Weather Data

Examine the effect of weather on cycling behavior by comparing the bicycle count with weather data. Load the weather timetable which includes historical weather data from Boston, MA including storm events.

load BostonWeatherData
head(weatherData)
ans=8×3 timetable
       Time        TemperatureF    Humidity       Events   
    ___________    ____________    ________    ____________

    01-Jul-2015         72            78       Thunderstorm
    02-Jul-2015         72            60       None        
    03-Jul-2015         70            56       None        
    04-Jul-2015         67            75       None        
    05-Jul-2015         72            67       None        
    06-Jul-2015         74            69       None        
    07-Jul-2015         75            77       Rain        
    08-Jul-2015         79            68       Rain        

To summarize the times and variables in the timetable, use the summary function.

summary(weatherData)
RowTimes:

    Time: 383x1 datetime
        Values:

            Min          01-Jul-2015 
            Median       08-Jan-2016 
            Max          17-Jul-2016 
            TimeStep     24:00:00    

Variables:

    TemperatureF: 383x1 double

        Values:

            Min              2      
            Median          55      
            Max             85      

    Humidity: 383x1 double

        Values:

            Min           29    
            Median        64    
            Max           97    

    Events: 383x1 categorical

        Values:

            Fog                 7   
            Hail                1   
            Rain              108   
            Rain-Snow           4   
            Snow               18   
            Thunderstorm       12   
            None              233   

Combine the bicycle data with the weather data to a common time vector using synchronize. You can resample or aggregate timetable data using any of the methods documented on the reference page for the synchronize function.

Synchronize the data from both timetables to a common time vector, constructed from the intersection of their individual daily time vectors.

data = synchronize(dayCount,weatherData,'intersection');
head(data)
ans=8×7 timetable
           Time               Day       Total    Westbound    Eastbound    TemperatureF    Humidity       Events   
    ___________________    _________    _____    _________    _________    ____________    ________    ____________

    2015-07-01 00:00:00    Wednesday     997        544          453            72            78       Thunderstorm
    2015-07-02 00:00:00    Thursday     1943       1033          910            72            60       None        
    2015-07-03 00:00:00    Friday        870        454          416            70            56       None        
    2015-07-04 00:00:00    Saturday      669        328          341            67            75       None        
    2015-07-05 00:00:00    Sunday        702        407          295            72            67       None        
    2015-07-06 00:00:00    Monday       1900       1029          871            74            69       None        
    2015-07-07 00:00:00    Tuesday      2106       1140          966            75            77       Rain        
    2015-07-08 00:00:00    Wednesday    1855        984          871            79            68       Rain        

Compare bicycle traffic counts and outdoor temperature on separate y axes to examine the trends. Remove the weekends from the data for visualization.

idx = ~isweekend(data.Time);  
weekdayData = data(idx,{'TemperatureF','Total'});
figure
yyaxis left
plot(weekdayData.Time, weekdayData.Total) 
ylabel('Bicycle Count')
yyaxis right
plot(weekdayData.Time,weekdayData.TemperatureF) 
ylabel('Temperature (\circ F)')
title('Bicycle Counts and Temperature Over Time')
xlim([min(data.Time) max(data.Time)])

The plot shows that the traffic and weather data might follow similar trends. Zoom in on the plot.

xlim([datetime('2015-11-01'),datetime('2016-05-01')])

The trends are similar, indicating that fewer people cycle on colder days.

Analyze by Day of Week and Time of Day

Examine the data based on different intervals such as day of the week and time of day. Determine the total counts per day using varfun to perform grouped calculations on variables. Specify the sum function with a function handle and the grouping variable and preferred output type using name-value pairs.

byDay = varfun(@sum,bikeData,'GroupingVariables','Day',...
             'OutputFormat','table')
byDay=7×5 table
       Day       GroupCount    sum_Total    sum_Westbound    sum_Eastbound
    _________    __________    _________    _____________    _____________

    Sunday          1344         25315          12471            12844    
    Monday          1343         79991          39219            40772    
    Tuesday         1320         81480          39695            41785    
    Wednesday       1344         86853          41726            45127    
    Thursday        1344         87516          42682            44834    
    Friday          1342         76643          36926            39717    
    Saturday        1343         30292          14343            15949    

figure
bar(byDay{:,{'sum_Westbound','sum_Eastbound'}})
legend({'Westbound','Eastbound'},'Location','eastoutside')
xticklabels({'Sun','Mon','Tue','Wed','Thu','Fri','Sat'})
title('Bicycle Count by Day of Week')

The bar plot indicates that traffic is heavier on weekdays. Also, there is a difference in the Eastbound and Westbound directions. This might indicate that people tend to take different routes when entering and leaving the city. Another possibility is that some people enter on one day and return on another day.

Determine the hour of day and use varfun for calculations by group.

bikeData.HrOfDay = hour(bikeData.Time);
byHr = varfun(@mean,bikeData(:,{'Westbound','Eastbound','HrOfDay'}),...
    'GroupingVariables','HrOfDay','OutputFormat','table');
head(byHr)
ans=8×4 table
    HrOfDay    GroupCount    mean_Westbound    mean_Eastbound
    _______    __________    ______________    ______________

       0          389            5.4396            1.7686    
       1          389            2.7712           0.87147    
       2          391            1.8696           0.58312    
       3          391            0.7468             0.289    
       4          391           0.52685            1.0026    
       5          391           0.70588            4.7494    
       6          391            3.1228            22.097    
       7          391            9.1176             63.54    

bar(byHr{:,{'mean_Westbound','mean_Eastbound'}})
legend('Westbound','Eastbound','Location','eastoutside')
xlabel('Hour of Day')
ylabel('Bicycle Count')
title('Mean Bicycle Count by Hour of Day')

There are traffic spikes at the typical commute hours, around 9:00 a.m. and 5:00 p.m. Also, the trends between the Eastbound and Westbound directions are different. In general, the Westbound direction is toward residential areas surrounding the Cambridge area and toward the Universities. The Eastbound direction is toward Boston.

The traffic is heavier later in the day in the Westbound direction compared to the Eastbound direction. This might indicate university schedules and traffic due to restaurants in the area. Examine the trend by day of week as well as hour of day.

byHrDay = varfun(@sum,bikeData,'GroupingVariables',{'HrOfDay','Day'},...
    'OutputFormat','table');
head(byHrDay)
ans=8×6 table
    HrOfDay       Day       GroupCount    sum_Total    sum_Westbound    sum_Eastbound
    _______    _________    __________    _________    _____________    _____________

       0       Sunday           56           473            345              128     
       0       Monday           55           202            145               57     
       0       Tuesday          55           297            213               84     
       0       Wednesday        56           374            286               88     
       0       Thursday         56           436            324              112     
       0       Friday           55           442            348               94     
       0       Saturday         56           580            455              125     
       1       Sunday           56           333            259               74     

To arrange the timetable so that the days of the week are the variables, use the unstack function.

hrAndDayWeek = unstack(byHrDay(:,{'HrOfDay','Day','sum_Total'}),'sum_Total','Day'); 
head(hrAndDayWeek)
ans=8×8 table
    HrOfDay    Sunday    Monday    Tuesday    Wednesday    Thursday    Friday    Saturday
    _______    ______    ______    _______    _________    ________    ______    ________

       0        473        202       297         374          436        442       580   
       1        333         81       147         168          173        183       332   
       2        198         77        68          93          128        141       254   
       3         86         41        43          44           50         61        80   
       4         51         81       117         101          108         80        60   
       5        105        353       407         419          381        340       128   
       6        275       1750      1867        2066         1927       1625       351   
       7        553       5355      5515        5818         5731       4733       704   

ribbon(hrAndDayWeek.HrOfDay,hrAndDayWeek{:,2:end})
ylim([0 24])
xlim([0 8])
xticks(1:7)
xticklabels({'Sun','Mon','Tue','Wed','Thu','Fri','Sat'})
ylabel('Hour')
title('Bicycle Count by Hour and Day of Week')

There are similar trends for the regular work days of Monday through Friday, with peaks at rush hour and traffic tapering off in the evening. Friday has less volume, though the overall trend is similar to the other work days. The trends for Saturday and Sunday are similar to each other, without rush hour peaks and with more volume later in the day. The late evening trends are also similar for Monday through Friday, with less volume on Friday.

Analyze Traffic During Rush Hour

To examine the overall time of day trends, split up the data by rush hour times. It is possible to use different times of day or units of time using the discretize function. For example, separate the data into groups for AM, AMRush, Day, PMRush, PM. Then use varfun to calculate the mean by group.

bikeData.HrLabel = discretize(bikeData.HrOfDay,[0,6,10,15,19,24],'categorical',...
    {'AM','RushAM','Day','RushPM','PM'});
byHrBin = varfun(@mean,bikeData(:,{'Total','HrLabel'}),'GroupingVariables','HrLabel',...
    'OutputFormat','table')
byHrBin=5×3 table
    HrLabel    GroupCount    mean_Total
    _______    __________    __________

    AM            2342         3.5508  
    RushAM        1564         94.893  
    Day           1955         45.612  
    RushPM        1564         98.066  
    PM            1955         35.198  

bar(byHrBin.mean_Total)
cats = categories(byHrBin.HrLabel);
xticklabels(cats)
title('Mean Bicycle Count During Rush Hours')

In general, there is about twice as much traffic in this area during the evening and morning rush hours compared to other times of the day. There is very little traffic in this area in the early morning, but there is still significant traffic in the evening and late evening, comparable to the day outside of the morning and evening rush hours.

See Also

| | | | | | | | | |

Related Topics