Import Mixed Data from Text Files

This example shows how to use the readtable function to import mixed data from a text file into a table. Then, it shows how to modify and analyze the data in the table.

Sample File Overview

The sample file, outages.csv, contains data representing electric utility outages in the US. These are the first few lines of the file:

Region,OutageTime,Loss,Customers,RestorationTime,Cause
SouthWest,01/20/2002 11:49,672,2902379,01/24/2002 21:58,winter storm
SouthEast,01/30/2002 01:18,796,336436,02/04/2002 11:20,winter storm
SouthEast,02/03/2004 21:17,264.9,107083,02/20/2004 03:37,winter storm
West,06/19/2002 13:39,391.4,378990,06/19/2002 14:27,equipment fault

The file contains six columns. The first line in the file lists column titles for the data. These are the column titles, along with a description of the data in that column:

  • Region: Text value for one of five regions where each electrical outage occured

  • OutageTime: Date and time at which the outage started, formatted as month/day/year hour:minute

  • Loss: Numeric value indicating the total power loss for the outage

  • Customers: Integer value indicating the number of customers impacted

  • RestorationTime: Date and time at which power was restored, formatted as month/day/year hour:minute

  • Cause: Category for the cause of the power outage, provided as text.

Specify Format of Data Fields

Create a string of format specifiers to describe the data in the text file. You can then pass the format specifiers to the readtable function to import the data. Because outages.csv contains six columns of data, create a string of six format specifiers, such as '%f' for a floating-point number, '%C' for a categorical value, and '%D' for a date and time value.

formatSpec = '%C%{MM/dd/yyyy HH:mm}D%f%f%{MM/dd/yyyy HH:mm}D%C';

The formatSpec string tells readtable to read the first and last columns in the file as categorical data, the second and fifth columns as formatted date and time data, and the third and fourth columns as floating-point values. For the %{MM/dd/yyyy HH:mm}D specifiers, the text between the curly braces describes the format of the date and time data.

Read Text File

Call readtable to read the file. Use the Delimiter name-value pair argument to specify the delimiter. The default delimiter is a comma. Use the Format name-value pair argument along with the formatSpec value to describe the format of the data fields in the file.

T = readtable('outages.csv','Delimiter',',', ...
    'Format',formatSpec);

readtable returns a table containing the outage data.

View the first five rows and first four variables of the table.

T(1:5,1:4)
ans = 

     Region         OutageTime        Loss     Customers 
    _________    ________________    ______    __________

    SouthWest    02/01/2002 12:18    458.98    1.8202e+06
    SouthEast    01/23/2003 00:49    530.14    2.1204e+05
    SouthEast    02/07/2003 21:15     289.4    1.4294e+05
    West         04/06/2004 05:44    434.81    3.4037e+05
    MidWest      03/16/2002 06:18    186.44    2.1275e+05

The type of data contained in the table is mixed. The first and last variables are categorical arrays, the second and fifth variables are datetime arrays, and the remaining variables are numeric data.

Modify Imported Data

Modify the format of the datetime columns in T.

T.OutageTime.Format = 'dd-MMM-yyyy HH:mm:ss';
T.RestorationTime.Format = 'dd-MMM-yyyy HH:mm:ss';

View the first five rows and first four variables of the table.

T(1:5,1:4)
ans = 

     Region           OutageTime          Loss     Customers 
    _________    ____________________    ______    __________

    SouthWest    01-Feb-2002 12:18:00    458.98    1.8202e+06
    SouthEast    23-Jan-2003 00:49:00    530.14    2.1204e+05
    SouthEast    07-Feb-2003 21:15:00     289.4    1.4294e+05
    West         06-Apr-2004 05:44:00    434.81    3.4037e+05
    MidWest      16-Mar-2002 06:18:00    186.44    2.1275e+05

Append to Imported Data

Calculate the duration of each electrical outage and append the data to the table.

T.Duration = T.RestorationTime - T.OutageTime;

View the first five rows of the data in the Duration column of T.

T.Duration(1:5)
ans = 

   148:32:00
         NaN
   226:59:00
    00:26:00
    65:05:00

Sort Imported Data

Sort the table by the OutageTime variable. Then, view the first five rows and first four variables of the sorted table.

T = sortrows(T,'OutageTime','ascend');
T(1:5,1:4)
ans = 

     Region           OutageTime          Loss     Customers 
    _________    ____________________    ______    __________

    SouthWest    01-Feb-2002 12:18:00    458.98    1.8202e+06
    MidWest      05-Mar-2002 17:53:00    96.563    2.8666e+05
    MidWest      16-Mar-2002 06:18:00    186.44    2.1275e+05
    MidWest      26-Mar-2002 01:59:00    388.04    5.6422e+05
    MidWest      20-Apr-2002 16:46:00     23141           NaN

See Also

More About

Was this topic helpful?