Read and Analyze Large Tabular Text File

This example shows how to create a datastore for a large text file containing tabular data, and then read and process the data one chunk at a time or one file at a time.

Create a Datastore

Create a datastore from the sample file airlinesmall.csv using the datastore function. When you create the datastore, you can specify that the text NA in the data is treated as missing data.

ds = datastore('airlinesmall.csv','TreatAsMissing','NA');

datastore returns a TabularTextDatastore. The datastore function automatically determines the appropriate type of datastore to create based on the file extension.
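If you prefer not to rely on extension-based detection, you can request the datastore type directly. The following is a minimal sketch, assuming the sample file airlinesmall.csv is on the MATLAB path; the variable name dsExplicit is illustrative only and is not used elsewhere in this example.

```matlab
% Create a tabular text datastore explicitly instead of letting the
% datastore function infer the type from the file extension.
dsExplicit = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA');

% Display the class of the object that was created.
class(dsExplicit)
```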

After you create the datastore, you can change its behavior by setting its properties. Set the MissingValue property to 0 so that missing values are replaced with 0 when the data is read.

ds.MissingValue = 0;

In this example, select the variable for the arrival delay, ArrDelay, as the variable of interest.

ds.SelectedVariableNames = 'ArrDelay';

Preview the data using the preview function. This function does not affect the state of the datastore.

data = preview(ds)
data=8×1 table
    ArrDelay
    ________

        8   
        8   
       21   
       13   
        4   
       59   
        3   
       11   

Read Subsets of Data

By default, read reads from a TabularTextDatastore 20000 rows at a time. To read a different number of rows in each call to read, modify the ReadSize property of ds.

ds.ReadSize = 15000;

Read subsets of the data from ds using the read function in a while loop. The loop executes until hasdata(ds) returns false.

sums = [];
counts = [];
while hasdata(ds)
    T = read(ds);
    
    sums(end+1) = sum(T.ArrDelay);
    counts(end+1) = length(T.ArrDelay);
end

Compute the average arrival delay.

avgArrivalDelay = sum(sums)/sum(counts)
avgArrivalDelay = 6.9670
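The loop stores a sum and a row count for each chunk rather than a per-chunk mean because chunks are not guaranteed to be the same size (for example, the final chunk typically holds fewer rows than ReadSize). The sketch below uses two small made-up vectors in place of real chunks to show the difference; the names chunk1, chunk2, pooledMean, and meanOfMeans are hypothetical.

```matlab
% Two hypothetical chunks of unequal size (values are made up).
chunk1 = [10; 20; 30; 40];   % 4 rows
chunk2 = [100; 200];         % 2 rows

% Per-chunk sums and counts, as accumulated in the while loop.
chunkSums   = [sum(chunk1), sum(chunk2)];      % [100 300]
chunkCounts = [numel(chunk1), numel(chunk2)];  % [4 2]

% Pooled mean: total sum divided by total row count (400/6).
pooledMean = sum(chunkSums)/sum(chunkCounts);

% Averaging the per-chunk means weights both chunks equally,
% which overweights the smaller chunk: (25 + 150)/2 = 87.5.
meanOfMeans = mean(chunkSums./chunkCounts);

% The pooled mean agrees with the mean over all rows taken together.
abs(pooledMean - mean([chunk1; chunk2])) < 1e-12
```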

Reset the datastore to allow rereading of the data.

reset(ds)

Read One File at a Time

A datastore can contain multiple files, each with a different number of rows. You can read from the datastore one complete file at a time by setting the ReadSize property to 'file'.

ds.ReadSize = 'file';

When you change the value of ReadSize from a number to 'file' or vice versa, MATLAB resets the datastore.

Read from ds using the read function in a while loop, as before, and compute the average arrival delay.

sums = [];
counts = [];
while hasdata(ds)
    T = read(ds);
    
    sums(end+1) = sum(T.ArrDelay);
    counts(end+1) = length(T.ArrDelay);
end
avgArrivalDelay = sum(sums)/sum(counts)
avgArrivalDelay = 6.9670
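For a file that fits in memory, as the small sample file is assumed to here, one way to cross-check the chunked result is to read everything at once with readall and take the mean directly. This is only a sanity check under that assumption, not an approach for genuinely large files. The variable names dsCheck, allData, and avgCheck are illustrative.

```matlab
% Recreate the datastore with the same settings used in this example.
dsCheck = datastore('airlinesmall.csv','TreatAsMissing','NA');
dsCheck.MissingValue = 0;
dsCheck.SelectedVariableNames = 'ArrDelay';

% Read every row in one call. Only do this when the data fits in memory.
allData = readall(dsCheck);

% Mean over all rows; compare with avgArrivalDelay from the loop.
avgCheck = mean(allData.ArrDelay)
```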
