Import Large Text File Data in Blocks

This example shows how to read an arbitrarily large delimited text file in small blocks using the textscan function, which avoids memory errors. The first part of the example shows how to read and process blocks of a constant size in a loop. The second part shows how to read blocks of arbitrary size that are separated by commented lines.

Specify Block Size

Specify a constant block size, and then process each block of data within a loop.

Copy and paste the following text into a text editor to create a tab-delimited text file, bigfile.txt, in your current folder.

## A	ID = 02476
## YKZ Timestamp Temp Humidity Wind Weather
06-Sep-2013 01:00:00	6.6	89	4	clear
06-Sep-2013 05:00:00	5.9	95	1	clear
06-Sep-2013 09:00:00	15.6	51	5	mainly clear
06-Sep-2013 13:00:00	19.6	37	10	mainly clear
06-Sep-2013 17:00:00	22.4	41	9	mostly cloudy
06-Sep-2013 21:00:00	17.3	67	7	mainly clear
## B	ID = 02477
## YVR Timestamp Temp Humidity Wind Weather
09-Sep-2013 01:00:00	15.2	91	8	clear
09-Sep-2013 05:00:00	19.1	94	7	n/a
09-Sep-2013 09:00:00	18.5	94	4	fog
09-Sep-2013 13:00:00	20.1	81	15	mainly clear
09-Sep-2013 17:00:00	20.1	77	17	n/a
09-Sep-2013 18:00:00	20.0	75	17	n/a
09-Sep-2013 21:00:00	16.8	90	25	mainly clear
## C	ID = 02478
## YYZ Timestamp Temp Humidity Wind Weather

This file contains commented lines, each beginning with ##, throughout the file. The data is arranged in five columns. The first column contains strings indicating timestamps. The second, third, and fourth columns contain numeric data indicating temperature, humidity, and wind speed. The last column contains descriptive strings.

Define the size of each block to read from the text file. You do not need to know the total number of blocks in advance, and the number of rows of data in the file does not have to be a multiple of the block size.

Specify a block size of 5.

N = 5;

Open the file to read using the fopen function.

fileID = fopen('bigfile.txt');

fopen returns a file identifier, fileID, that the textscan function uses to read from the file. fopen positions a pointer at the beginning of the file, and each read operation advances that pointer.
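The pointer behavior described above can be observed with ftell. The following is a minimal sketch, assuming a small temporary file with made-up contents stands in for bigfile.txt; the file name and the variables posStart, firstLine, and posAfter are illustrative and not part of the original example.

```matlab
% Write a tiny two-line file to a temporary location (hypothetical data).
fname = [tempname '.txt'];
fileID = fopen(fname,'w');
fprintf(fileID,'first line\nsecond line\n');
fclose(fileID);

% Open the file for reading. fopen returns -1 if the file cannot be opened.
fileID = fopen(fname);
assert(fileID ~= -1,'Could not open %s',fname);

posStart = ftell(fileID);     % pointer position immediately after fopen
firstLine = fgetl(fileID);    % reading one line moves the pointer forward
posAfter = ftell(fileID);     % pointer position after the read

fclose(fileID);
delete(fname);
```

Because the pointer is not reset between reads, a second read on the same fileID continues from where the first one stopped, which is what makes block-by-block reading possible.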

Describe each data field using format specifiers, such as '%s' for a string, '%d' for an integer, or '%f' for a floating-point number.

formatSpec = '%s %f %f %f %s';

In a while loop, call textscan to read each block of data. The file identifier, the format specifier string, and the block size (N) are the first three inputs to textscan. Ignore the commented lines using the CommentStyle name-value pair argument. Specify the tab delimiter using the Delimiter name-value pair argument. Then, process the data in the block. In this example, call scatter to display a scatter plot of temperature and humidity values in the block. The commands within the loop execute while the file pointer is not at the end of the file.

k = 0;
while ~feof(fileID)
    k = k+1;
    C = textscan(fileID,formatSpec,N,'CommentStyle','##','Delimiter','\t');
    figure
    scatter(C{2},C{3})
    title(['Temperature and Humidity, Block ',num2str(k)])
end

Each call to textscan reads up to N rows from bigfile.txt, stopping early if it reaches the end of the file or cannot read data in the format specified by formatSpec. For each block, textscan returns a 1-by-5 cell array, with one cell per column. Because the sample file, bigfile.txt, contains 13 rows of data, textscan returns only 3 rows in the last block.
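The size of the last block follows from simple arithmetic on the row count and the block size. A short worked sketch, where the variable names are illustrative only:

```matlab
totalRows = 13;     % rows of data in bigfile.txt (6 in the first group, 7 in the second)
N = 5;              % block size

% Number of blocks needed to cover every row.
numBlocks = ceil(totalRows/N);

% Rows left over for the final block after the full blocks are read.
lastBlockRows = totalRows - (numBlocks-1)*N;
```

In practice you do not need to compute these values, because the loop stops on feof. They only explain why the final block is shorter.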

View the temperature values in the last block returned by textscan.

C{2}
ans =

   20.1000
   20.0000
   16.8000

Close the file.

fclose(fileID);
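Instead of plotting inside the loop, each block can be reduced to a summary value so that only a small result stays in memory. The following is a minimal sketch of that idea, assuming a small temporary file with made-up rows stands in for bigfile.txt. The variable blockMax and the isempty guard are additions for illustration and are not part of the original example.

```matlab
% Create a small tab-delimited file with one commented line (hypothetical data).
fname = [tempname '.txt'];
fileID = fopen(fname,'w');
fprintf(fileID,'## header\n');
for t = [6.6 5.9 15.6 19.6 22.4 17.3 15.2]
    fprintf(fileID,'06-Sep-2013\t%.1f\t50\t5\tclear\n',t);
end
fclose(fileID);

N = 3;                              % block size
formatSpec = '%s %f %f %f %s';
blockMax = [];                      % one maximum temperature per block

fileID = fopen(fname);
while ~feof(fileID)
    C = textscan(fileID,formatSpec,N,'CommentStyle','##','Delimiter','\t');
    if isempty(C{2})
        break                       % nothing was read, so stop instead of looping again
    end
    blockMax(end+1) = max(C{2});    %#ok<AGROW>
end
fclose(fileID);
delete(fname);
```

The guard on an empty block is a defensive choice: if a call reads no rows and the pointer is not yet at the end of the file, the loop exits rather than repeating.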

Read Data with Arbitrary Block Sizes

Read and separately process each block of data between commented lines in the file, bigfile.txt. The length of each block can be arbitrary. However, you must specify the number of lines to skip between blocks of data. In bigfile.txt, each block of data is preceded by two lines of comments.

Open the file for reading.

fileID = fopen('bigfile.txt');

Specify the format of the data you want to read. Tell textscan to ignore certain data fields by including %* in the format specifier string, formatSpec. In this example, skip the third and fourth columns of floating-point data using '%*f'.

formatSpec = '%s %f %*f %*f %s';
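To see the effect of %*f in isolation, textscan can be applied to a single record supplied as text rather than read from a file. This is a small sketch using the first data row of the sample file; the variables record, nCells, temp, and weather are illustrative names.

```matlab
% One tab-delimited record, built with sprintf so that \t becomes a tab.
record = sprintf('06-Sep-2013 01:00:00\t6.6\t89\t4\tclear');

% %*f parses a floating-point field but omits it from the output.
formatSpec = '%s %f %*f %*f %s';
D = textscan(record,formatSpec,'Delimiter','\t');

nCells = numel(D);      % one cell per field that is not skipped
stamp = D{1}{1};        % timestamp string
temp = D{2};            % temperature
weather = D{3}{1};      % weather description
```

The humidity and wind fields are still consumed from the input, so the fifth column lands in the third output cell.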

Read a block of data in the file. Use the HeaderLines name-value pair argument to instruct textscan to skip two lines before reading data.

D = textscan(fileID,formatSpec,'HeaderLines',2,'Delimiter','\t')
D = 

    {7x1 cell}    [6x1 double]    {6x1 cell}

textscan returns a 1-by-3 cell array, D.

View the contents of the first cell in D.

D{1,1}
ans = 

    '06-Sep-2013 01:00:00'
    '06-Sep-2013 05:00:00'
    '06-Sep-2013 09:00:00'
    '06-Sep-2013 13:00:00'
    '06-Sep-2013 17:00:00'
    '06-Sep-2013 21:00:00'
    '## B'

textscan stops reading after the text, '## B', because it cannot read the subsequent text as a number, as specified by formatSpec. The file pointer remains at the position where textscan terminated.

Process the first block of data. In this example, find the maximum temperature value in the second cell of D.

maxTemp1 = max(D{1,2})
maxTemp1 =

   22.4000

Repeat the call to textscan to read the next block of data.

D = textscan(fileID,formatSpec,'HeaderLines',2,'Delimiter','\t')
D = 

    {8x1 cell}    [7x1 double]    {7x1 cell}

Again, textscan returns a 1-by-3 cell array.

Find the maximum temperature value in this block of data.

maxTemp2 = max(D{1,2})
maxTemp2 =

   20.1000

Close the file.

fclose(fileID);
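When the number of blocks is not known in advance, the repeated calls can be folded into a loop. The following is a minimal sketch, assuming a shortened temporary file with the same layout as bigfile.txt (two commented lines before each block). The variables maxTemps and stamps, the trimming of the first cell, and the isempty guard are illustrative additions and are not part of the original example.

```matlab
% Create a shortened file with the same layout as bigfile.txt (hypothetical subset).
fname = [tempname '.txt'];
fileID = fopen(fname,'w');
fprintf(fileID,'## A\tID = 02476\n## YKZ Timestamp Temp Humidity Wind Weather\n');
fprintf(fileID,'06-Sep-2013 01:00:00\t6.6\t89\t4\tclear\n');
fprintf(fileID,'06-Sep-2013 17:00:00\t22.4\t41\t9\tmostly cloudy\n');
fprintf(fileID,'## B\tID = 02477\n## YVR Timestamp Temp Humidity Wind Weather\n');
fprintf(fileID,'09-Sep-2013 01:00:00\t15.2\t91\t8\tclear\n');
fprintf(fileID,'09-Sep-2013 13:00:00\t20.1\t81\t15\tmainly clear\n');
fprintf(fileID,'09-Sep-2013 21:00:00\t16.8\t90\t25\tmainly clear\n');
fclose(fileID);

formatSpec = '%s %f %*f %*f %s';
maxTemps = [];                        % one maximum temperature per block

fileID = fopen(fname);
while ~feof(fileID)
    D = textscan(fileID,formatSpec,'HeaderLines',2,'Delimiter','\t');
    if isempty(D{2})
        break                         % no numeric data was read, so stop
    end
    % The first cell can hold one extra entry (the '## ...' text that starts
    % the next block), so keep only as many timestamps as there are rows.
    stamps = D{1}(1:numel(D{2}));
    maxTemps(end+1) = max(D{2});      %#ok<AGROW>
end
fclose(fileID);
delete(fname);
```

This relies on the behavior shown above: textscan stops at the first commented line of the next block and leaves the file pointer there, so the next call with HeaderLines set to 2 skips the rest of that line and the line after it.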
