MATLAB Answers


Is it possible to split a large text file into half and subsequently use textscan for both parts?

Asked by Atipong
on 13 May 2013


This is my first time in this forum.

I am working on a large text file containing 10^5 * 600 numeric elements, each with 16 digits. I use the textscan command to read the data. I already know the number of columns, so I am able to generate a format spec beforehand. The main part of my code is shown below:


When I set NumRow (the number of rows) to 50,000 or fewer, it works fine and takes only about 1 minute to run. However, my system seems to crash when I increase NumRow to 100,000. I suspect that my virtual memory has reached its limit.
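As a rough check of that suspicion: textscan's %f conversion returns doubles (8 bytes each), so the final 10^5-by-600 matrix alone needs about 480 MB, before counting any temporary copies that textscan or a later concatenation may make. The arithmetic, as a MATLAB one-off:

```matlab
% Size of the final numeric matrix, assuming every value is stored as a double (8 bytes)
nRow  = 1e5;                     % rows in the file
nCol  = 600;                     % columns per row
bytes = nRow * nCol * 8;         % 480,000,000 bytes
fprintf('%.0f MB (%.1f MiB)\n', bytes / 1e6, bytes / 2^20);
```

This prints `480 MB (457.8 MiB)`. Each 50,000-row half is about 240 MB, which is why reading in halves can succeed where one full read fails.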

Therefore, I wonder whether there is a way to split the data into two parts, say rows 1 to 50,000 and rows 50,001 to 100,000.

Thanks! Ati


Hi, you mention that you have 16-digit elements; could you copy and paste the first 3 or 4 values from a typical line?

on 14 May 2013


It's something like this, with 10^5 rows and 600 columns separated by spaces.

    -4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01
    1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01
    -1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01

So when there is no minus sign, there are two spaces?
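Extra spaces should not matter for reading: with the default whitespace delimiter, textscan's %f skips leading white space, so two spaces before a positive number parse the same as one. A small check on two made-up rows in the posted style (the values in the second row are hypothetical):

```matlab
% Second row uses a leading space and a double space where a minus sign would be
txt = sprintf(['-4.7533250000e-05 -4.8990000000e-05\n', ...
               ' 1.5550000000e-02  2.5000000000e-01\n']);
C = textscan(txt, '%f %f', 'CollectOutput', true);
A = C{1};                        % 2-by-2 double matrix
disp(A)
```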




2 Answers

Answer by per isakson
on 13 May 2013
Edited by per isakson
on 13 May 2013

Something like this:

    nRow = 50000;                              % rows per chunk
    fid  = fopen( ... );                       % open the text file
    buf1 = textscan( fid, ..., nRow, .... );   % reads rows 1 to nRow
    buf2 = textscan( fid, ..., nRow, .... );   % continues where buf1 stopped
    fclose( fid );
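A runnable version of the two-call idea, as a sketch only: the file name is a hypothetical temporary file, filled with a small 6-by-4 matrix in the posted number style so the code runs on its own. For the real data you would open your own file and use nCol = 600, nRow = 50000. `repmat('%f', 1, nCol)` builds the format spec, and 'CollectOutput' returns each chunk as one numeric matrix.

```matlab
% Demo setup (hypothetical): write a small test file
nCol = 4;
nRow = 3;                                   % rows per textscan call
data = reshape(1:2*nRow*nCol, nCol, []).';  % 6-by-4 test matrix
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, [repmat('%.10e ', 1, nCol) '\n'], data.');
fclose(fid);

fmt = repmat('%f', 1, nCol);                % one %f per column
fid = fopen(fileName, 'r');
if fid == -1
    error('Could not open %s', fileName);
end
% Each call applies the format at most nRow times; the second call
% continues at the file position where the first one stopped.
buf1 = textscan(fid, fmt, nRow, 'CollectOutput', true);
buf2 = textscan(fid, fmt, nRow, 'CollectOutput', true);
fclose(fid);

A1 = buf1{1};                               % rows 1 to nRow
A2 = buf2{1};                               % rows nRow+1 to 2*nRow
```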


on 14 May 2013

Your method answers my question well, but it seems that it still reads all the data at once, so all the available memory is still used.

Is there any way to free memory halfway through, before reading up to the final row? The final outcome I want is just a matrix containing all the data.

You have to process the data in buf1 and

    clear buf1

before reading the rest of the file. Or

    buf = textscan( fid, ..., nRow, .... );
    buf = textscan( fid, ..., nRow, .... );
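If the goal is one matrix holding all the data, a sketch of a third option is to preallocate that matrix once and copy each chunk into it, so the peak is roughly the final matrix plus one chunk (plus whatever textscan needs internally). The file, sizes and chunk length below are hypothetical demo values; the real file would use nRowTotal = 1e5, nCol = 600 and a chunk such as 10000.

```matlab
% Demo setup (hypothetical): small test file with 10 rows and 3 columns
nRowTotal = 10; nCol = 3; chunk = 4;
data = reshape(1:nRowTotal*nCol, nCol, []).';
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, [repmat('%g ', 1, nCol) '\n'], data.');
fclose(fid);

fmt = repmat('%f', 1, nCol);
A   = zeros(nRowTotal, nCol);        % preallocate the final result once
fid = fopen(fileName, 'r');
r   = 0;                             % rows filled so far
while r < nRowTotal
    buf = textscan(fid, fmt, min(chunk, nRowTotal - r), 'CollectOutput', true);
    blk = buf{1};
    if isempty(blk), break; end      % stop early if the file is shorter
    A(r+1 : r+size(blk, 1), :) = blk;
    r = r + size(blk, 1);
    clear buf blk                    % release the chunk before the next read
end
fclose(fid);
```

If the final matrix by itself does not fit in memory, this will not help, and the binary-file route suggested next is the more likely fix.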

I guess I would have written the data to one or more binary files and used memmapfile to work with it.
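A minimal sketch of that idea, assuming each chunk has already been parsed by textscan (a small made-up matrix stands in for it here, and the binary file name is a hypothetical temporary file). Each chunk is appended with fwrite after transposing it, so every original row is stored contiguously; memmapfile then lets you index just the values you need.

```matlab
% Stand-in for rows parsed by textscan (hypothetical values)
nRow = 6; nCol = 3; chunk = 2;
A = reshape(1:nRow*nCol, nCol, []).';    % 6-by-3, row i = [3i-2 3i-1 3i]

binFile = [tempname '.bin'];
fid = fopen(binFile, 'w');
for r = 1:chunk:nRow
    blk = A(r:min(r+chunk-1, nRow), :);  % in practice: buf{1} from textscan
    fwrite(fid, blk.', 'double');        % transpose so each row is contiguous
end
fclose(fid);

% Map the raw doubles; element (k-1)*nCol + j is row k, column j
m      = memmapfile(binFile, 'Format', 'double');
rowIdx = 4;
row4   = m.Data((rowIdx-1)*nCol + (1:nCol)).';   % one original row
```

Storing rows contiguously is a design choice that makes appending chunks simple; if you mostly need whole columns, writing column-major instead would make those reads contiguous.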

per is correct.

To be explicit, textscan() does not read in the entire file when you specify the repeat count.

Answer by Yao Li
on 14 May 2013

You can use for loops to auto-generate the formatSpec for textscan(). For example, you can read two columns at a time by defining formatSpec as:

    for j=1:300
        for k=1:600
        for i=3:600
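The loop code in this answer is truncated, so the following is only one possible reading of "two columns at a time", not the original code: build a format that keeps columns c and c+1 with %f and skips every other field with %*f. The column count and the test rows are hypothetical demo values; the posted file has 600 columns.

```matlab
% Demo values (hypothetical): 6 columns here instead of 600
nCol = 6;
c    = 3;                                 % first of the two columns to keep
fmt  = '';
for k = 1:nCol
    if k == c || k == c + 1
        fmt = [fmt '%f'];                 % keep this field
    else
        fmt = [fmt '%*f'];                % read and discard this field
    end
end

% Two made-up rows, 1..6 and 7..12, to check the format
txt  = sprintf([repmat('%d ', 1, nCol) '\n'], reshape(1:2*nCol, nCol, []));
C    = textscan(txt, fmt, 'CollectOutput', true);
cols = C{1};                              % 2-by-2: columns c and c+1
```

Looping c = 1:2:600 would give 300 passes, with frewind(fid) before each textscan call on the file; that keeps memory low but reads the whole file 300 times.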

