How to read a large text file (exceeding 20 GB) quickly

I have a large text file with 3,000,000 rows and 1200 columns. I have split it into 15 files of 200,000 rows each, but even the smaller files take a very long time to read with dlmread. Is there a way to read these files much faster?
Would load or textscan be any faster than dlmread?
Also, is there a way to read the original 3,000,000-row file directly, without splitting it into smaller files?

Answers (1)

Titus Edelhofer on 25 Jun 2014
Hi Sivanand,
you can read the original file directly using fopen and textscan, because textscan can be called in a loop to read the file in chunks (e.g. 100000 lines per iteration). A few remarks:
  • 1200 columns is of course a lot... do you need all of them? If not, use %*f in the format string to skip a numeric column.
  • load will not be faster than dlmread.
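A minimal sketch of the %*f idea, assuming a comma-delimited file as in the code further down the thread. The input here is a small generated stand-in file (created with writematrix, R2019a or later), and keeping only the first 10 columns is a hypothetical choice:

```matlab
% Stand-in data (assumption): a small comma-delimited file with 1200 numeric columns
nCols = 1200;
fname = [tempname '.txt'];
writematrix(rand(500, nCols), fname);   % .txt => comma delimiter by default

% Hypothetical choice: keep columns 1..10, skip the rest.
% %f reads a numeric field; %*f reads it and discards it.
keep = false(1, nCols);
keep(1:10) = true;
specs = repmat({'%*f'}, 1, nCols);
specs(keep) = {'%f'};
formatString = [specs{:}];

fid = fopen(fname, 'rt');
chunks = {};
while ~feof(fid)
    % read up to 100000 rows per call; only the kept columns are returned
    c = textscan(fid, formatString, 100000, 'Delimiter', ',', 'CollectOutput', true);
    chunks{end+1} = c{1}; %#ok<AGROW>
end
fclose(fid);
subset = vertcat(chunks{:});   % 500-by-10 for this stand-in file
delete(fname);
```

Skipped columns are still parsed, so this mainly saves memory rather than parsing time.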
In any case, it will not be really fast, and since the file is that large, it will take up a considerable amount of memory as well. Does the text file change? If not, I would suggest reading the data once (and not worrying too much about the time) and then saving it in binary format using save. Loading it the next time will be significantly faster.
Titus
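A sketch of the "read once, then save in binary format" suggestion, under stated assumptions: a comma-delimited file, temporary stand-in file names, and the -v7.3 MAT-file format (required for variables of 2 GB or more, which a matrix of this size would be). matfile is shown as one way to read part of the saved matrix without loading all of it:

```matlab
% Stand-in files (assumption): a small comma-delimited text file and a MAT-file
nCols = 1200;
txtFile = [tempname '.txt'];
matFile = [tempname '.mat'];
writematrix(rand(300, nCols), txtFile);

% One-time, slow step: parse the text file in chunks ...
fid = fopen(txtFile, 'rt');
chunks = {};
while ~feof(fid)
    c = textscan(fid, repmat('%f', 1, nCols), 100000, ...
                 'Delimiter', ',', 'CollectOutput', true);
    chunks{end+1} = c{1}; %#ok<AGROW>
end
fclose(fid);
delete(txtFile);
allData = vertcat(chunks{:});

% ... and store the result as a MAT-file (-v7.3 supports variables >= 2 GB)
save(matFile, 'allData', '-v7.3');

% Later sessions: load the binary file instead of re-parsing the text
S = load(matFile, 'allData');

% Or read only some rows without loading the whole matrix
m = matfile(matFile);
firstRows = m.allData(1:100, :);
clear m
delete(matFile);
```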
Titus Edelhofer on 26 Jun 2014
Hi,
that would be something like
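fid = fopen('largedata.txt', 'rt');
% one %f per column; the comma separator is handled by 'Delimiter' below
formatString = repmat('%f', 1, 1200);
allData = zeros(0, 1200);
while ~feof(fid)
    % read up to 100000 rows; 'CollectOutput' returns them as one numeric matrix
    data = textscan(fid, formatString, 100000, 'Delimiter', ',', 'CollectOutput', true);
    allData = [allData; data{1}];
end
fclose(fid);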
If you know the number of lines beforehand, you should of course preallocate allData instead of growing it by concatenation.
Titus
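The preallocation remark can be sketched as follows, assuming the row count is known in advance and the file is comma-delimited. The stand-in file, its 1000 rows and the chunk size of 300 are hypothetical values chosen so the example runs quickly:

```matlab
% Stand-in data (assumption); in the real case nRows would be 3000000
nRows = 1000;
nCols = 1200;
fname = [tempname '.txt'];
writematrix(rand(nRows, nCols), fname);

chunkSize = 300;                  % 100000 in the suggestion above
allData = zeros(nRows, nCols);    % allocate once instead of growing
fid = fopen(fname, 'rt');
row = 0;                          % number of rows filled so far
while ~feof(fid)
    c = textscan(fid, repmat('%f', 1, nCols), chunkSize, ...
                 'Delimiter', ',', 'CollectOutput', true);
    n = size(c{1}, 1);
    allData(row+1:row+n, :) = c{1};   % write the chunk into its slot
    row = row + n;
end
fclose(fid);
allData = allData(1:row, :);      % trim if the file had fewer rows than expected
delete(fname);
```

Writing each chunk into a fixed slot avoids copying the growing matrix on every iteration, which is what the concatenation version does.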
Ken Atwell on 27 Jun 2014
Do you have enough memory for all of this? 3000000 x 1200 x 8 bytes is about 28.8 GB of physical memory just to hold the matrix as double, and you need additional free memory on top of that to perform any calculations.
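The memory estimate works out as below, assuming the data is stored as double (8 bytes per element). The per-chunk figure for 100000 rows is added for comparison with the chunked reading loop:

```matlab
% Memory needed just to hold the data as double (8 bytes per element)
nRows = 3000000;
nCols = 1200;
bytesPerElement = 8;                            % class double
bytes = nRows * nCols * bytesPerElement;        % 2.88e10 bytes
chunkBytes = 100000 * nCols * bytesPerElement;  % one 100000-row chunk
fprintf('%.1f GB (%.1f GiB)\n', bytes / 1e9, bytes / 2^30);   % 28.8 GB (26.8 GiB)
fprintf('%.2f GB per 100000-row chunk\n', chunkBytes / 1e9);  % 0.96 GB per 100000-row chunk
```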

