How can I extract particular columns of a huge text file?

I have 8 hours of data, and I need 3 particular columns at particular time intervals from this huge text file. So far I have tried: 1. fgetl, which reads line by line, inside a for loop. 2. textscan, converting the result to a matrix (eventually my system crashes because of the size of the data).
Both of them take a very long time to run. Is there any better way to extract data from huge text files?
Thanks, Mitrra
  4 Comments
Cedric on 17 Aug 2013
Could you copy/paste 10 to 20 lines of this file here on the forum? Depending on the format, there are ways to extract the relevant lines/columns before scanning them.
Mp897 on 18 Aug 2013
Hi, it's a text file of about 2 GB. My computer crashes when I try to open it as a text file. It has 16 columns, each holding double values; there are only numbers and no letters. I was able to extract the columns I needed using textscan (it took 1.5 minutes).
Thanks


Answers (2)

Ken Atwell on 16 Aug 2013
I would use textscan, using "*" to skip the unneeded columns. Say you need columns 1, 3, and 5:
textscan(fpi, '%f %*f %f %*f %f %*[^\n]');
This converts only the necessary columns from text to numeric values, which should save a lot of time.
You can read more about using "*" in the documentation for textscan.
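To illustrate the "*" approach on a file too large to hold in memory, here is a minimal sketch that reads the file in blocks and keeps only rows inside a time window. It is not from the original answer: the demo file, the block size, the assumption that column 1 holds the time stamp, and the names tStart, tEnd, blockSize and kept are all hypothetical. The first few lines write a tiny 6-column demo file so the snippet runs on its own; point fname at your own file instead.

```matlab
% --- Demo data (assumption): 4 rows x 6 columns, written to a temp file.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d %d %d %d\n', reshape(1:24, 6, 4));
fclose(fid);

% --- Hypothetical settings; adjust for the real file.
blockSize = 3;                        % rows per textscan call (use ~1e6 for real data)
tStart = 7;  tEnd = 19;               % time window, ASSUMING column 1 is time
fmt = '%f %*f %f %*f %f %*[^\n]';     % keep columns 1, 3, 5; skip everything else

fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end

kept = {};                            % filtered rows, one cell per block
while ~feof(fid)
    c = textscan(fid, fmt, blockSize);    % apply the format at most blockSize times
    if isempty(c{1})
        break;                            % nothing more to parse
    end
    block = [c{1}, c{2}, c{3}];           % assumes every row in the block is complete
    inWindow = block(:,1) >= tStart & block(:,1) <= tEnd;
    kept{end+1} = block(inWindow, :);     % store only the rows of interest
end
fclose(fid);
delete(fname);                        % remove the demo file

data = vertcat(kept{:});              % one matrix with the selected rows/columns
```

Because each block is filtered before it is stored, memory use is bounded by the block size plus the rows you actually keep, rather than by the size of the whole file.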
  2 Comments
Mp897 on 18 Aug 2013
Using * helped me extract the columns I need in 2 minutes. Thanks for your help.
Ken Atwell on 19 Aug 2013
Good to hear. Don't forget to accept the answer. :)



per isakson on 16 Aug 2013
Edited: per isakson on 16 Aug 2013
Reading specific chunks of a huge file is a job for memmapfile. However, character is not in its list of data types; the default type is uint8. Take a chance and try
mmf = memmapfile( 'h:\m\Code2TMW\Path_potential_name_conflict.txt' );
str = char( mmf.Data(1:64) )'
it returns
str =
Warning: Function C:\Program Files\MATLAB\R2013a\toolbox\matlab\
which is indeed the text of the first line. Surely, the encoding of the text file matters.
  2 Comments
Mp897 on 16 Aug 2013
Hi, thanks for the reply. This works well, but I have a small doubt. What exactly does str = char(mmf.Data(1:64)) do? In my case, it seems to extract the first 64 characters and put each element in its own row, so it gives a huge number of rows. Thanks
per isakson on 18 Aug 2013
Edited: per isakson on 18 Aug 2013
"So it gives huge number of rows", which you then have to parse with textscan or by other means. The point is that you can read just part of the file with
str = char( mmf.Data( huge_number+1 : huge_number+small_number_of_bytes ) )'
which gives a small number of rows.
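A rough sketch of how the byte-range read and the textscan parse could fit together follows. It is an illustration under stated assumptions, not code from this thread: the demo file, the values of offset and nBytes, and the 3-column format are hypothetical, and the file is assumed to use a single-byte encoding with '\n' line endings. Since an arbitrary byte range usually starts and ends in the middle of a line, the sketch trims the chunk to whole lines before parsing.

```matlab
% --- Demo data (assumption): 10 rows x 3 columns, written to a temp file.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d\n', reshape(1:30, 3, 10));
fclose(fid);

mmf = memmapfile(fname);              % default format is uint8, one element per byte

% --- Hypothetical byte range; for real data these would be much larger.
offset = 10;                          % bytes to skip from the start of the file
nBytes = 40;                          % size of the chunk to look at
last = min(offset + nBytes, numel(mmf.Data));
str = char(mmf.Data(offset+1 : last))';   % transpose: column of bytes -> char row

% Trim to complete lines: drop everything up to the first newline and
% everything after the last one. Note that this also drops a complete first
% line if the offset happens to fall exactly on a line start.
nl = find(str == char(10));
if numel(nl) < 2
    error('Chunk too small: it must span at least two newlines.');
end
str = str(nl(1)+1 : nl(end));

% Parse the text chunk; here columns 1 and 3 are kept and column 2 skipped.
c = textscan(str, '%f %*f %f');
cols = [c{1}, c{2}];

clear mmf                             % release the mapping before deleting the file
delete(fname);
```

The design choice here is that memmapfile does the cheap random access by byte position, while textscan only ever sees a small string, so the cost of text-to-number conversion is paid only for the part of the file you care about.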

