How can I extract particular columns of a huge text file?
I have 8 hours of data, and I need 3 particular columns at particular time intervals from this huge text file. So far I have tried: 1. fgetl, which reads line by line, inside a for loop. 2. textscan, converting the result to a matrix (eventually my system crashes due to the size of the data).
Both of them take a very long time to run. Is there any better way to extract data from huge text files?
Thanks, Mitrra
4 Comments
Cedric
on 17 Aug 2013
Could you copy/paste 10 to 20 lines of this file here on the forum? Depending on the format, there are ways to extract the relevant lines/columns before scanning them.
Answers (2)
Ken Atwell
on 16 Aug 2013
I would use textscan, using "*" in the format specifiers to skip the unneeded columns. Say you need columns 1, 3, and 5:
textscan(fpi, '%f %*f %f %*f %f %*[^\n]');
This converts only the needed columns from text to binary (numeric) values, which should save a lot of time.
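If memory is the problem, one option is to combine the skip format above with the repeat-count argument of textscan and process the file in blocks, keeping only the rows in the time window of interest. The following is a minimal sketch, not a tested solution for your file: the demo file, the six-column layout, the time stamp in column 1, blockSize, tStart and tEnd are all assumptions made for illustration.

```matlab
% Build a tiny demo file so the sketch is self-contained.
% Assumed layout: six whitespace-delimited numeric columns, time in column 1.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for t = 0:5
    fprintf(fid, '%d %d %d %d %d %d\n', t, 10*t, 20*t, 30*t, 40*t, 50*t);
end
fclose(fid);

fmt       = '%f %*f %f %*f %f %*[^\n]';  % keep columns 1, 3, 5; skip the rest
blockSize = 4;                           % lines per block (use e.g. 1e5 for real data)
tStart    = 2;                           % hypothetical time window
tEnd      = 4;

fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);
kept = cell(0, 1);
while ~feof(fid)
    C = textscan(fid, fmt, blockSize);   % reads at most blockSize lines
    n = min(cellfun(@numel, C));         % guard against a ragged last block
    if n == 0
        break                            % nothing parsed, avoid looping forever
    end
    M = [C{1}(1:n), C{2}(1:n), C{3}(1:n)];
    inWindow = M(:,1) >= tStart & M(:,1) <= tEnd;
    kept{end+1, 1} = M(inWindow, :);     % store only the rows of interest
end
fclose(fid);
delete(fname);

result = vertcat(kept{:});               % rows in the window, columns 1, 3, 5
```

Only one block is held in memory at a time, so the peak memory use depends on blockSize rather than on the size of the file.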
2 Comments
per isakson
on 16 Aug 2013
Edited: per isakson
on 16 Aug 2013
Reading specific chunks of a huge file is a job for memmapfile. However, char is not in its list of supported data types; the default type is uint8. Take a chance and try
mmf = memmapfile( 'h:\m\Code2TMW\Path_potential_name_conflict.txt' );
str = char( mmf.Data(1:64) )'
it returns
str =
Warning: Function C:\Program Files\MATLAB\R2013a\toolbox\matlab\
which is indeed the text of the first line. Surely, the encoding of the text file matters.
2 Comments
per isakson
on 18 Aug 2013
Edited: per isakson
on 18 Aug 2013
"So it gives huge number of rows", which you have to parse with textscan or otherwise. The point is that you can read just part of the file with
str = char( mmf.Data( huge_number+1 : huge_number+small_number_of_bytes ) )'
which gives a small number of rows.
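A byte range picked this way will usually start and end in the middle of a line, so the partial first and last lines need to be dropped before parsing. Here is a rough sketch of that idea, assuming a single-byte encoding (e.g. ASCII) and "\n" line endings; the demo file, offset and nBytes are made-up values for illustration only.

```matlab
% Build a tiny demo file so the sketch is self-contained.
% Assumed layout: six whitespace-delimited numeric columns per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for t = 0:5
    fprintf(fid, '%d %d %d %d %d %d\n', t, 10*t, 20*t, 30*t, 40*t, 50*t);
end
fclose(fid);

mmf    = memmapfile(fname);              % default format is uint8
offset = 20;                             % hypothetical: bytes to skip
nBytes = 50;                             % hypothetical: bytes to read
last   = min(offset + nBytes, numel(mmf.Data));
chunk  = char(mmf.Data(offset+1 : last))';

% Keep only complete lines: from just after the first newline in the chunk
% up to and including the last newline.
nl = find(chunk == char(10));
C  = {};
if numel(nl) >= 2
    chunk = chunk(nl(1)+1 : nl(end));
    % textscan also accepts a character vector instead of a file identifier.
    C = textscan(chunk, '%f %*f %f %*f %f %*[^\n]');
end

clear mmf                                % release the mapping before deleting
delete(fname);
```

Since the first line of the chunk is always discarded, choose offset a little before the data you actually want.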