Question about optimizing reading data from text file

Hello, thanks for reading this,
I currently have a reader for mesh files. It works, but depending on the size of the file it can take a very long time, so I was hoping to optimize it for speed.
First, I read in the text file and convert it into a character matrix, one line of the file per row, using:
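fid = fopen( filename, 'r' );      % filename: path to the mesh file
cac = textscan( fid, '%[^\n]' );   % read each line of the file into one cell
fclose(fid);
A = char( cac{1} );                % pad the lines into a character matrix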
Here A is my character matrix. I then search it for the identifiers that mark the data I need, recording a start index and an end index for each data block. I basically scan it line by line, and for now I assume the file is always formatted the same way.
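Finding the start and end indices does not need a line-by-line loop. A minimal sketch, assuming the lines are kept as the cell array cac{1} rather than a character matrix, and that each data block sits between hypothetical marker lines BEGIN and END (the real identifiers in the mesh file will differ). It also assumes blocks are not nested, so the k-th BEGIN pairs with the k-th END:

```matlab
% Hypothetical sample lines; in the reader these would come from cac{1}
lines = {'header text'; 'BEGIN'; '1.0 2.0 3.0'; '4.0 5.0 6.0'; 'END'; ...
         'BEGIN'; '7.0 8.0 9.0'; 'END'};

% Assumed markers: a block starts after 'BEGIN' and ends before 'END'.
% strncmp on a cell array returns one logical value per line, no loop needed.
isStart = strncmp(lines, 'BEGIN', 5);
isEnd   = strncmp(lines, 'END', 3);

startIdx = find(isStart) + 1;   % first data line of each block
endIdx   = find(isEnd) - 1;     % last data line of each block
```

Here startIdx is [3; 7] and endIdx is [4; 7].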
After I have these indices, I call sscanf on each line to read the characters as %f or %x numbers and store them in matrices. According to the profiler, this is the part that takes the longest.
I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole imported text into a character matrix, and is there any way to do this without a for loop? The loops that call sscanf take a very long time.
It works, but only barely. I can send a test import file if needed.
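On the first question, the lines do not necessarily have to become a character matrix. A minimal sketch, assuming the lines stay in the cell array cac{1}, that s and e are a block's start and end indices, and that every line in the block holds the same number of values (three here, a hypothetical choice): join the block into one string and let a single sscanf call parse all of it instead of calling sscanf once per line.

```matlab
% Hypothetical block of lines; in the reader this would be cac{1}
lines = {'1.5 2.0 3.25'; '4.0 -5.5 6.0'; '7.0 8.0 9.75'};
s = 1; e = 3;      % assumed start and end indices of the block
nCols = 3;         % assumed number of values per line

% sprintf('%s ', ...) joins the lines with a space after each one,
% so one sscanf call can read every number in the block
vals = sscanf(sprintf('%s ', lines{s:e}), '%f');

% sscanf returns a column vector in reading order; reshape to one row per line
coords = reshape(vals, nCols, []).';
```

If a line in the block has the wrong number of values, reshape raises an error, which doubles as a simple format check.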
Cedric on 24 May 2013
Could you post, e.g., 20 lines of your data file, and define the identifiers that you are referring to?


Answers (1)

Jonathan Sullivan on 23 May 2013
Edited: Jonathan Sullivan on 23 May 2013
You may want to use fread and regexp.
Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.
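% Using regexp and fread
fid = fopen(filename,'r');
tic;
% Read the whole file as one char row, then split it into lines
% ('\r?\n' also handles Windows CRLF line endings)
A = regexp(fread(fid,'*char')','\r?\n','split');
A = char( A{:} );   % pad the lines into a character matrix
toc
fclose(fid);
% Using textscan
fid = fopen(filename,'r');
tic;
B = textscan(fid,'%[^\n]');   % one cell per line
B2 = char(B{1});              % pad the lines into a character matrix
toc
fclose(fid);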
Brian on 23 May 2013
It seems that my textscan version runs slightly faster than the fread/regexp combination. One last part of the code is still giving me problems:
Once I have my start and end indices, I use sscanf line by line to extract the numeric data I need. However, some of my character matrices can be very large, sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).
Is there a smarter way to read this than calling sscanf line by line, for example by passing the whole block to sscanf as a vector at once, or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?
In these blocks, every row always has the form
xxx xxx xxxx x x
so I believe it can be split on the space delimiter, which leaves five hexadecimal values per row.
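A minimal sketch of reading such a block with a single sscanf call and no temporary file, assuming the rows are already in the padded character matrix A from char(cac{1}) and that s and e are the block's start and end rows (the sample values below are hypothetical). The extra column of spaces matters: the longest row has no padding, so without it the last value on that row would run into the first value on the next row.

```matlab
% Hypothetical rows in the "xxx xxx xxxx x x" layout, padded into a char
% matrix the same way char(cac{1}) does; in the reader this is A(s:e,:)
A = char({'1a 2b 3c 4 5'; 'ff 100 abc 1 0'; '10 20 30 2 3'});
s = 1; e = 3;      % assumed start and end rows of the hex block

% Append a column of spaces so values on adjacent rows stay separated,
% then transpose so that column order walks the text row by row
block = [A(s:e,:), repmat(' ', e - s + 1, 1)].';

% One sscanf call parses every hexadecimal value in the block
vals = sscanf(block(:).', '%x');

% Five values per row: reshape and transpose back to an N-by-5 matrix
M = reshape(vals, 5, []).';
```

For this sample, M is [26 43 60 4 5; 255 256 2748 1 0; 16 32 48 2 3]. As with any reshape-based approach, a row with a different number of values makes reshape fail, so it also acts as a check on the assumed layout.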

