Question about reading in text files: alternatives

Hello, thanks for reading this.
I wrote a reader for importing ANSYS mesh files, but in my opinion it's a bit inelegant. What I do is read the file, store every line as a string, and then search through the lines for identifiers (like point and connectivity information). It works, but it is slow. Any file around 1 MB loads slowly, and for anything larger the load time grows much faster than the file size.
Is there a better way of doing this? I currently open the files and parse every line into a string with the commands:
function [Points, vFaceMx] = getPointsAndFacesforMESH(fileName)
wb2 = waitbar(0,'Loading Mesh');
filename=fileName;
fid = fopen(filename, 'rt');
nLines = 0;
while (fgets(fid) ~= -1),
nLines = nLines+1;
end
fclose(fid);
fid = fopen(filename, 'rt');
A=[];
ct = 0;
%%Write all lines as strings
while feof(fid) == 0
tline = fgetl(fid);
A_c=size(A, 2);
t_c=size(tline, 2);
if A_c > t_c
tline=[tline, NaN(size(tline, 1), A_c-t_c)];
end
if A_c < t_c
A=[A, NaN(size(A, 1), t_c-A_c)];
end
A = [A; tline];
end
fclose(fid);
And from there, I parse through using strcmp commands. I load the data I want into data arrays of strings, then I use sscanf commands to bring it back into numerical data.
Any advice would be appreciated.
Cedric on 25 Feb 2013
Edited: Cedric on 25 Feb 2013
As mentioned above, the best way to discuss the method is certainly to paste part of the file (e.g. the first 20-40 rows) below the original question. When you have a text file, what you read is most often strings, so there is no need to perform a translation to string. If you look at the class of tline right after the call to fgetl(), you will see that it is char. The only thing that you need to do in principle is parse the lines that you read and extract their content as string/integer/double/etc. There are several ways to achieve this. As mentioned, for most simple cases where lines have a simple, regular structure, fscanf()/sscanf() will be fine; for more complicated cases, regular expressions [regexp()] are usually an invaluable tool when available.
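As a rough illustration of the two approaches, here is a minimal sketch. Both lines are copied from the sample file posted in the accepted answer; the variable names are made up for the example, and reading the fourth header field as a hexadecimal point count is an assumption that happens to agree with the 16 coordinate rows in that sample.

```matlab
% Sketch only: both lines are copied from the sample file in this thread.
coordLine  = '3.3333333333e-001 6.6666666667e-001';
headerLine = '(10 (0 1 10 1 2))';

% Regular numeric line: sscanf converts it directly to doubles.
xy = sscanf(coordLine, '%f %f').';      % 1-by-2 row vector

% Irregular header line: regexp extracts the individual fields as strings.
tok = regexp(headerLine, '^\((\w+)\s+\((\w+)\s+(\w+)\s+(\w+)', 'tokens', 'once');
sectionId = str2double(tok{1});         % leading section number
lastIndex = hex2dec(tok{4});            % assumption: this field is hexadecimal
```

sscanf is enough when a line holds only numbers in a fixed layout; regexp earns its keep on the parenthesised header lines, where the fields have to be picked out of surrounding punctuation.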
Morteza on 25 Feb 2013
Edited: Morteza on 25 Feb 2013
str2doubleq.cpp
This function converts string data to numeric data very quickly. You can download it here and use it according to its description.


Accepted Answer

per isakson on 25 Feb 2013
Edited: per isakson on 25 Feb 2013
Some comments:
  • I assume it is a text file that resembles the example below
  • I guess that line-breaks are not really significant
  • the first while-loop counts the lines - is that needed?
  • in the second while-loop A is growing, which is bad for performance
  • the lines are padded with char(0); a space, char(32), is "more standard"
  • I assume your file fits in memory (RAM)
  • the example code below with textscan returns A, which is identical to A returned by getPointsAndFacesforMESH - with the exception of padding with char(32).
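% Time three ways of reading the file; filespec is the path to the mesh file.
tic,
str = fileread( filespec );          % whole file as one char row vector
et = toc;
tic,
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );     % one cell entry per line
fclose(fid);
A1 = char( cac{1} );                 % char matrix, rows padded with spaces
et = [ et, toc ];
tic,
[ A2, ~ ] = getPointsAndFacesforMESH( filespec );   % the reader from the question
et = [ et, toc ];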
Sample text file
(0 "GAMBIT to Fluent File")
(0 "Dimension:") (2 2)
(10 (0 1 10 1 2)) (10 (1 1 10 1 2)(
0.0000000000e+000 1.0000000000e+000
1.0000000000e+000 1.0000000000e+000
0.0000000000e+000 0.0000000000e+000
1.0000000000e+000 0.0000000000e+000
1.0000000000e+000 3.3333333333e-001
1.0000000000e+000 6.6666666667e-001
0.0000000000e+000 6.6666666667e-001
0.0000000000e+000 3.3333333333e-001
3.3333333333e-001 1.0000000000e+000
6.6666666667e-001 1.0000000000e+000
3.3333333333e-001 0.0000000000e+000
6.6666666667e-001 0.0000000000e+000
6.6666666667e-001 3.3333333333e-001
6.6666666667e-001 6.6666666667e-001
3.3333333333e-001 3.3333333333e-001
3.3333333333e-001 6.6666666667e-001 ))
(0 "Faces:") (13(0 1 18 0))
(13(3 1 9 3 0)
( 2 1 7 9 0 2 7 8 6 0 2 8 3 3 0 2 3 b 3 0 2 b c 2 0 2 c ... 6 4 0 2 6 2 7 0 ))
(13(4 a c 14 0)( 2 1 9 0 9 2 9 a 0 8 2 a 2 0 7 ))
(13(6 d 18 2 0)
( 2 d c 1 2 2 5 d 1 4 2 f b 2 3 2 d f 2 5 2 f 8 3 6 2 e ... 7 8 2 9 10 8 9 ))
(0 "Cells:") (12 (0 1 9 0)) (12 (2 1 9 1 3))
(0 "Zones:") (45 (2 fluid fluid)())
(45 (3 wall new_wall.4)())
(45 (4 mass-flow-inlet wall.4)())
(45 (6 interior default-interior)())
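Once the whole file is in one string (as with fileread above), the point coordinates can be pulled out without looping over lines. The following is a minimal sketch under several assumptions: the string is a shortened, hand-edited stand-in for the sample (4 points instead of 16, with the header counts changed to match), the point block is taken to be everything between the "(" that follows the second "(10 (...)" header and the closing "))", and the last header field is read as the number of dimensions. The names pat, nDim and Points are chosen for the example.

```matlab
% Stand-in for str = fileread(filespec): a shortened copy of the sample.
str = sprintf('%s\n', ...
    '(0 "Dimension:") (2 2)', ...
    '(10 (0 1 4 1 2)) (10 (1 1 4 1 2)(', ...
    '0.0000000000e+000 1.0000000000e+000', ...
    '1.0000000000e+000 1.0000000000e+000', ...
    '0.0000000000e+000 0.0000000000e+000', ...
    '1.0000000000e+000 0.0000000000e+000 ))', ...
    '(0 "Faces:") (13(0 1 18 0))');

% Match a "(10 (a b c d e)(" header followed by a data block ending in "))".
% The first "(10 ...)" entry closes immediately, so it has no block to match.
pat = ['\(10\s*\(\s*\w+\s+\w+\s+\w+\s+\w+\s+(\d+)\s*\)\s*', '\(([^()]*)\)\)'];
tok = regexp(str, pat, 'tokens', 'once');

nDim   = str2double(tok{1});                    % assumed: last header field
Points = sscanf(tok{2}, '%f', [nDim, Inf]).';   % one row per point
```

Here a single sscanf call converts the whole block, which avoids both the per-line loop and the intermediate char matrix. The face sections would need their own pattern, since their entries are hexadecimal.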
Brian on 25 Feb 2013
Wow, I just tried this, and it's amazing how much faster this is. Thanks a lot. I'm going to look more into these lines in my own time:
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A = char( cac{1} );
because these seem to contain all the magic. My code is now bottlenecked by the visualization of the mesh, which is to be expected of MATLAB.
Thanks a lot!
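For readers who want to unpack the last of those lines: textscan returns the lines as a cell array of char vectors, and char() then stacks them into one char matrix. The sketch below uses a hand-written cell array as a stand-in for cac{1} (two lines copied from the sample file), so it runs without a file; the variable names are made up for the example.

```matlab
% Stand-in for cac{1}: one char vector per line of the file.
fileLines = {'(0 "Dimension:") (2 2)'; '(45 (3 wall new_wall.4)())'};

% char() pads shorter rows on the right with spaces to a common width.
A = char(fileLines);

% strtrim removes the padding again when a single row is needed.
firstLine = strtrim(A(1, :));
```

If the char matrix is not needed downstream, working on the cell array directly skips the padding step altogether.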
per isakson on 26 Feb 2013
Edited: per isakson on 26 Feb 2013
Regarding "in the second while-loop A is growing":
Search for "preallocating memory" in the help. Doc says:
Preallocating Memory
Repeatedly expanding the size of an array over time, (for example, adding more elements to it each time through a programming loop), can adversely affect the performance of your program. This is because
  • MATLAB has to spend time allocating more memory each time you increase the size of the array.
  • This newly allocated memory is likely to be noncontiguous, thus slowing down any operations that MATLAB needs to perform on the array.
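A minimal sketch of the difference, assuming the line count is known before the loop starts (in the question's reader, the first while-loop already provides it). The cell array srcLines is a hypothetical stand-in for the lines that fgetl would return, so the example runs without a file.

```matlab
% Hypothetical stand-in for the lines read from the file.
srcLines = {'(0 "Dimension:") (2 2)', ...
            '0.0000000000e+000 1.0000000000e+000', ...
            '(0 "Cells:") (12 (0 1 9 0)) (12 (2 1 9 1 3))'};
nLines = numel(srcLines);

% Growing pattern: the container is enlarged on every pass.
grown = {};
for k = 1:nLines
    grown{end+1, 1} = srcLines{k};   %#ok<AGROW>
end

% Preallocated pattern: allocate once, then fill in place.
lineCell = cell(nLines, 1);
for k = 1:nLines
    lineCell{k} = srcLines{k};
end
```

Both loops produce the same result; only the second avoids repeated reallocation. A cell array is used here instead of a char matrix so that no row has to be padded inside the loop.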
Regarding "enough RAM": when working with files, it makes a big difference whether the file fits in the system cache. See the Windows Task Manager.
