Creative way to create a Matlab array from a textfile with multiple headers.

I am trying to parse a molecular dynamics dump file in which headers are printed periodically. Between two successive headers there is data in column format that I want to store and post-process. The number of data rows is not guaranteed to be the same from one block to the next. Is there a way to do this without excessive use of for loops?
I have attached my file and the basic gist of it is:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ENTRIES
1079
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5]
1 1 94 0.0399999 0 0.171554 -0.00124379 0
2 1 106 0.0399999 0 -0.0638316 0.116503 0
3 1 204 0.0299999 0 -0.124742 0.0290103 0
4 1 675 0.0299999 0 0.0245382 -0.116731 0
5 2 621 0.03 0 0.0328324 0.00185942 0
6 2 656 0.04 0 -0.0315086 0.016237 0
7 2 671 0.04 0 -0.00291159 -0.0169882 0
8 3 76 0.03 0 0.01775 0.0100646 0
9 3 655 0.03 0 0.00434063 -0.00750336 0
.
.
.
.
.
1076 678 692 100000 0 -0.222481 -1.44632e-06 0
1077 679 692 100000 0 -0.00232206 -8.05951e-09 0
1078 682 691 100000 0 0.0753935 -2.89438e-07 0
1079 687 692 100000 0 -0.0153246 -2.51076e-08 0
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ENTRIES
1078
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5]
1 1 94 0.0399997 0 1.3535 -0.00981109 0
2 1 106 0.0399986 0 -6.36969 11.6275 0
3 1 204 0.0299893 0 -236.114 54.9339 0
4 1 675 0.0299998 0 0.148064 -0.704365 0
.
.
.
.
TIA.
  3 Comments
Devanjith Fonseka on 11 Dec 2020
Currently I have been using fgetl, so maybe that's why? I had to write one loop to skip the header and then another to loop over the columns of a given row. It's not hard, but it just seems inefficient. I think Stephen's answer is promising.
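For comparison with the fgetl approach, here is a rough sketch of one way to cut the looping down to a single loop over timesteps: hold all lines in a string array, locate the header lines with logical indexing, and parse each ENTRIES block with one sscanf call. The variable names (txtLines, firstRow, blocks) and the three-column toy data are assumptions for illustration; with a real file, txtLines would come from something like readlines (R2020b or later) and the column count would be 8.

```matlab
% Hypothetical miniature dump, standing in for readlines("dump.txt")
txtLines = [ "ITEM: TIMESTEP"; "0"; ...
             "ITEM: NUMBER OF ENTRIES"; "2"; ...
             "ITEM: ENTRIES index c_1[1] c_1[2]"; "1 1 94"; "2 1 106"; ...
             "ITEM: TIMESTEP"; "1000"; ...
             "ITEM: NUMBER OF ENTRIES"; "1"; ...
             "ITEM: ENTRIES index c_1[1] c_1[2]"; "1 1 94" ];

% Values that sit on the line right after a fixed header need no loop
timesteps = str2double(txtLines(find(txtLines == "ITEM: TIMESTEP") + 1));
nRows     = str2double(txtLines(find(txtLines == "ITEM: NUMBER OF ENTRIES") + 1));

% Index of the first data row of every ENTRIES block
firstRow  = find(startsWith(txtLines, "ITEM: ENTRIES")) + 1;

% One loop over timesteps only; each block is parsed in a single sscanf call
nCols  = 3;                                  % 3 columns in this toy example
blocks = cell(numel(timesteps), 1);
for k = 1:numel(timesteps)
    rows = txtLines(firstRow(k) : firstRow(k) + nRows(k) - 1);
    txt  = char(join(rows, newline));        % one char vector per block
    blocks{k} = sscanf(txt, '%f', [nCols Inf]).';  % sscanf fills column-wise, so transpose
end
```

Because each block is stored in its own cell, blocks of different lengths (1079 rows at one timestep, 1078 at the next) are not a problem.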


Answers (1)

J. Alex Lee on 12 Dec 2020
This will grab the data, but it will not be organized; you can change the internal storage to suit your needs. I'm curious whether this is at all what Stephen had in mind, and I'm happy to learn better ways.
srcpath = "tmp2.txt";

% the rest could be in a function

% define the header format: the bracket expression is a character set, so
% %*[...] skips the leading run of characters "ITEM:"; %s then captures
% the next word (assumed unique per block type) and %*[^\n] skips the
% rest of the header line
fmts.HEADER = "%*[(^ITEM\\:)] %s %*[^\n]";

% for each block name, define the ensuing data format
fmts.TIMESTEP = "%f";
fmts.NUMBER   = "%d";
fmts.BOX      = "%f %f %f";
fmts.ENTRIES  = "%d %d %d %f %f %f %f %f";

% initialize the counter for the "loop"
% over "sets" of the blocks, i.e., TIMESTEPS
cntr = 0;

% open the file and stop early if that fails
fid = fopen(srcpath);
if fid < 0
    error("Could not open %s", srcpath);
end

% start file read
while ~feof(fid)
    % read the header and take the captured word as the current block name
    hdr = textscan(fid, fmts.HEADER, "TextType", "string");
    curblock = hdr{1};

    % nothing captured (e.g., trailing blank lines): stop reading
    if isempty(curblock)
        break
    end

    % if the block is a TIMESTEP, advance the counter
    if curblock == "TIMESTEP"
        cntr = cntr + 1;
    end

    % now read the data according to the block type and store it;
    % I chose a structure array, but the choice is not too rational
    % since the results of textscan will be wrapped in {}
    data(cntr,1).(curblock) = textscan(fid, fmts.(curblock));
end % end file read

% close the file
fclose(fid);
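
Since textscan returns each block as a cell array with one cell per format specifier, a follow-up step is usually needed to get a plain numeric matrix for post-processing. Below is a minimal sketch of that step. Here entriesCell is a hand-made stand-in for the contents of one data(k).ENTRIES, and the names cols, entries and meanC23 are illustrative assumptions rather than part of the answer above.

```matlab
% Hypothetical stand-in for one data(k).ENTRIES value: textscan returns one
% cell per format specifier (int32 for %d, double for %f)
entriesCell = {int32([1;2;3]), int32([1;1;1]), int32([94;106;204]), ...
               [0.0399999;0.0399999;0.0299999], [0;0;0], ...
               [0.171554;-0.0638316;-0.124742], ...
               [-0.00124379;0.116503;0.0290103], [0;0;0]};

% Convert every column to double first: concatenating int32 with double
% would otherwise cast the whole matrix to int32 and round the data
cols    = cellfun(@double, entriesCell, 'UniformOutput', false);
entries = [cols{:}];            % 3-by-8 double matrix, one row per entry

% Example post-processing: column 6 is c_2[3] according to the ENTRIES header
meanC23 = mean(entries(:,6));
```

With the structure array from the answer, the same two lines would be applied to data(k).ENTRIES for each timestep k.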
