Load the numeric data of a cyclic text file into a matrix

Dear All,
I guess I have to rephrase my question since it has not received much attention.
I have a text file in the following format:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
200
ITEM: BOX BOUNDS pp pp pp
0 23.5
0 23.5
0 23.5
ITEM: ATOMS id type x y z
1 1 4.629738099 19.15100895 8.591289203
2 1 5.379313371 19.12269554 8.727806695
3 2 7.531762324 13.25286645 4.981542453
4 2 7.427444873 13.99400029 5.110889318
ITEM: TIMESTEP
5
ITEM: NUMBER OF ATOMS
200
ITEM: BOX BOUNDS pp pp pp
0 23.5
0 23.5
0 23.5
ITEM: ATOMS id type x y z
1 1 4.602855537 28 8.610593144
2 1 5.399314789 19.12299845 8.70663802
3 2 7.539913654 13.25759311 4.99833023
4 2 7.479249704 13.99259535 5.137606665
The file contains 6000000 of these cycles. I need to extract the numeric data in the last three columns (x, y, z) of each cycle and collect them into one matrix covering all of the cycles.
In other words my desired output matrix should be in the following format:
4.629738099 19.15100895 8.591289203
5.379313371 19.12269554 8.727806695
7.531762324 13.25286645 4.981542453
7.427444873 13.99400029 5.110889318
4.602855537 28.00000000 8.610593144
5.399314789 19.12299845 8.70663802
7.539913654 13.25759311 4.99833023
7.479249704 13.99259535 5.137606665
As you can see, the first 9 lines of each cycle are ignored and the remaining data from all cycles are stacked to form the target matrix. I do not want to print this matrix; I just need it for further calculations. I hope you can help me. Thanks

Answers (1)

dpb on 17 Sep 2015
No matter what you do it likely is going to take a while if the file is that large. But, reading it is pretty straightforward...
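fmt=[repmat('%*d',1,2) repmat('%f',1,3)]; % skip id and type, keep x y z
N=4; % for the file as shown; I guess it would be 200 for the real file?
fid=fopen('yourfile');
c={};
i=0;
while ~feof(fid)
  i=i+1;
  c(i,1)=textscan(fid,fmt,N,'headerlines',9,'collectoutput',1); % one N-by-3 block per cell
end
fid=fclose(fid);
c=cell2mat(c); % stack the blocks into a single matrix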
You may speed it up some by preallocating a large "ordinary" array of size (#atoms*#timesteps)x3, if those counts are known, and offsetting each portion read by N rows (200 here) on each pass. Here I would then wrap the textscan call inside cell2mat to convert directly.
N=200; % could open file and read this, too first...
M=6000000; % # time steps in file...
d=zeros(N*M,3); % preallocate the full output array
fid=fopen('yourfile');
i1=1; i2=N; % initial indices to array rows
for i=1:M
  d(i1:i2,:)=cell2mat(textscan(fid,fmt,N, ...
    'headerlines',9,'collectoutput',1));
  i1=i2+1; i2=i2+N; % increment
end
fid=fclose(fid);
Of course,
>> 6000000 * 200 * 3 * 8/1024/1024/1024
ans =
26.8221
>>
about 27 GB for the three columns of doubles may be more than you can hold in memory at once...
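On the remark that N could be read from the file first: a minimal sketch of one way to do that, assuming every cycle starts with the header layout shown in the question, so that the atom count sits on the fourth line. The sample file, its temporary name and the 4-atom size are made up here only so the snippet runs on its own.

```matlab
% Write a tiny two-step sample in the format shown in the question
% (hypothetical temporary file, 4 atoms per step, random coordinates).
fname=[tempname '.txt'];
fid=fopen(fname,'w');
for step=[0 5]
  fprintf(fid,'ITEM: TIMESTEP\n%d\n',step);
  fprintf(fid,'ITEM: NUMBER OF ATOMS\n4\n');
  fprintf(fid,'ITEM: BOX BOUNDS pp pp pp\n0 23.5\n0 23.5\n0 23.5\n');
  fprintf(fid,'ITEM: ATOMS id type x y z\n');
  % each column of the matrix is one atom: id; type; x; y; z
  fprintf(fid,'%d %d %.9f %.9f %.9f\n',[1:4; 1 1 2 2; rand(3,4)]);
end
fclose(fid);

% Read the atom count from the first header.
fid=fopen(fname,'r');
for k=1:3, fgetl(fid); end  % skip the first three header lines
N=str2double(fgetl(fid));   % assumed: line 4 holds the number of atoms
fclose(fid);
delete(fname);              % remove the sample file again
```

With the real file you would drop the part that writes the sample and open your own file instead; N can then be passed to textscan in place of the hard-coded value.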
