How to use memmapfile for a very large structured binary file
Jean-Daniel Saphores
on 1 Dec 2014
Edited: per isakson
on 19 May 2015
Hello:
I need to process a 62 GB structured binary file written from a 24 h simulation. The structure of the file is as follows:
ft(1).length = 1; ft(1).type = 'integer*4'; ft(1).name = 'VehID';
ft(2).length = 1; ft(2).type = 'real*4'; ft(2).name = 'Time';
ft(3).length = 1; ft(3).type = 'integer*4'; ft(3).name = 'Longitude';
ft(4).length = 1; ft(4).type = 'integer*4'; ft(4).name = 'Latitude';
ft(5).length = 1; ft(5).type = 'integer*2'; ft(5).name = 'Heading';
ft(6).length = 1; ft(6).type = 'integer*4'; ft(6).name = 'Segment';
ft(7).length = 1; ft(7).type = 'integer*2'; ft(7).name = 'Dir';
ft(8).length = 1; ft(8).type = 'integer*4'; ft(8).name = 'Lane';
ft(9).length = 1; ft(9).type = 'real*4'; ft(9).name = 'Offset';
ft(10).length = 1;ft(10).type = 'real*4'; ft(10).name = 'Distance';
ft(11).length = 1;ft(11).type = 'real*4'; ft(11).name = 'Speed';
ft(12).length = 1;ft(12).type = 'real*4'; ft(12).name = 'Acceleration';
I can read this file using readfields with the format above, but it takes forever to go through its 1,506,979,651 records. I would like to partition this file into 96 files based on the value of 'Time', which covers 24 hours (15 min increments -> 96 files), and keep only VehID, Time, Distance, Speed, and Acceleration. After extensive reading (I am still learning MATLAB), I understand memmapfile would be a good way to go, but I am unable to make that command work. I need help writing the appropriate memmapfile statement (especially the format) so I can process this file efficiently. Thank you for your help,
JDS
1 Comment
per isakson
on 1 Dec 2014
Edited: per isakson
on 1 Dec 2014
The free (as in beer) program GSplit might be an alternative to split the file. I was once able to use it successfully minutes after downloading.
Accepted Answer
per isakson
on 1 Dec 2014
Edited: per isakson
on 19 May 2015
This gave me a chance to try a complicated format. Result:
filespec = 'usgsdems.dat'; % A sample file I found in the Mapping Toolbox
n_repeat = 24*60/15; % 96; used here as the number of records to map
nday = 1;
N = (nday-1) * n_repeat * sum([ 4, 4, 4, 4, 2, 4, 2, 4, 4, 4, 4, 4 ]); % byte offset; one record is 44 bytes
%
mmp = memmapfile( filespec ...
, 'Offset' , N ...
, 'Format', {
'int32' , [1,1], 'VehID'
'single', [1,1], 'Time'
'int32' , [1,1], 'Longitude'
'int32' , [1,1], 'Latitude'
'int16' , [1,1], 'Heading'
'int32' , [1,1], 'Segment'
'int16' , [1,1], 'Dir'
'int32' , [1,1], 'Lane'
'single', [1,1], 'Offset'
'single', [1,1], 'Distance'
'single', [1,1], 'Speed'
'single', [1,1], 'Acceleration'
} ...
, 'Repeat', n_repeat );
>> mmp.Data(1).VehID
ans =
1701994860 % garbage but indicates the syntax is correct
>> mmp.Data(2).VehID
ans =
538976313
>> mmp.Data(n_repeat).VehID
ans =
538976288
However,
>> mmp.Data(2:3).VehID
Error using memmapfile/subsref (line 782)
A subscripting operation on the Data field attempted to create a comma-
separated list. The memmapfile class does not support the use of comma-
separated lists when subscripting.
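One way around this error, as far as I know, is to copy the slice of records into an ordinary struct array first and expand the field from that copy. The sketch below is only an illustration: the file name, the four test records, and the values written into them are made up so that the example is self-contained; only the record layout comes from the question.

```matlab
% Record layout from the question (44 bytes per record).
fmt = { 'int32' , [1,1], 'VehID'
        'single', [1,1], 'Time'
        'int32' , [1,1], 'Longitude'
        'int32' , [1,1], 'Latitude'
        'int16' , [1,1], 'Heading'
        'int32' , [1,1], 'Segment'
        'int16' , [1,1], 'Dir'
        'int32' , [1,1], 'Lane'
        'single', [1,1], 'Offset'
        'single', [1,1], 'Distance'
        'single', [1,1], 'Speed'
        'single', [1,1], 'Acceleration' };

% Hypothetical test file: 4 records, field j of record k holds 10*k + j.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
for k = 1:4
    for j = 1:size(fmt, 1)
        fwrite(fid, 10*k + j, fmt{j,1});
    end
end
fclose(fid);

mmp  = memmapfile(fname, 'Format', fmt, 'Repeat', 4);
recs = mmp.Data(2:3);   % plain struct array, copied out of the map
ids  = [recs.VehID];    % comma-separated list expansion works on the copy

clear mmp               % release the mapping before deleting the file
delete(fname);
```

The copy costs memory in proportion to the number of records in the slice, so on a very large file the slice should stay modest.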
 
"and keep only VehID, Time, Distance, Speed, and Acceleration"
AFAIK, the new files must be written record by record, including only the fields that are to be kept.
I'm not convinced the process will be fast.
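A possibly faster alternative, which is not the struct-format approach shown above, is to map the file as a 44-by-nRec uint8 matrix, read it in blocks of columns, decode only the Time bytes with typecast, and append the kept byte rows to one output file per 15-minute bin. The sketch below makes several assumptions that are not stated in the thread: Time is in seconds counted from the start of the 24 h run, there is no padding between fields, and the output names (part_01.dat, ...) and the six demo records are invented for illustration.

```matlab
% --- Hypothetical demo input: 6 records in the 44-byte layout ---
recBytes = 44;
inFile = [tempname '.dat'];
outDir = tempname;  mkdir(outDir);
types = {'int32','single','int32','int32','int16','int32', ...
         'int16','int32','single','single','single','single'};
times = [10 20 899 900 1000 86399];        % assumed unit: seconds
fid = fopen(inFile, 'w');
for k = 1:numel(times)
    for j = 1:numel(types)
        v = k;                              % every field holds k ...
        if j == 2, v = times(k); end        % ... except Time
        fwrite(fid, v, types{j});
    end
end
fclose(fid);

% --- Split by 15-minute bin, keeping 5 of the 12 fields ---
keep   = [1:8, 33:44];  % bytes of VehID, Time, Distance, Speed, Acceleration
binSec = 15*60;
chunk  = 4;             % records per pass; far larger (e.g. 1e6) on a real file
info = dir(inFile);
nRec = floor(info.bytes / recBytes);
mmp  = memmapfile(inFile, 'Format', {'uint8', [recBytes nRec], 'raw'});
for i1 = 1:chunk:nRec
    i2  = min(i1 + chunk - 1, nRec);
    blk = mmp.Data.raw(:, i1:i2);                        % one block of records
    t   = typecast(reshape(blk(5:8, :), 1, []), 'single'); % Time of each record
    bin = min(max(floor(double(t) / binSec) + 1, 1), 96);  % clamp to 1..96
    for b = unique(bin)
        fid = fopen(fullfile(outDir, sprintf('part_%02d.dat', b)), 'a');
        fwrite(fid, blk(keep, bin == b), 'uint8');       % 20 bytes per record
        fclose(fid);
    end
end
clear mmp               % release the mapping
```

Each output record is 20 bytes (int32, then four singles), so the parts can be read back with a five-row memmapfile format. Opening files in append mode per block avoids holding 96 file handles open; if the records are sorted by Time, each block touches only one or two bins.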