MATLAB Answers

Reading in large binary file with multiple data types (uint8, double, etc.)

Adam Fishback on 29 Mar 2016
Edited: Geoff Hayes on 29 Mar 2016
Problem: I am trying to read in a binary data file. The file is a series of data "blocks" that each contain various data types in a repeating pattern. Example: a file contains 20 blocks of 17 bytes each (340 bytes in total). The first 8 bytes (64 bits) of a given block represent a double, the next byte is an unsigned 8-bit integer, the next 4 bytes represent a single, and the final 4 bytes are a 32-bit signed integer. This pattern is then repeated for all 20 blocks.
I currently am using two nested "for" loops to read in these data types one at a time with the "fread" function. This works, but is very slow, especially for large files. What I'm attempting to do now is read in all the data at once as a series of uint8 values (which is much faster), then reshape it to a matrix (in this example, 20x17) and convert the values to the data type that I desire (e.g., take the first 8 columns of each row (20x8) and convert them into a double (20x1)).
I don't know of an easy way to do this. I could write the data to a temporary binary file, read it back in as a new data type, and then remove the file, but I would rather not bother with the file I/O; there should be a way to do it in the workspace. The only other option I can think of is to convert the uint8 values into a binary string and manually reconstruct the new data type from the bits, but if there is a simpler (and faster, and less error-prone) built-in way to accomplish this, I would prefer it.
Any suggestions are appreciated, thanks.
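For reference, the in-workspace conversion described above (read everything as uint8, then reinterpret the bytes) can be sketched with the built-in typecast function. This is only a minimal sketch under assumptions: the temporary demo file, its three blocks, and the names varA to varD are illustrative and not part of the original question, and the file's byte order is assumed to match the machine's.

```matlab
% Hypothetical demo file: 3 blocks of double, uint8, single, int32 (17 bytes each).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not open %s for writing', fname);
for k = 1:3
    fwrite(fid, pi*k, 'double');
    fwrite(fid, k,    'uint8');
    fwrite(fid, pi/k, 'single');
    fwrite(fid, k*k,  'int32');
end
fclose(fid);

% Read every byte in a single call and keep the values as uint8.
fid = fopen(fname, 'r');
raw = fread(fid, Inf, 'uint8=>uint8');
fclose(fid);
delete(fname);

blockSize = 17;                        % 8 + 1 + 4 + 4 bytes per block
raw = reshape(raw, blockSize, []);     % one block per column

% typecast reinterprets the underlying bytes without changing them.
% It expects a vector, so each field's rows are flattened column by column.
varA = typecast(reshape(raw(1:8,   :), 1, []), 'double');
varB = raw(9, :);                      % already uint8, nothing to convert
varC = typecast(reshape(raw(10:13, :), 1, []), 'single');
varD = typecast(reshape(raw(14:17, :), 1, []), 'int32');
```

Reshaping to 17 rows (rather than 17 columns) keeps each block contiguous in memory, because MATLAB stores arrays column by column; each result is then a row vector with one element per block.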

Answers (1)

Geoff Hayes on 29 Mar 2016
Edited: Geoff Hayes on 29 Mar 2016
Adam - you may be able to use memmapfile to read the data from your file. According to its description: "Memory-mapping is a mechanism that maps a portion of a file, or an entire file, on disk to a range of memory addresses within the MATLAB® address space. Then, MATLAB can access files on disk in the same way it accesses dynamic memory, accelerating file reading and writing. Memory-mapping allows you to work with data in a file as if it were a MATLAB array."
For example, suppose we use the following code to create a file of 25 blocks, each written in your format:
function createBinaryFile
fid = fopen('myBinaryData.dat','wb');
if fid ~= -1   % fopen returns -1 on failure, which a plain "if fid" would treat as true
    numBlocks = 25;
    for k=1:numBlocks
        % write the double
        fwrite(fid,pi*k,'double'); % varA of block k will be pi*k
        % write the unsigned integer
        fwrite(fid,k,'uint8'); % varB of block k will be k
        % write the single
        fwrite(fid,pi/k,'single'); % varC of block k will be pi/k
        % write the signed integer
        fwrite(fid,k*k,'int32'); % varD of block k will be k*k
    end
    fclose(fid);
end
We can now use memmapfile to create the memory map to this file as
m = memmapfile('myBinaryData.dat',...
    'Format',{'double',[1,1],'varA';...
              'uint8', [1,1],'varB';...
              'single',[1,1],'varC';...
              'int32', [1,1],'varD'},'Repeat',25);
Note how the format lists the fields in exactly the order and with exactly the types that each block was written. We can then access any block as
m.Data(1)
ans =
varA: 3.1416
varB: 1
varC: 3.1416
varD: 1
or
m.Data(2)
ans =
varA: 6.2832
varB: 2
varC: 1.5708
varD: 4
If you don't know the number of blocks, then you can specify Inf in place of 25.
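As a rough illustration of the Inf case, the sketch below maps a file without stating the block count, then asks the map how many blocks it found and pulls one field out of every block at once. The temporary file, its five blocks, and the names blocks, numBlocks and allD are assumptions made for the example, not part of the code above.

```matlab
% Hypothetical demo file with 5 blocks in the same 17-byte layout.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not open %s for writing', fname);
for k = 1:5
    fwrite(fid, pi*k, 'double');
    fwrite(fid, k,    'uint8');
    fwrite(fid, pi/k, 'single');
    fwrite(fid, k*k,  'int32');
end
fclose(fid);

% 'Repeat',Inf applies the format as many times as fit in the file.
m = memmapfile(fname, ...
    'Format', {'double', [1,1], 'varA'; ...
               'uint8',  [1,1], 'varB'; ...
               'single', [1,1], 'varC'; ...
               'int32',  [1,1], 'varD'}, 'Repeat', Inf);

blocks    = m.Data;          % struct array, one element per block (copied into the workspace)
numBlocks = numel(blocks);   % number of complete blocks found in the file
allD      = [blocks.varD];   % every varD value gathered into one int32 row vector

clear m                      % release the mapping before deleting the file
delete(fname);
```

Copying m.Data into a workspace variable first and then concatenating a field keeps the indexing simple; for very large files it may be preferable to index m.Data in chunks instead of copying everything.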
Try the above and see what happens!
