MATLAB Answers

Reading binary files consisting of different data types without a for loop.

Adam on 15 Sep 2014
Commented: Guillaume on 17 Sep 2014
I have a binary data file that consists of M data sets. Each data set is N bytes long and follows a fixed template of types, e.g. [uint16, uint16, uint16, uint32, double, uint32, int16]. Right now I'm just looping over the data sets and reading each field according to its type.
for j = 1:Mdatasets
    this(j) = fread(fid,1,'uint32');
    foo(j) = fread(fid,1,'uint16');
    foofoo(j) = fread(fid,1,'double');
    % and so on...
end
Is there a faster way to do this? It can take a very long time to read some of my larger files (~500MB). I was thinking that if you could give fread() a data type template to repeat over and over, like it can do with a single data type, that would be ideal. I'm not sure if there is a way to do this, or if someone has a way around it, but for loops take so long.
Best Regards,

  1 Comment

Image Analyst on 15 Sep 2014
The for loop is definitely NOT the problem. You can do tens of millions of iterations in less than a second, and I'm sure you don't have that many files. The time is being taken up by the disk I/O rather than the for loop.


Answers (2)

Guillaume on 15 Sep 2014
Edited: Guillaume on 17 Sep 2014
Use the skip argument of fread to read all the elements of the same type at once. From my reading of the fread documentation, skip is not very straightforward to calculate: it is the number of bytes to skip after each value read. This may work:
fieldsizes = [4 2 8 ...]; %uint32 uint16 double ...
skip = @(n) sum([fieldsizes(1:n-1) fieldsizes(n+1:end)]); %sum of all field sizes except field n
offset = @(n) sum(fieldsizes(1:n)); %offset to element n+1
this = fread(fid, Mdatasets, 'uint32', skip(1));
fseek(fid, offset(1), -1);
foo = fread(fid, Mdatasets, 'uint16', skip(2));
fseek(fid, offset(2), -1);
foofoo = fread(fid, Mdatasets, 'double', skip(3));
fseek(fid, offset(3), -1);
Edit: added fseek as per Michael's comment.
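A minimal end-to-end sketch of this skip-based approach, assuming a hypothetical three-field record (uint32, uint16, double) with no padding and native byte order. The file name, sample values, and variable names such as recsize and offsets are made up for illustration:

```matlab
% Write a small test file: 3 records of [uint32, uint16, double] (hypothetical layout).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:3
    fwrite(fid, 10*k, 'uint32');
    fwrite(fid, k,    'uint16');
    fwrite(fid, k/2,  'double');
end
fclose(fid);

types      = {'uint32', 'uint16', 'double'};
fieldsizes = [4 2 8];                          % bytes per field
recsize    = sum(fieldsizes);                  % 14 bytes per record
offsets    = [0 cumsum(fieldsizes(1:end-1))];  % byte offset of each field within a record

info = dir(fname);
M = info.bytes / recsize;                      % number of records in the file

fid = fopen(fname, 'r');
cols = cell(1, numel(types));
for n = 1:numel(types)
    fseek(fid, offsets(n), 'bof');             % go to the first value of field n
    % skip = bytes between the end of one value and the start of the next
    cols{n} = fread(fid, M, types{n}, recsize - fieldsizes(n));
end
fclose(fid);
delete(fname);

this = cols{1}; foo = cols{2}; foofoo = cols{3};
```

The loop here runs once per field rather than once per record, so the number of fread calls no longer grows with the file size. Note that fread returns doubles by default; prefixing the precision with an asterisk (e.g. '*uint32') keeps the stored class.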


Michael Haderlein on 16 Sep 2014
My first idea was similar to Iain's. However, Guillaume's answer should also work and might be faster (?). But if you use Guillaume's way, I think you'll need frewind(fid) after each fread, as your file pointer will be somewhere near the end of the file after the fread operation.
Iain on 16 Sep 2014
Speed is probably going to be dictated by how quick the disk accesses are.
On a slow network I know it's faster to read in a big file just once (really slow disk accesses), then use typecast. On a local drive, it's the opposite.
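The read-once-then-typecast idea can be sketched as follows. This assumes a hypothetical 14-byte record of [uint32, uint16, double] with no padding and native byte order; the records are built in memory here so the example runs on its own, whereas with a real file the raw bytes would come from a single fread call:

```matlab
% Build 3 hypothetical records of [uint32, uint16, double] as raw bytes.
% With a real file this would be: raw = fread(fid, Inf, '*uint8');
raw = uint8([]);
for k = 1:3
    raw = [raw, typecast(uint32(10*k), 'uint8'), ...
                typecast(uint16(k),    'uint8'), ...
                typecast(k/2,          'uint8')];
end

recsize = 14;                          % 4 + 2 + 8 bytes per record
bytes = reshape(raw, recsize, []);     % one column per record

% Take each field's rows, flatten column-wise, and reinterpret the bytes.
this   = typecast(reshape(bytes(1:4,  :), 1, []), 'uint32');
foo    = typecast(reshape(bytes(5:6,  :), 1, []), 'uint16');
foofoo = typecast(reshape(bytes(7:14, :), 1, []), 'double');
```

typecast interprets the bytes in the machine's native byte order, so a file written with the opposite endianness would also need swapbytes on each result.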
Guillaume on 17 Sep 2014
Thanks Michael for the reminder about the file pointer, I'd completely missed that. It's actually fseek that's needed since you need to go back to the right element.
Reading the file my way or Iain's way is bound to be faster than reading it one element at a time.
A third option, probably the fastest, is to write a MEX file.


Image Analyst on 15 Sep 2014
You're reading 1 byte at a time - no wonder it takes so long. Read in a whole image at a time:
thisImage = fread(fid, [rows, columns], '*uint16');


Adam on 16 Sep 2014
Image Analyst,
I feel like there is some miscommunication here; did you read the problem? As I mentioned initially, there are repeated sets of data in my binary file. I may not have mentioned that within these sets, some sections contain runs of a single data type, and for those I use
fread(fid,N,datatype) where N > 1
However, that is irrelevant, as the overall data set is composed of many data types. I think the best way to handle this would be what Guillaume suggested. Still, it would be nice if fread() could accept a data type template as a structure or cell array S, where S would be something like:
S = {'uint16','int32','double','int16','int16','uint32','double'};
So that fread() could just repeat that read N times instead of only utilizing a single data type.
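fread does not take a repeating multi-type template, but the 'Format' option of memmapfile accepts something close to one. A minimal sketch, assuming unpadded records in native byte order; the shortened four-field template, the field names a to d, and the sample values are hypothetical:

```matlab
% Write 3 hypothetical records of [uint16, uint16, uint32, double] to a temp file.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:3
    fwrite(fid, k,    'uint16');
    fwrite(fid, 2*k,  'uint16');
    fwrite(fid, 10*k, 'uint32');
    fwrite(fid, k/2,  'double');
end
fclose(fid);

% Each row of the format is {type, size, field name}.
% By default the whole pattern repeats until the end of the file.
fmt = {'uint16' [1 1] 'a'; ...
       'uint16' [1 1] 'b'; ...
       'uint32' [1 1] 'c'; ...
       'double' [1 1] 'd'};
m = memmapfile(fname, 'Format', fmt);
recs = m.Data;              % struct array with one element per record

a = [recs.a];               % gather each field across all records
b = [recs.b];
c = [recs.c];
d = [recs.d];
nrec = numel(recs);

clear m recs                % release the mapping before deleting the file
delete(fname);
```

Whether this beats fread with skip on a ~500MB file is something to measure rather than assume, since gathering a field across a large struct array has its own cost.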
Image Analyst on 16 Sep 2014
If you have patterns of data in one file, like a bunch of small variables (header info) and then maybe a big image, then you could make a subroutine to do the "common" part. It could take in a page or slice number and use fseek to go to the starting point for that page/slice. I've written readers for custom image formats, like CT data, and I could give you examples if you want. I read in the header and image data.
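The header-plus-image pattern described here might look like the sketch below. The layout is an assumption made for illustration only: every page holds a two-value uint16 header (rows, columns) followed by rows*columns uint16 pixels, and all pages have the same size.

```matlab
% Write 3 hypothetical pages: [uint16 rows, uint16 cols, rows*cols uint16 pixels].
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for p = 1:3
    fwrite(fid, [2 3], 'uint16');            % header: rows, cols
    fwrite(fid, p*ones(2,3), 'uint16');      % image data, all pixels equal to p
end
fclose(fid);

rows = 2; cols = 3;
pageBytes = 2*2 + rows*cols*2;               % header bytes + pixel bytes

fid = fopen(fname, 'r');
p = 2;                                       % page to read
fseek(fid, (p-1)*pageBytes, 'bof');          % jump straight to the start of page p
hdr = fread(fid, 2, 'uint16');               % whole header in one call
img = fread(fid, [hdr(1) hdr(2)], '*uint16');% whole image in one call
fclose(fid);
delete(fname);
```

Each page costs two fread calls regardless of the image size, which is the point of reading a whole block at a time.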
Adam on 16 Sep 2014
Right, so you're suggesting skipping around the binary file and reading the 'common' data types like Guillaume suggested.

