Loading large binary files in Matlab, quickly
I have some pretty massive data files (256 channels, on the order of 75-100 million samples) in int16 format. They are written in flat binary format with the channels interleaved, so the structure is something like: CH1S1,CH2S1,CH3S1 ... CH256S1,CH1S2,CH2S2,...
I need to read in each channel separately, filter and offset correct it, then save. My current bottleneck is loading each channel, which takes about 7-8 minutes... scale that up 256 times, and I'm looking at nearly 30 hours just to load the data! I am trying to intelligently use fread, to skip bytes as I read each channel; I have the following code in a loop over all 256 channels to do this:
offset = i - 1;
fseek(fid,offset*2,'bof');
dat = fread(fid,[1,nSampsTotal],'*int16',(nChan-1)*2);
Reading around, this is typically the fastest way to load parts of a large binary file, but is the file simply too large to do this any faster? Any suggestions would be much appreciated!
System details: MATLAB 2017a, Windows 7, 64bit
4 Comments
How fast is this?
tic
data = fread(fid, [256 Inf], '*int16');
toc
Test it on a smaller data set first. Do you have 256 channels x 100 million samples each? That would be 256 x 2 bytes x 100E6 = 51 GB, which requires a lot of RAM. If instead you have 100 million samples in total (0.2 GB), it should be fast to load.
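The size estimate above can be checked with a few lines. This is only a sketch of the arithmetic, assuming the upper figure from the question (100 million samples per channel); the variable names are illustrative.

```matlab
% Values taken from the question; nSampsTotal is the upper estimate given there.
nChan          = 256;
nSampsTotal    = 100e6;   % samples per channel (assumed interpretation)
bytesPerSample = 2;       % int16

totalBytes   = nChan * nSampsTotal * bytesPerSample;
channelBytes = nSampsTotal * bytesPerSample;

fprintf('All channels: %.1f GB\n', totalBytes / 1e9);    % 51.2 GB
fprintf('One channel:  %.1f GB\n', channelBytes / 1e9);  % 0.2 GB

% To time a bounded block rather than the whole file, something like the
% following could be used (fid is assumed to be an open handle to the file):
%   tic; blk = fread(fid, [nChan, 1e6], '*int16'); toc
```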
dpb
on 20 Aug 2018
How much RAM do you actually have? Sounds like the performance hit is probably the data being swapped in and out of virtual memory; fread is pretty quick for straight data transfer to/from memory.
Does the processing depend on having the whole time series in memory, or can you do it piecewise on each channel?
You may just have a system limitation here...
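If the processing can be done piecewise, one option is to read the file sequentially in blocks of all channels at once, instead of making 256 strided passes over it. The following is a minimal sketch under stated assumptions, not a tested solution for the original data: it builds a small synthetic interleaved file so it runs as-is, and it uses a per-channel running sum (to estimate the DC offset) as a stand-in for the real processing. The names chunkSamps, chanSum, and the temporary file are hypothetical.

```matlab
% Build a small synthetic interleaved int16 file (stand-in for the real data).
nChan  = 4;                         % 256 in the real recording
nSamps = 1000;                      % samples per channel
truth  = int16(reshape(1:nChan*nSamps, nChan, nSamps));  % row = channel
fname  = [tempname '.bin'];         % hypothetical temporary file
fid = fopen(fname, 'w');
fwrite(fid, truth, 'int16');        % column-major write: CH1S1,CH2S1,...
fclose(fid);

% Read the file once, front to back, in blocks of chunkSamps samples.
chunkSamps = 256;                   % samples per channel per block; tune to RAM
chanSum    = zeros(nChan, 1);       % running per-channel sum (double)
nRead      = 0;
fid = fopen(fname, 'r');
while true
    blk = fread(fid, [nChan, chunkSamps], '*int16');  % nChan x (<= chunkSamps)
    if isempty(blk)
        break;                      % end of file reached
    end
    chanSum = chanSum + sum(double(blk), 2);  % piecewise processing goes here
    nRead   = nRead + size(blk, 2);
end
fclose(fid);
delete(fname);

chanMean = chanSum / nRead;         % per-channel DC offset estimate
```

Each block holds only nChan x chunkSamps int16 values, so memory use stays bounded regardless of file length. Whether this is faster than the skip-based fread on a given system would need to be timed; a filter applied block by block also has to carry its state across block boundaries.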
Accepted Answer