# Why is fwrite ~320x slower from the second iteration onwards when writing interleaved data?

I'd like to write a binary file that contains four integer vectors of the same length, interleaved.
For example, let $s_{i,j}$ denote the sample from the i-th channel at time step j. The file should then contain $s_{1,1}, s_{2,1}, s_{3,1}, s_{4,1}, s_{1,2}, s_{2,2}, \ldots$ The easy way is to prepare one long vector in this order and pass it to fwrite.
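For reference, the in-memory approach can be sketched as follows, assuming four equal-length int16 column vectors (the names `s1` to `s4`, their values and the temporary output file are illustrative only). Stacking the channels as columns and transposing lets MATLAB's column-major order produce the interleaved sequence directly:

```matlab
% Illustrative channels: four equal-length int16 column vectors
len = 5;
s1 = int16(1:len)';
s2 = int16(11:10+len)';
s3 = int16(21:20+len)';
s4 = int16(31:30+len)';

% [s1 s2 s3 s4] is len x 4; its transpose is 4 x len, so reading it out in
% column-major order gives s1(1), s2(1), s3(1), s4(1), s1(2), s2(2), ...
interleaved = reshape([s1 s2 s3 s4].', [], 1);

fname = [tempname '.bin'];          % temporary output file
fid = fopen(fname, 'w');
fwrite(fid, interleaved, 'int16');  % one contiguous write, no seeking
fclose(fid);
```

This keeps all n*len samples in memory at once, which is exactly the limitation for long or numerous vectors.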
But that design is not great: when the individual vectors are very long, or when I have to handle many vectors rather than just four, it runs into memory problems.
So a smarter way to do this is to write one vector at a time, without keeping all of them in memory.
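Writing one vector at a time is possible because the position of every sample in the interleaved file can be computed directly. A worked version of the layout, assuming n channels of 2-byte int16 samples with i and j counted from 1 (the offset notation is introduced here for illustration):

$$
\begin{aligned}
\operatorname{offset}(s_{i,j}) &= 2\big[(j-1)\,n + (i-1)\big] \text{ bytes} \\
\operatorname{offset}(s_{i,j+1}) - \operatorname{offset}(s_{i,j}) &= 2n = \underbrace{2}_{\text{the sample}} + \underbrace{2(n-1)}_{\text{other channels}}
\end{aligned}
$$

So the first sample of channel i sits at byte $2(i-1)$, which is the fseek offset used in the code, and consecutive samples of one channel are separated by $2(n-1)$ bytes belonging to the other channels, which is the skip value passed to fwrite. For example, with n = 4, $s_{2,3}$ starts at byte $2[(3-1)\cdot 4 + 1] = 18$.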
The code below works, but fwrite performs much worse from the second iteration onwards than in the first iteration.
While t3 was less than a second (0.281 s) in the first iteration, it took roughly 1 min 30 s in each of the second to fourth iterations (~320 times slower). Why is this so slow, and is there a workaround?
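```matlab
n = 4;                               % number of channels
fid = fopen(newfile, 'w');
for i = 1:n
    data = load_data(filepath);      % asker's own loader; returns an int16 vector
    % Move to the byte offset of channel i's first sample (2 bytes per int16)
    status = fseek(fid, (i-1)*2, 'bof');
    if status == -1
        error('fseek failed')
    end
    t1 = datetime;
    % Skip 2*(n-1) bytes between samples to leave room for the other channels
    fwrite(fid, data, 'int16', 2*(n-1));
    t2 = datetime;
    t3 = t2 - t1  % why is this slower in the second iteration and onwards?
end
fclose(fid);
```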
```
t3 = duration
   00:00:00.2810
t3 = duration
   00:01:37.2600
t3 = duration
   00:01:29.9700
t3 = duration
   00:01:29.6490
```

90 s / 0.281 s ≈ 320 times slower.
Simplified test code:
The longer the vectors, the slower the second and third writes become. This suggests that the slowness has to do with the data that already exists before the end of the file, rather than with appending extra bytes at the end of the file.
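```matlab
len = 1000;                          % samples per channel
i1 = ones(len, 1, 'int16');
i2 = ones(len, 1, 'int16') .* 2;
i3 = ones(len, 1, 'int16') .* 3;
t3 = seconds(zeros(3, 1));           % preallocate a 3x1 duration array

fid = fopen('temp.bin', 'w');
% Channel 1: written from the start of the new file, skip 2*2 bytes
t1 = datetime;
fwrite(fid, i1, 'int16', 2*2);
t2 = datetime;
t3(1) = t2 - t1;
% Channel 2: seek to byte 2, then write with the same skip
t1 = datetime;
fseek(fid, 2, 'bof');
fwrite(fid, i2, 'int16', 2*2);
t2 = datetime;
t3(2) = t2 - t1;
% Channel 3: seek to byte 4, then write with the same skip
t1 = datetime;
fseek(fid, 2*2, 'bof');
fwrite(fid, i3, 'int16', 2*2);
t2 = datetime;
t3(3) = t2 - t1;
fclose(fid);
t3.Format = 'dd:hh:mm:ss.SSSS'
```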
Results (t3 for each write, in seconds):

| len | Write 1 | Write 2 | Write 3 | Writes 2-3 vs. write 1 |
|---|---|---|---|---|
| 1000 | 0.0070 | 0.0140 | 0.0190 | 2-3 times slower |
| 10000 | 0.0060 | 0.0850 | 0.1050 | 14-18 times slower |
| 100000 | 0.0280 | 0.9820 | 0.9080 | 32-35 times slower |

Guillaume on 22 Jul 2019
I suspect the reason for the much slower writes from iteration 2 onwards is that in the first iteration the file is new, so you're just writing (your data, and 0s in between). From iteration 2, MATLAB must first read the relevant part of the file, insert the new content and rewrite it.
If you can't fit all of the interleaved data in memory at once, read the input data in chunks, interleave each chunk in memory, then write it as one contiguous block. Repeat for the next chunk of data.
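A minimal sketch of this chunked approach, under stated assumptions: each channel is assumed to be stored in its own raw int16 file (a hypothetical setup, created here with small test data so the block runs on its own), all channels have the same length as in the question, and `chunkSize`, `inFiles` and `outFile` are illustrative names. Each chunk is interleaved in memory and written with a single sequential fwrite, so the output file is never seeked into or rewritten:

```matlab
% Hypothetical setup: each channel lives in its own raw int16 file
n = 4;               % number of channels
len = 10;            % samples per channel (small, for illustration)
chunkSize = 3;       % samples per channel read per iteration
inFiles = cell(n, 1);
for i = 1:n
    inFiles{i} = [tempname '.bin'];
    f = fopen(inFiles{i}, 'w');
    fwrite(f, int16((1:len)' + 100*i), 'int16');   % channel i: 100*i + (1..len)
    fclose(f);
end

outFile = [tempname '.bin'];
fidIn = zeros(n, 1);
for i = 1:n
    fidIn(i) = fopen(inFiles{i}, 'r');
end
fidOut = fopen(outFile, 'w');

while true
    block = zeros(chunkSize, n, 'int16');      % one column per channel
    nRead = 0;
    for i = 1:n
        x = fread(fidIn(i), chunkSize, '*int16');  % up to chunkSize samples
        nRead = numel(x);                      % equal lengths assumed
        block(1:nRead, i) = x;
    end
    if nRead == 0
        break                                  % all inputs exhausted
    end
    block = block(1:nRead, :);
    % The transpose has one time step per column, so column-major output
    % order is s1,j, s2,j, ..., sn,j for each j in this chunk
    fwrite(fidOut, block.', 'int16');          % one contiguous write
end

fclose(fidOut);
for i = 1:n
    fclose(fidIn(i));
end
```

Memory use is bounded by n*chunkSize samples regardless of the total length; in practice chunkSize would be much larger than 3.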
Kouichi C. Nakamura on 23 Jul 2019

Walter Roberson on 22 Jul 2019
Edited: Walter Roberson on 22 Jul 2019
An fseek past the end of file followed by fwrite results in the data being written at EOF. This is contrary to POSIX, which requires that the gap be filled with 0 (possibly implicitly, with a demand-zero scheme). In MATLAB, if you want to write at some point after EOF, you must write into the gap yourself.
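Following this suggestion, a minimal sketch of writing into the gap yourself: before writing at a byte offset that lies past the current end of file, pad the file with zero bytes up to that offset. The temporary file, the target offset 10 and the value 99 are illustrative only:

```matlab
fname = [tempname '.bin'];                  % illustrative temporary file
fid = fopen(fname, 'w');
fwrite(fid, int16([1 2 3]), 'int16');       % file is now 6 bytes long

pos = 10;                                   % target byte offset, past EOF
fseek(fid, 0, 'eof');
gap = pos - ftell(fid);                     % bytes between EOF and target
if gap > 0
    % Fill the gap with explicit zero bytes so the next write lands at pos
    fwrite(fid, zeros(gap, 1, 'uint8'), 'uint8');
else
    fseek(fid, pos, 'bof');                 % target is inside the file
end
fwrite(fid, int16(99), 'int16');            % occupies bytes 10-11
fclose(fid);
```

For the interleaved case, the same idea would mean writing n*len zero int16 values once before the channel loop, so that no later write extends the file; whether that removes the slowdown reported above has not been tested here.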
Kouichi C. Nakamura on 23 Jul 2019
> In MATLAB if you want to write at some point after eof you must write into the gap yourself.
In my example above, there are four iterations.
For i = 1, fwrite reaches the end of the file for the first time.
For i = 2 to 4, however, fwrite only needs to add 2 extra bytes at the end of the file in each iteration. Does this small addition really take 320x more time? Or have I completely missed your point? To be honest, I don't really know what POSIX means.
In order to examine the issue further, I wrote simplified test code that I thought kept all the key features of the original code. To my dismay, it didn't reproduce the massive slowdown. Hmm, I tested this on a Mac, while the above was done on Windows.
```matlab
i1 = ones(1000, 1, 'int16');
i2 = ones(1000, 1, 'int16') .* 2;
i3 = ones(1000, 1, 'int16') .* 3;
fid = fopen('temp.bin', 'w');
t1 = datetime;
fwrite(fid, i1, 'int16', 2*2);
t2 = datetime;
t3(1,1) = duration(t2-t1);
t1 = datetime;
fseek(fid, 2, 'bof');
fwrite(fid, i2, 'int16', 2*2);
t2 = datetime;
t3(2,1) = duration(t2-t1);
t1 = datetime;
fseek(fid, 2*2, 'bof');
fwrite(fid, i3, 'int16', 2*2);
t2 = datetime;
t3(3,1) = duration(t2-t1);
fclose(fid);
t3.Format = 'dd:hh:mm:ss.SSSS'
```

```
t3 = 3×1 duration array
   00:00:00.0665
   00:00:00.0738
   00:00:00.1156
```

R2019a
