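I'd like to write a binary file that contains four integer vectors of the same length, interleaved sample by sample.

For example, let s(i,j) denote the sample from the i-th channel at time step j. The file should then contain the samples in the order s(1,1), s(2,1), s(3,1), s(4,1), s(1,2), s(2,2), and so on.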

The easy way is to build one long interleaved vector in memory and pass it to fwrite in a single call.
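For reference, that in-memory approach could look like the minimal sketch below. The channel vectors s1 to s4 and the file name interleaved.bin are made up for illustration, and it assumes all channels are int16 column vectors of equal length.

```matlab
% Hypothetical example data: four int16 channels of equal length.
len = 5;
s1 = int16( 0 + (1:len)).';
s2 = int16(10 + (1:len)).';
s3 = int16(20 + (1:len)).';
s4 = int16(30 + (1:len)).';

% Put the channels side by side (len-by-4), transpose to 4-by-len, and
% read the result out column by column:
% s1(1), s2(1), s3(1), s4(1), s1(2), s2(2), ...
interleaved = reshape([s1, s2, s3, s4].', [], 1);

% Write everything in one contiguous call.
fid = fopen('interleaved.bin', 'w');
if fid == -1
    error('Could not open output file');
end
fwrite(fid, interleaved, 'int16');
fclose(fid);
```

This keeps all channels plus the interleaved copy in memory at once, which is exactly what becomes a problem for long recordings.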

But that design does not scale: when the individual vectors are really long, or when I have to accommodate many vectors rather than just four, it runs into memory limits.

So a better way to do this is to write one input vector at a time, without keeping all of them in memory.

Code like the one below works, but I found that the performance of fwrite is much worse from the second iteration onwards than in the first iteration.

While t3 in the first iteration was less than a second (0.281 sec), t3 was roughly 1 min 30 sec for the second to fourth iterations (~320 times slower). I wonder why this is so slow, and whether there is a workaround.

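n = 4; % number of channels (was undefined in the original snippet)

fid = fopen(newfile,'w');

for i = 1:n
    filepath = fullfile(datadir,filenames{i});
    data = load_data(filepath); % int16 vector

    % position the file at the slot of channel i (2 bytes per int16)
    status = fseek(fid,(i-1)*2,'bof');
    if status == -1
        error('fseek failed')
    end

    t1 = datetime;
    % skip the other n-1 channels' slots (2 bytes each) before each value
    fwrite(fid, data, 'int16', 2*(n-1));
    t2 = datetime;
    t3 = duration(t2-t1) % why is this slower from the second iteration onwards?
end

fclose(fid);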

t3 = duration

00:00:00.2810

t3 = duration

00:01:37.2600

t3 = duration

00:01:29.9700

t3 = duration

00:01:29.6490

90 (sec) / 0.281 (sec) == ~320 (times)

Simplified code for testing

The longer the vectors are, the slower the second and third writes become relative to the first. This suggests that the slowdown comes from writing into the part of the file that already exists, rather than from appending extra bytes at the end of the file.

len = 1000;
i1 = ones(len,1,'int16');
i2 = ones(len,1,'int16').*2;
i3 = ones(len,1,'int16').*3;

fid = fopen('temp.bin','w')

% first vector: written into a new, empty file
t1 = datetime;
fwrite(fid,i1,'int16',2*2);
t2 = datetime;
t3(1,1) = duration(t2-t1);

% second vector: written into the gaps, starting 2 bytes in
t1 = datetime;
fseek(fid,2, 'bof')
fwrite(fid,i2,'int16',2*2);
t2 = datetime;
t3(2,1) = duration(t2-t1);

% third vector: written into the gaps, starting 4 bytes in
t1 = datetime;
fseek(fid,2*2, 'bof')
fwrite(fid,i3,'int16',2*2);
t2 = datetime;
t3(3,1) = duration(t2-t1);

fclose(fid);

t3.Format = 'dd:hh:mm:ss.SSSS'

len = 1000

t3 = 3×1 duration array

00:00:00.0070

00:00:00.0140

00:00:00.0190

2~3 times slower

len = 10000

t3 = 3×1 duration array

00:00:00.0060

00:00:00.0850

00:00:00.1050

14~18 times slower

len = 100000

t3 = 3×1 duration array

00:00:00.0280

00:00:00.9820

00:00:00.9080

35 times slower

Guillaume
on 22 Jul 2019

I would suspect that the writes are much slower from iteration 2 onward because, in the first iteration, the file is new, so you're just writing (your data, with 0s in between). From iteration 2, MATLAB must first read the relevant part of the file, insert the new content, and write that back.

If you can't fit the whole interlaced data in memory in one go, then you should read the input data in chunks, interlace each chunk in memory, then write it as one contiguous block. Repeat for the next chunk of data.
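A minimal sketch of that chunked approach is below. It is only an illustration under stated assumptions: each channel is assumed to be stored as a raw int16 binary file whose sample count is known, and the file names (chan1.bin etc., interleaved.bin) and the variables nCh, len and chunkLen are made up here. The load_data function from the question is not used, since it loads a whole vector at once.

```matlab
% Assumption: each channel lives in its own raw int16 file.
% Create small hypothetical input files so the sketch is self-contained.
nCh = 4;    % number of channels
len = 10;   % samples per channel (assumed known)
inNames = cell(nCh, 1);
for c = 1:nCh
    inNames{c} = sprintf('chan%d.bin', c);
    f = fopen(inNames{c}, 'w');
    fwrite(f, int16(c * 100 + (1:len)), 'int16');
    fclose(f);
end

chunkLen = 4;   % samples per channel per chunk; tune to available memory

% Open all inputs and the output once.
fin = zeros(nCh, 1);
for c = 1:nCh
    fin(c) = fopen(inNames{c}, 'r');
end
fout = fopen('interleaved.bin', 'w');

remaining = len;
while remaining > 0
    m = min(chunkLen, remaining);
    block = zeros(nCh, m, 'int16');   % row c holds channel c
    for c = 1:nCh
        % read the next m samples of channel c, keeping them as int16
        block(c, :) = fread(fin(c), m, 'int16=>int16').';
    end
    % fwrite stores a matrix in column order, so an nCh-by-m block is
    % already interleaved; it is appended as one contiguous write.
    fwrite(fout, block, 'int16');
    remaining = remaining - m;
end

for c = 1:nCh
    fclose(fin(c));
end
fclose(fout);
```

Only nCh*chunkLen samples are held in memory at any time, and the output file is written strictly front to back, so no fseek or skip argument is needed.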

Walter Roberson
on 22 Jul 2019

Edited: Walter Roberson
on 22 Jul 2019

