Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
efficiently fwrite a cell-array of matrices

Subject: efficiently fwrite a cell-array of matrices

From: David

Date: 21 Jan, 2013 15:54:08

Message: 1 of 9

Hi everyone,

I have a large amount of data, that comes as a 1xN-cell-array containing 2D-matrices. The size of the matrices may change arbitrarily, eg:

A = {[1,3;2,4] [] [5 6 7] [8;9;10]};

Also, a Nx2-matrix S containing the sizes is readily available (comes at no costs):
S = [2 2; 0 0; 1 3; 3 1];

I want to write out the contents of A to a binary file in an *efficient* way. I.e., the file should just contain the data in binary form, as it would be after calling fwrite for each cell-entry: 1 2 3 4 5 6 7 8 9 10.

I had some thoughts how to solve this:

(1) For a cell-array just containing row/column-vectors, I found a simple and efficient way using horzcat/vertcat. Example for row-vectors:

B = {[1 2 3], [], [4], [5 6 8 9]};
fwrite(fid, horzcat(B{:});

However, horzcat (and also vertcat) can not be applied to A, of course. [doing a cell2mat and then fwrite has similar efficiency, but the same problem: cannot be applied to A]

(2) Just iterating with "for" over all N cell entries and doing a fwrite per entry lacks efficiency.

(3) I was also thinking of transforming A into a cell-array of rowvectors (A2 = {[1, 2, 3, 4] [] [5, 6, 7] [8, 9, 10]}) and then doing the same I did with B.

But, I also could not find an efficient way to do this transformation. I had to use cellfun with a self-created function that does nothing more but C(:) for a given matrix C (such a function seems not to exist in MATLAB?!). Also, too slow (but already somewhat faster then calling fwrite N times).

(4) For data, where the sizes changes only a few times along the N entries, I already did an efficient, rowwise run-length-encoding on S to find blocks of identical-sized-matrices. Then, I can do horzcat (like for B, see above) on each of the subblocks of cell-data of non-chaning size. However, this brings no advantage for data, where the size changes very often, e.g. when randomly jumping between 1x1 and 2x2.

Do you have any ideas or hints for the most tricky case, where sizes change randomly and very often?

Thank you very much, David.

Subject: efficiently fwrite a cell-array of matrices

From: Bruno Luong

Date: 21 Jan, 2013 20:46:08

Message: 2 of 9

I think the mex file is a way to go.

Bruno

Subject: efficiently fwrite a cell-array of matrices

From: Jan Simon

Date: 21 Jan, 2013 21:20:08

Message: 3 of 9

Hi David,

> (2) Just iterating with "for" over all N cell entries and doing a fwrite per entry lacks efficiency.

Maybe you have implemented this efficiently, maybe you haven't. For a discussion about efficient code posting the original code is obligatory.

  A = {[1,3;2,4] [] [5 6 7] [8;9;10]};
  fid = fopen(FileName, 'w');
  if fid == -1, error('Cannot open file for writing.'); end
  for iC = 1:numel(C)
    fwrite(fid, C{iC});
  end

This is too slow?

Kind regards, Jan

Subject: efficiently fwrite a cell-array of matrices

From: David

Date: 22 Jan, 2013 10:31:08

Message: 4 of 9

Hi Jan and Bruno, hi everyone,

thanks for your answers so far!

After thinking for one more day I found some solution to my problem, which is more efficient than the for-looping (see below) as long as the matrices in A are "small" (up to ~12x12).
I will post my solution here:

My idea sounds complicated first, but turns out to be efficient for small matrix-sizes:
1) initialize a zeros-output array "outblock" with the same number of elements, that A has (for my example: 10)
2) find unique sizes of the matrices in A and the corresponding index-mapping (using "unique"), and then for each unique size:

3a) build up an index-set for the output elements for the entries in A with the same size using an efficient runLengthDecoding (you could use "rude" by "us" for that)
3b) copy all data from all matrices of same size from A to the correct output-positions in "outblock"

4) write out the ouput-datablock to the file (so, overall, only fwrite-call is required.


here is the code for that function:
============================================
function fwriteCell( fid, C, dataType )

if nargin<3
    dataType = class(C{1});
end;

[heights,widths] = cellfun(@size,C);
sizeHW = [heights',widths'];

[uniqueSizes,~,map] = unique(sizeHW,'rows');

uniqueNumel = prod(uniqueSizes,2);
totalNumel = sum(uniqueNumel(map));

numberOfDifferentSizes = length(uniqueNumel);
outind = runLengthDecoding(uniqueNumel(map),map); %you may use "rude" for this

outblock = zeros(totalNumel,1);

for i=1:numberOfDifferentSizes
    tmp = C(map==i);
    tmp = horzcat(tmp{:});
    tmp = reshape(tmp,1,numel(tmp));
    outblock(outind==i)=tmp; % this need most of the runtime for large matrices
end;

fwrite(fid,outblock,dataType);
end
============================================

additionally, here is code for evaluating the efficiency, compared to the naive for-looping:

=================================
%create the array for testing purposes
N=100000;
C = cell(1,N);
for i=1:N
    C{i} = randi(1000,randi(3),randi(3)-1);
end;

fid = fopen('myfile.bin','w');

% variant 1: the complex version using runlength-decoding
tic
fwriteCell(fid,C);
toc

%variant 2: for-loop
tic
for i=1:N
    fwrite(fid,C{i});
end;
toc

fclose(fid);
=================================

for me, the result is:
Elapsed time is 0.202902 seconds.
Elapsed time is 1.352630 seconds.

so, for small matrices, as in this example (1x0 - 3x2), fwriteCell() outperforms the naive implementation by a factor of almost 7, BUT: the larger the matrices in A get, the closer the runtimes will get, and eventually (~12x12-matrices and larger), the naive version will outperform fwriteCell()! For my application, this is sufficient for now, as matrices are small

Maybe somebody has another idea or a hint how I might improve my solution. The row "outblock(outind==i)=tmp;" almost 100% of that functions runtime for large matrices and large N, probably due to the (stupid) copying of data in the memory... I'm not shure if I can improve this in any way...

best regards, David.

Subject: efficiently fwrite a cell-array of matrices

From: Bruno Luong

Date: 22 Jan, 2013 11:02:08

Message: 5 of 9

If you don't want mex, here is what I propose:

fid = fopen('myfile.bin','w');
tic
n = cellfun('prodofsize', C);
stop = cumsum(n);
start = [0 stop]+1;
a = zeros([1, stop(end)], class(C{1}));
for i=1:N
    a(start(i):stop(i)) = C{i}(:);
end;
fwrite(fid,a);
toc
fclose(fid);

It's about 100 faster than the for-loop with your test.

Bruno

Subject: efficiently fwrite a cell-array of matrices

From: David

Date: 22 Jan, 2013 21:57:10

Message: 6 of 9

Bruno, Thank You very much for your implementation! It is indeed the most efficient way (so far) for medium-sized matrices contained in the cell C:

I now tested the efficiency for the three different implementations. The runtime-differences of the three implementations depend on the sizes of the matrices included in C. The results in short:

(1) For small matrices with up to ~50 Elements on average (e.g. a 7x7-matrix), the runlength version I presented still appears to be fastest.

(2) For matrices with 50...5000 elements, Brunos implementation is fastest.

(3) For larger matrices (>5000 elements), the simple for-loop over fwrite(C{i}); performs best.

So, it seems that none of the three solutions is really universally best. The differences are considerable; using solution (3) instead of (1) for small matrices can reduce performance by a factor of 20 - and the same holds vice versa: using (1) instead of (3) for very large matrices!

For now, I am very satisfied using the combination of all three solutions in my function, depending on the (average) matrix size. If there are no more ideas for improvements coming up, I will publish the implementation ("fwriteCell.m") on FE soon.


Best regards, David.

Subject: efficiently fwrite a cell-array of matrices

From: David

Date: 22 Jan, 2013 22:12:10

Message: 7 of 9

Oh, to be more correct, I should add this: The thresholds 50 and 5000 are just a rough rule of thumb, which I gained from testing with matrices of similar sizes (e.g. only between 6x6 and 8x8). The reason why I did it this way is that it better fit to my application.
Having a larger number of different matrix sizes (e.g. between 0x0 and 10x10), Brunos implementation (2) might also be better for an avg number of matrix-elements down to ~20. (The reason is, that the for-loop of my runlength-deconding-version then has a higher number of loops).

best regards, David.

Subject: efficiently fwrite a cell-array of matrices

From: Jan Simon

Date: 30 Jan, 2013 10:57:08

Message: 8 of 9

Dear David,

If Bruno's cell concatenation is useful, you could try the C-Mex version also:
  http://www.mathworks.com/matlabcentral/fileexchange/28916-cell2vec

Kind regards, Jan

Subject: efficiently fwrite a cell-array of matrices

From: Yair Altman

Date: 18 Apr, 2013 17:11:10

Message: 9 of 9

I would also test save('myFile.mat','C','-v6') - on my system it's faster than the naive for loop, almost as fast as your RLE solution, and much simpler and easier to develop, debug and maintain.

p.s. - using '-v7' results in a smaller file but takes a bit longer to perform. Depending on your I/O speed (e.g., whether you're using SSD) it could also be worth checking.

Yair Altman
http://UndocumentedMatlab.com
 

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us