Thread Subject:
Appending to existing matfile

Subject: Appending to existing matfile

From: Qizhu

Date: 14 Aug, 2014 10:17:09

Message: 1 of 5

Hi folks,

I work with huge datasets and I have a problem with saving variables.

In my programme, I have a for loop which generates a moderately large matrix (around 2k by 15k) in every iteration. Ultimately I need to put all the generated matrices together into one huge matrix and save it as a v7.3 mat file so I can partially read/modify the data later on. However, because the for loop executes as many as a few hundred times, I can't hold all the generated data in RAM. Therefore, what I am trying to do is to save the matrix in each iteration and clear the variable immediately.

Following this line of thought, I have tried using
save('filename.mat','variable','-append','-v7.3');
However, as my variable name is the same in each loop, I discovered that this command actually overwrites rather than appends.

Then I thought of dynamically creating variables with different names in each iteration and appending them to the mat file using the evalc function (which I know is a crime in MATLAB...). That worked. But the problem now is how to tidy all of those into one big numerical array. I definitely can't open the entire mat file due to memory constraints.

Thank you so much in advance for your time and help.

Subject: Appending to existing matfile

From: Steven Lord

Date: 14 Aug, 2014 14:32:26

Message: 2 of 5


"Qizhu " <qizhu.li@magd.ox.ac.uk> wrote in message
news:lsi2b5$oh0$1@newscl01ah.mathworks.com...
> Hi folks,
>
> I work with huge dataset and I have some problem with the saving of
> variables.
> In my programme, I have a for loop which generates a moderately large
> matrix (around 2k by 15k) in every loop. Ultimately I need to put all the
> generated matrices together into a huge matrix and save it as a v7.3 mat
> file so I can partially read/modify the data later on. However, because
> the for loop executes for as many as a few hundred times, I can't hold all
> the generated data in RAM. Therefore, what I am trying to do is to save
> them in each loop and clear the variable immediately.
> Following this line of thought, I have tried using
> save('filename.mat','variable','-append','-v7.3');
> However, as my variable name is the same in each loop, I discovered that
> this command actually overwrites rather than appends.

That is correct. The -append flag allows you to add _new variables_ to an
existing MAT-file; it does not allow you to _update already existing
variables_ by concatenating extra data at the end.
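
The behaviour described above can be checked with a small sketch. The file name comes from tempname and the variable names (variable, other) are placeholders chosen for illustration, not anything from the original program:

```matlab
% Sketch: '-append' replaces a variable that already exists in the file,
% and only adds variables whose names are new.
fname = [tempname '.mat'];            % throwaway file, hypothetical name

variable = ones(2, 3);
save(fname, 'variable', '-v7.3');     % first write creates the file

variable = zeros(4, 5);
save(fname, 'variable', '-append');   % same name: old contents are replaced

other = 1:3;
save(fname, 'other', '-append');      % new name: added alongside 'variable'

info = whos('-file', fname);          % inspect the file without loading it
for k = 1:numel(info)
    fprintf('%s: %s\n', info(k).name, mat2str(info(k).size));
end
delete(fname);
```

After the second save, the file holds a single 4-by-5 variable named variable, not a 6-row concatenation of the two.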

> Then I thought of trying to dynamically create variables with different
> names in each loop and append them into the mat file using evalc function.
> (which I know is a crime in matlab...) That worked. But, the problem now
> is how to tidy all of those into a big numerical array. I definitely can't
> open the entire mat file due to memory constraints.

Take a look at the MATFILE function.

http://www.mathworks.com/help/matlab/ref/matfile.html

Alternately, SAVE each piece to a separate MAT-file (since if you can't
create the matrix in memory in the first place to SAVE it, you're unlikely
to be able to LOAD the whole thing at once as well.)
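
A minimal sketch of the one-file-per-piece approach, assuming small random blocks as stand-ins for the real data. The directory, the block_%04d.mat naming pattern and the sizes are all hypothetical:

```matlab
% Sketch: save each generated block to its own MAT-file, then process
% the files one at a time so only one block is ever in memory.
outDir = tempname;                    % hypothetical scratch directory
mkdir(outDir);
nBlocks = 3;                          % stand-in for "a few hundred"

for k = 1:nBlocks
    block = rand(4, 5, 'single');     % stand-in for the real 2k-by-15k matrix
    fname = fullfile(outDir, sprintf('block_%04d.mat', k));
    save(fname, 'block');
    clear block                       % release the memory before the next pass
end

% Later: read the pieces back one by one.
total = 0;
for k = 1:nBlocks
    s = load(fullfile(outDir, sprintf('block_%04d.mat', k)), 'block');
    total = total + sum(s.block(:));
end

files = dir(fullfile(outDir, 'block_*.mat'));
fprintf('Wrote %d block files\n', numel(files));
```

Zero-padded numbers in the file names keep the files in processing order when listed with dir.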

There is one more option depending on what you want to DO with this huge
matrix. If you're trying to use it to solve a system of equations, look at
the iterative solvers.

http://www.mathworks.com/help/matlab/linear-equations-iterative-methods.html

Most or all of those solvers can accept either a coefficient matrix or a
function handle that accepts a vector and returns the product of your
coefficient matrix and that vector (possibly with some transposing
involved.) If you can compute that product WITHOUT creating the whole
coefficient matrix in memory at once that may save you memory. For example,
look at the coefficient matrix in the first example on the documentation
page for GMRES; because of the pattern followed by the main diagonal, the
second example can compute the same results as multiplying that coefficient
matrix with x _without actually creating the coefficient matrix_, instead
just creating a few vectors. For this small a problem, the memory savings is
also small; if n were 2100 or 21000 instead of 21 it could make the
difference between the program running and the program throwing an Out of
Memory error.
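
As a rough illustration of the function-handle idea (this is not the example from the GMRES documentation page; the tridiagonal matrix with 2 on the diagonal and -1 on both off-diagonals is an assumption chosen to keep the sketch short):

```matlab
% Sketch: solve A*x = b with GMRES without ever forming A.
% A is assumed to be the n-by-n tridiagonal matrix with 2 on the main
% diagonal and -1 on the first sub- and super-diagonals.
n = 21;

% Row i of A*x is 2*x(i) - x(i+1) - x(i-1), with missing neighbours = 0.
afun = @(x) 2*x - [x(2:end); 0] - [0; x(1:end-1)];

xTrue = ones(n, 1);
b = afun(xTrue);                      % right-hand side with a known solution

% No restart ([]), tolerance 1e-10, at most n iterations.
[x, flag, relres] = gmres(afun, b, [], 1e-10, n);

fprintf('flag = %d, relative residual = %g\n', flag, relres);
```

Only vectors of length n are created here; the n-by-n coefficient matrix never exists in memory.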

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Appending to existing matfile

From: Qizhu

Date: 14 Aug, 2014 15:08:06

Message: 3 of 5

Thanks Steve for the reply! Yes, I am aware of the matfile command. In fact that's what I am planning to use after I manage (if ever..) to create this huge matrix, so that I can load just part of the matrix each time.

What I am trying to do with this matrix is perform probabilistic PCA. In fact, the for loop I described previously is just the first step: calculate the mean along each row of the raw data matrix (which comes in uint8 format), subtract the mean from each entry, and save the centred data as a new mat file in single precision. Because of the huge dimensions of the raw data matrix (130k by 200k), I can't use the simple one-liner mean(data,2). I have to read in a few rows, find the means, centre them, save, and then process the next few rows.

Your answer seems to suggest that it is not possible to append to an existing variable by concatenating an extra matrix at the end. Did I get this right?

Also, is it possible to create an EMPTY single precision v7.3 mat file on hard disc without creating it in RAM, so that I can use the matfile function to partially load it and save my centred matrix portion into the right position in this matrix?

Subject: Appending to existing matfile

From: Steven Lord

Date: 15 Aug, 2014 14:00:35

Message: 4 of 5


"Qizhu " <qizhu.li@magd.ox.ac.uk> wrote in message
news:lsijcm$bdh$1@newscl01ah.mathworks.com...
> "Steven Lord" <Steven_Lord@mathworks.com> wrote in message
> <lsihao$5eh$1@newscl01ah.mathworks.com>...

*snip*

> Thanks Steve for the reply! Yes I am aware of the matfile command. In fact
> that's what I am planning to use after I manage (if ever..) to create this
> huge matrix, so that I can load just part of the matrix each time.
> What I am trying to do with this matrix to perform probabilistic PCA. In
> fact the for loop I described previously is just the first step: to
> calculate the mean along each row of the raw data matrix (which comes in
> uint8 format), subtract the mean off each entry, and save the centred data
> as a new mat file with single precision. Because of the huge dimension of
> the raw data matrix (130k by 200k ) , I can't do the simply one-liner:
> mean(data,2). I have to read in a few rows, find the means, centre them,
> save, and then process the next few rows.

If you have a parallel cluster of machines, creating that large matrix as a
distributed array may help you. In that case you may in fact be able to use
that simple one-liner, letting Parallel Computing Toolbox handle performing
the computations in a distributed manner.

http://www.mathworks.com/help/distcomp/distributed-arrays-and-spmd.html

> Your answer seem to suggest that it is not possible to append to existing
> variables by concatenating extra matrix at the end. Did I get this right?

Not using "save -append", no. In memory, yes. Via a MATFILE object, not by
concatenation but by indexed assignment.

> Also, is it possible to create an EMPTY single precision v7.3 mat file on
> hard disc without creating it in RAM, so that I can use the matfile
> function to partially load it and save my centred matrix portion into the
> right position in this matrix?

Answering the question you meant to ask, you can create a single precision
variable in the MAT-file without running into memory problems, then expand it via a
MATFILE object. You _would_ need to create single precision scalars in
memory, but if those cause memory problems lots of other programs on your
machine are probably in trouble as well as MATLAB.


>> m = matfile('expandingSingleMatrix.mat', 'Writable', true);
>> m.s = single(1);
>> whos -file expandingSingleMatrix.mat
  Name Size Bytes Class Attributes

  s 1x1 4 single

>> m.s(1000, 1000) = single(pi);
>> whos -file expandingSingleMatrix.mat
  Name Size Bytes Class Attributes

  s 1000x1000 4000000 single


After I created the variable s, it was scalar. When I expanded it by
assigning a value to an element that didn't already exist, it grew.
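
Putting this together with the row-centring task described earlier in the thread, a block-wise loop might look like the sketch below. The sizes are deliberately tiny, the raw data is random stand-in data, and the names rawFile, outFile, data and centred are hypothetical:

```matlab
% Sketch: centre a large matrix block by block using MATFILE objects,
% so that only blockRows rows are held in memory at any time.
nRows = 20; nCols = 30; blockRows = 8;    % tiny stand-ins for 130k-by-200k
rawFile = [tempname '.mat'];
outFile = [tempname '.mat'];

% Stand-in raw uint8 data, written through a MATFILE object (v7.3 by default).
raw = matfile(rawFile, 'Writable', true);
raw.data = randi([0 255], nRows, nCols, 'uint8');

% Create the output variable as a scalar, then grow it on disk to full size
% by assigning to its last element.
out = matfile(outFile, 'Writable', true);
out.centred = single(0);
out.centred(nRows, nCols) = single(0);

for r0 = 1:blockRows:nRows
    rows = r0:min(r0 + blockRows - 1, nRows);       % evenly spaced row range
    block = single(raw.data(rows, :));              % partial read from disk
    block = bsxfun(@minus, block, mean(block, 2));  % subtract each row's mean
    out.centred(rows, :) = block;                   % indexed write to disk
end

sz = size(out, 'centred');                % size query without loading the data
fprintf('centred is %d-by-%d on disk\n', sz(1), sz(2));
```

bsxfun is used instead of implicit expansion so that the sketch also runs on releases from the time of this thread.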

Answering the question you _actually_ asked, creating an empty single
precision array will not cause ANY problem with memory. By definition, an
empty array has at least one dimension of size 0, meaning it has no elements
and takes up no memory.


>> tic; z = ones(1e14, 1e14, 0, 'single'); toc
Elapsed time is 0.000228 seconds.
>> whos z
  Name Size Bytes Class Attributes

  z 100000000000000x100000000000000x0 0 single


Now if you were to try to expand THIS matrix, you WOULD have a memory
problem.

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Appending to existing matfile

From: Qizhu

Date: 16 Aug, 2014 01:36:06

Message: 5 of 5

> Not using "save -append", no. In memory, yes. Via a MATFILE object, not by
> concatenation but by indexed assignment.
>
> > Also, is it possible to create an EMPTY single precision v7.3 mat file on
> > hard disc without creating it in RAM, so that I can use the matfile
> > function to partially load it and save my centred matrix portion into the
> > right position in this matrix?
>
> Answering the question you meant to ask, you can create a single precision
> in the MAT-file without running into memory problems, then expand it via a
> MATFILE object. You _would_ need to create single precision scalars in
> memory, but if those cause memory problems lots of other programs on your
> machine are probably in trouble as well as MATLAB.
>
>
> >> m = matfile('expandingSingleMatrix.mat', 'Writable', true);
> >> m.s = single(1);
> >> whos -file expandingSingleMatrix.mat
> Name Size Bytes Class Attributes
>
> s 1x1 4 single
>
> >> m.s(1000, 1000) = single(pi);
> >> whos -file expandingSingleMatrix.mat
> Name Size Bytes Class Attributes
>
> s 1000x1000 4000000 single
>
>
> After I created the variable s, it was scalar. When I expanded it by
> assigning a value to an element that didn't already exist, it grew.
>

That's really helpful. This bit solved exactly my problem. Thanks so much mate!