Thread Subject:
Increasing loading time of large MAT files

Subject: Increasing loading time of large MAT files

From: Thomas Schreiter

Date: 20 Aug, 2010 07:34:12

Message: 1 of 6

Hi,

I want to process large datasets and extract a small amount of information from each of them. When I run the loop, the first datasets load quickly, but after a while each load operation takes longer and longer:

For example:
> processMyData
2010-08-19 13:42:44> processing 2009-01-01
2010-08-19 13:42:47> processing 2009-01-02
2010-08-19 13:42:50> processing 2009-01-03
...
2010-08-19 16:17:28> processing 2009-12-27
2010-08-19 16:18:44> processing 2009-12-28
2010-08-19 16:19:57> processing 2009-12-29
2010-08-19 16:21:10> processing 2009-12-30
2010-08-19 16:22:24> processing 2009-12-31

So the first datasets were loaded within 3 seconds, whereas the last ones took more than 70 seconds each! The loading time seems to increase super-linearly. A test with two years of data led to loading times of about 300 seconds.
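The per-iteration times can be read directly from the timestamps in the log. As a small check, the sketch below converts the last five logged timestamps to serial day numbers with datenum and takes the differences. The format string is an assumption chosen to match the printed log lines.

```matlab
% Seconds between consecutive log lines near the end of the run
% (timestamps copied from the log above).
stamps = {'2010-08-19 16:17:28', '2010-08-19 16:18:44', ...
          '2010-08-19 16:19:57', '2010-08-19 16:21:10', ...
          '2010-08-19 16:22:24'};
t = datenum(stamps, 'yyyy-mm-dd HH:MM:SS');        % serial date numbers (days)
secondsPerIteration = round(diff(t) * 24 * 3600)   % 76, 73, 73 and 74 s
```

These intervals cover a whole loop iteration, which includes both the disp call and the load. Given the Profiler result reported in this post, nearly all of that time is the load.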

Each dataset is 25 MB and stored in its own MAT file. The extracted data are only a few KB. The Profiler showed that 98% of the time is spent in the load operation.
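One way to see how long each load takes is to time every load call separately with tic/toc and store the results in a preallocated vector. The sketch below uses a hypothetical setup: it writes a few synthetic MAT-files (demo_dataset_*.mat in tempdir, not the real RLDATA_* files) and times loading them. Freshly written files are usually still in the operating system's file cache, so this demonstrates the measurement technique rather than reproducing the slowdown.

```matlab
% Assumption: synthetic MAT-files in tempdir stand in for the real data files;
% the demo_dataset_* names are hypothetical.
nFiles = 5;
files = cell(nFiles, 1);
for k = 1:nFiles
    files{k} = fullfile(tempdir, sprintf('demo_dataset_%02d.mat', k));
    data = rand(1000);                 % 1e6 doubles, about 8 MB in memory
    save(files{k}, 'data');
end

% Time each load() on its own; the timing vector is preallocated.
loadTimes = zeros(nFiles, 1);
for k = 1:nFiles
    t0 = tic;
    S = load(files{k});                %#ok<NASGU>
    loadTimes(k) = toc(t0);
end
fprintf('file %d: %.3f s\n', [1:nFiles; loadTimes.']);

% Remove the synthetic files again.
for k = 1:nFiles
    delete(files{k});
end
```

Applied to the real loop, plotting loadTimes against the iteration index would show whether the load call itself is what grows.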

How can I fix this problem?

Thanks in advance.

Subject: Increasing loading time of large MAT files

From: Steven_Lord

Date: 20 Aug, 2010 14:16:54

Message: 2 of 6



"Thomas Schreiter" <t.schreiter@tudelft.nl> wrote in message
news:i4lb5k$gmc$1@fred.mathworks.com...
> How can I fix this problem?

I don't think anyone will be able to answer that question without seeing the
code you created. Perhaps you're growing a matrix inside the loop that you
use to iterate over the files? If so, preallocate and see if that improves
the performance.
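Here is a minimal sketch of the preallocation Steve suggests, assuming each dataset yields one scalar result. extractFeature is a hypothetical placeholder for the real extraction step, and the random struct stands in for S = load(filename):

```matlab
% Preallocate the results once, then fill them in place.
nDays = 10;
extractFeature = @(S) mean(S.data(:));   % hypothetical extraction step

results = zeros(nDays, 1);               % allocated once, before the loop
for a = 1:nDays
    S = struct('data', rand(100));       % placeholder for one loaded dataset
    results(a) = extractFeature(S);      % filled in place, no resizing
end
```

Growing the vector instead (results(end+1) = ...) can make MATLAB reallocate and copy it as it grows. Code Analyzer flags this pattern with a warning to consider preallocating.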

Additionally, if you haven't already, you should open your function in the
Editor and check to make sure there are no Code Analyzer warnings.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Increasing loading time of large MAT files

From: Walter Roberson

Date: 20 Aug, 2010 14:23:52

Message: 3 of 6

On 20/08/10 2:34 AM, Thomas Schreiter wrote:

> I want to process large datasets and extract some few information out of
> them. Running the loop, the first datasets are loaded fast. But after a
> while, the loading operation takes longer and longer:

How are you storing the data? If you are storing each of the results of
the loading, are you pre-allocating the storage array?

Subject: Increasing loading time of large MAT files

From: Thomas Schreiter

Date: 20 Aug, 2010 21:49:04

Message: 4 of 6

"Steven_Lord" <slord@mathworks.com> wrote in message <i4m2om$nun$1@fred.mathworks.com>...
> I don't think anyone will be able to answer that question without seeing the
> code you created. Perhaps you're growing a matrix inside the loop that you
> use to iterate over the files? If so, preallocate and see if that improves
> the performance.

Hi again,

I modified the code to make the case clearer. It is now reduced to the loading operation itself, so no data are processed:

startDate = datenum([2008 01 01 0 0 0]);
nDays = 2*365;
for a = 1:nDays
    % display current time and dataset date
    aDay = startDate + a - 1;
    disp([datestr(now,31) '> processing day ' datestr(aDay, 26)]);

    % load operation
    filename = ['RLDATA_A15_' datestr(aDay, 'yyyymmdd') '_R_15'];
    load(filename);
end

Again, the output shows three things: in the first iterations each file loads within 3 seconds, the loading time increases with every iteration, and by the end each load takes about 30 seconds. The code was run in a freshly started MATLAB session on an otherwise idle laptop:

2010-08-20 21:28:51> processing day 2008/01/01
2010-08-20 21:28:54> processing day 2008/01/02
2010-08-20 21:28:57> processing day 2008/01/03
...
2010-08-20 23:21:10> processing day 2009/12/28
2010-08-20 23:21:39> processing day 2009/12/29
2010-08-20 23:22:08> processing day 2009/12/30

Why is the time increasing? And how can I keep the loading time constant?

Subject: Increasing loading time of large MAT files

From: Walter Roberson

Date: 20 Aug, 2010 22:02:37

Message: 5 of 6

On 10-08-20 04:49 PM, Thomas Schreiter wrote:

> I modified the code to make the case more clear. The code is now reduced
> to the loading operation itself, so there is no processing of data:
>
> startDate = datenum([2008 01 01 0 0 0]); nDays = 2*365;
> for a = 1:nDays
> % display current time and dataset date
> aDay = startDate + a - 1;
> disp([datestr(now,31) '> processing day ' datestr(aDay, 26)]);
>
> % load operation
> filename = ['RLDATA_A15_' datestr(aDay, 'yyyymmdd') '_R_15'];
> load(filename);
> end
>
> Again, the output shows that in the first iterations, the files are
> loaded within 3 seconds; the loading time increases with every
> iteration; at the end, the loading takes 30 seconds.

When you load each file, is it the same variable name each time?

I'd be interested in seeing how the timings change if you were to try
something like,

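myvars = struct();   % 1x1 struct; dot assignment into the empty struct([]) errors
for a = 1:nDays
   aDay = startDate + a - 1;
   filename = ['RLDATA_A15_' datestr(aDay, 'yyyymmdd') '_R_15'];
   myvars.(filename) = load(filename);   % one field per file
end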


and also,

myvars = cell(nDays,1);
for a = 1:nDays
   aDay = startDate + a - 1;
   filename = ['RLDATA_A15_' datestr(aDay, 'yyyymmdd') '_R_15'];
   myvars{a} = load(filename);
end


By the way, with your current code, there is a minor optimization available:

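for aDay = startDate : startDate + nDays - 1
   disp([datestr(now,31) '> processing day ' datestr(aDay, 26)]);
   filename = ['RLDATA_A15_' datestr(aDay, 'yyyymmdd') '_R_15'];
   load(filename);
end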

That is, with the code as-is you do not need "a" itself (but you would for the
cell version).

Subject: Increasing loading time of large MAT files

From: Thomas Schreiter

Date: 21 Aug, 2010 21:29:05

Message: 6 of 6

Walter Roberson <roberson@hushmail.com> wrote in message <i4mu3f$9u1$1@canopus.cc.umanitoba.ca>...
> I'd be interested in seeing how the timings change if you were to try
> something like,
>
> myvars = struct([]);
>
> for ...
> myvars.(filename) = load(filename);
> end
>
>
> and also,
>
> myvars = cell(nDays,1);
> for ...
> myvars{a} = load(filename);
> end

If I understand you correctly, the loaded data would then be kept in a growing structure. There are several problems with that:
(1) A lot of memory would be occupied. Its size increases linearly with the number of datasets, so we would quickly run into memory problems.
(2) Without preallocation, a new memory block has to be allocated in every iteration. That, too, takes a lot of time.

So I don't want to try it, because the process would crash after a few iterations.
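Point (1) can be put in numbers using the figures from this thread. As a rough assumption, take each loaded dataset to need about as much memory as its 25 MB file. MAT-files saved as version 7 are compressed, so the real in-memory footprint may be larger. On that basis, keeping two years of daily data would need on the order of 18 GB:

```matlab
% Rough estimate, assuming in-memory size is about the same as file size.
fileSizeMB = 25;                       % one MAT-file, as reported above
nDays = 2*365;                         % two years of daily files
totalGB = fileSizeMB * nDays / 1000    % 18.25 GB
```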
