Thread Subject:
how to save time loading data from file

Subject: how to save time loading data from file

From: ggk

Date: 3 Aug, 2008 22:10:03

Message: 1 of 20

Hi,

I've run a profile on my code and found that 90% of the
time running my .m file is consumed by one line of code:

x = load('data.txt');

I'm using 1GB RAM (Windows XP 3GHz Pentium4). The file
contains ascii data -- one data point (double precision)
per row, and many rows.

Can anyone share their insight how to speed this up?

- Will more RAM help?
- Any other means to load data into matlab that is faster?

If I were to buy a new machine just for speeding this up,
what characteristic(s) should I look for to buy? In other
words, what's limiting the speed here and how to make sure
I get a machine that can speed this up?

Seems like I just need to get the data from the hard drive
into RAM. Would a SAS drive be better than SATA, for
example? Faster DDR memory? etc.

Appreciate any insight here. Best regards, -GGK

Subject: how to save time loading data from file

From: Brian Borchers

Date: 3 Aug, 2008 22:32:22

Message: 2 of 20

On Aug 3, 4:10 pm, "ggk " <ggkm...@comcast.net> wrote:
> Hi,
>
> I've run a profile on my code and found that 90% of the
> time running my .m file is consumed by one line of code:
>
> x = load('data.txt');
>
> I'm using 1GB RAM (Windows XP 3GHz Pentium4). The file
> contains ascii data -- one data point (double precision)
> per row, and many rows.
>
> Can anyone share their insight how to speed this up?
>
> - Will more RAM help?
> - Any other means to load data into matlab that is faster?
>
> If I were to buy a new machine just for speeding this up,
> what characteristic(s) should I look for to buy? In other
> words, what's limiting the speed here and how to make sure
> I get a machine that can speed this up?
>
> Seems like I just need to get the data from the hard drive
> into RAM. Would a SAS drive be better than SATA, for
> example? Faster DDR memory? etc.
>
> Appreciate any insight here. Best regards, -GGK

In my experience, .mat files load much faster than text files, which
suggests that the problem has to do with the time consuming task of
converting numbers in text format into IEEE floating point.

Is this a data file that doesn't change from one run of the program to
the next, or is it a new data file each time?

If the data file isn't changing, then you could simply load it in once
and then save it as a .mat file. The resulting file would typically
be somewhat smaller than the original text file (assuming that the
numbers in data.txt have 8 characters or more per number) and load
much faster (because there would be no need to convert the numbers to
floating point and because the file would be smaller.)

If the data file is new each time you run the program, then you could
work your way upstream to the program that produces the file and
modify it to save the data as a .mat file rather than a text file.
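
A minimal sketch of the "load it once, then save it as a .mat file" idea above. The demo file, the cache file name, and the variable name x are assumptions made for illustration; a real file such as data.txt would take the place of txtFile. The sketch does not check whether the text file has changed since the cache was written.

```matlab
% Demo input so the sketch runs on its own; replace with a real text file.
txtFile = [tempname '.txt'];
d = (1:5)' / 7;
save(txtFile, 'd', '-ascii', '-double');

% Assumed naming scheme: cache next to the text file, same base name.
matFile = [txtFile(1:end-4) '.mat'];

if exist(matFile, 'file') == 2
    s = load(matFile, 'x');      % fast path: binary .mat file
    x = s.x;
else
    x = load(txtFile);           % slow path: parse the text once
    save(matFile, 'x');          % cache the result for later runs
end
```

On the first run the text file is parsed and cached; on later runs only the .mat file is read.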

Subject: how to save time loading data from file

From: ggk

Date: 3 Aug, 2008 23:29:02

Message: 3 of 20

Thanks, the file is only loaded once and used.
Unfortunately I have many such files (all different data)
I need to process. This data comes from an oscilloscope
that outputs in ascii format (I'll check but don't believe
it can export data in binary).

Subject: how to save time loading data from file

From: per isakson

Date: 4 Aug, 2008 00:59:02

Message: 4 of 20

"ggk " <ggkmath@comcast.net> wrote in message
<g75evu$eg$1@fred.mathworks.com>...
> Thanks, the file is only loaded once and used.
> Unfortunately I have many such files (all different data)
> I need to process. This data comes from an oscilloscope
> that outputs in ascii format (I'll check but don't believe
> it can export data in binary).

For some reason, the online help says:

"Use load -ascii only on files that have been created with
the save -ascii command."

Did you try fscanf?

Over the years, Urs Schwarz (us) has shown effective code to
handle ascii files. However, what was true a few years ago
is not necessarily true with R2008a.

/ per

Subject: how to save time loading data from file

From: Paul

Date: 4 Aug, 2008 02:56:01

Message: 5 of 20

"per isakson" <poi.nospam@bimDOTkthDOT.se> wrote in message
<g75k8l$ppg$1@fred.mathworks.com>...
> "ggk " <ggkmath@comcast.net> wrote in message
> <g75evu$eg$1@fred.mathworks.com>...
> > Thanks, the file is only loaded once and used.
> > Unfortunately I have many such files (all different data)
> > I need to process. This data comes from an oscilloscope
> > that outputs in ascii format (I'll check but don't believe
> > it can export data in binary).
>
> For some reason, the online help says:
>
> "Use load -ascii only on files that have been created with
> the save -ascii command."
>
> Did you try fscanf?
>
> Over the years, Urs Schwarz (us) has shown effective code to
> handle ascii files. However, what was true a few years ago
> is not necessarily true with R2008a.
>
> / per
>

I would consider a pre-processor (e.g. fortran code) to read
the ascii file and convert it to binary for Matlab. The
pre-processor could be an executable that could be called
from Matlab via a system call.

Subject: how to save time loading data from file

From: ggk

Date: 4 Aug, 2008 03:51:01

Message: 6 of 20

Thanks Per, The fscanf routine works twice as fast as load
('filename'). Unfortunately I need an order of magnitude
improvement, at least.

Paul, how would I go about finding a pre-processor? Would
any share-ware program that converts ascii to binary work?
Or does Matlab use a 'unique' version of binary code? (are
all binary files equal?). Something I could automate
within Matlab would be perfect. Looking for any ideas...

Subject: how to save time loading data from file

From: Andres

Date: 4 Aug, 2008 13:20:03

Message: 7 of 20

"ggk " <ggkmath@comcast.net> wrote in message
<g75ub5$k64$1@fred.mathworks.com>...
> Thanks Per, The fscanf routine works twice as fast as load
> ('filename'). Unfortunately I need an order of magnitude
> improvement, at least.
>
> [..]


some of my experience (R2008a):

% ========================

% generate a test file of ~15MB:

A = rand(1,1)+reshape(linspace(0,100,1e6),1e6/20,20);

filename= 'C:\test.txt';

save(filename,'A','-ascii');

% >6s ---------------------

T = textread(filename);

C = importdata(filename);

L = load(filename);
% no effect of the '-ascii' option

% ~2s ---------------------

M = dlmread(filename);

% <=1s --------------------

B = txt2mat(filename,0,20,'Infolevel',0);
% file exchange, based on fread & sscanf

tic, fid = fopen(filename);
D = textscan(fid, repmat('%f ',1,20), 'CollectOutput',true);
D = D{1};
fclose(fid); toc

% ========================


The approximate times given were measured over repeated runs,
so caching of the import functions and the test files will
have a notable effect. textscan is the fastest one; I
wonder if it beats your fscanf solution.

regards
Andres

Subject: how to save time loading data from file

From: Steven Lord

Date: 4 Aug, 2008 14:15:14

Message: 8 of 20


"ggk " <ggkmath@comcast.net> wrote in message
news:g75abr$40d$1@fred.mathworks.com...
> Hi,
>
> I've run a profile on my code and found that 90% of the
> time running my .m file is consumed by one line of code:
>
> x = load('data.txt');
>
> I'm using 1GB RAM (Windows XP 3GHz Pentium4). The file
> contains ascii data -- one data point (double precision)
> per row, and many rows.
>
> Can anyone share their insight how to speed this up?

LOAD needs to "figure out" how to read in your data (i.e. it needs to
determine how the file is formatted) and that takes some time. If you know
the format of your data file (for instance, the fact that each row contains
exactly one double-precision number followed by a newline), make use of that
information with some of the other file I/O functions (like DLMREAD,
CSVREAD, TEXTSCAN, or even the low-level FOPEN/FSCANF/FCLOSE functions.) If
you use a more "focused" reader rather than the general LOAD, you may see
some improvement.
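
A hedged sketch of what a more "focused" reader might look like for a file with exactly one number per line. The demo file and the variable names are assumptions; the format string '%f' simply encodes the known layout.

```matlab
% Demo input: one number per line (stands in for the real data file).
filename = [tempname '.txt'];
d = randn(1000, 1);
save(filename, 'd', '-ascii', '-double');

% Option 1: TEXTSCAN with an explicit format.
fid = fopen(filename, 'r');
c = textscan(fid, '%f');
fclose(fid);
x1 = c{1};                     % textscan returns a cell array

% Option 2: low-level FOPEN/FSCANF/FCLOSE with the same format.
fid = fopen(filename, 'r');
x2 = fscanf(fid, '%f');
fclose(fid);
```

Both calls return a column vector; which one is faster is something to measure on the actual files and MATLAB release.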

--
Steve Lord
slord@mathworks.com

Subject: how to save time loading data from file

From: Rune Allnor

Date: 4 Aug, 2008 14:49:19

Message: 9 of 20

On 4 Aug, 00:10, "ggk " <ggkm...@comcast.net> wrote:
> Hi,
>
> I've run a profile on my code and found that 90% of the
> time running my .m file is consumed by one line of code:
>
> x = load('data.txt');
>
> I'm using 1GB RAM (Windows XP 3GHz Pentium4). The file
> contains ascii data -- one data point (double precision)
> per row, and many rows.

How many rows?

> Can anyone share their insight how to speed this up?
>
> - Will more RAM help?

No. The time-consuming step is converting the data from
ASCII to binary. How you read the file from disk is
insignificant in comparison.

> - Any other means to load data into matlab that is faster?

The only thing that will speed things up *significantly*
is to store the data in binary format. Not too long ago
I sped up the loading of 90 MB of ASCII files from ~45 s
to ~0.2 s by changing the storage format to binary. Not
only was the loading some 250x faster, I also saved some
20% disk space by storing the data in binary format.

Rune

Subject: how to save time loading data from file

From: ggk

Date: 4 Aug, 2008 22:10:18

Message: 10 of 20

> The only thing that will speed things up *significantly*
> is to store the data in binary format. Not too long ago
> I sped up the loading of 90 MB of ASCII files from ~45 s
> to ~0.2 s by changing the storage format to binary. Not
> only was the loading some 250x faster, I also saved some
> 20% disk space by storing the data in binary format.
>
> Rune

Fortunately the one thing I know exactly for my
application is the type of data, formatting and total #
rows per ascii file. My file size is 1.3GB containing one
double-precision number per row and 100M rows. Because I
don't have enough memory to handle the 1.3GB of data, I've
split the ascii file up into 10 files of 10M rows each.

Wow, loading binary files goes 250x faster!? How can I
convert my ascii file to binary? Any routines to do this
that can be called from Matlab?

Subject: how to save time loading data from file

From: per isakson

Date: 4 Aug, 2008 23:54:02

Message: 11 of 20

"ggk " <ggkmath@comcast.net> wrote in message
<g77uoa$a48$1@fred.mathworks.com>...
> Fortunately the one thing I know exactly for my
> application is the type of data, formatting and total #
> rows per ascii file. My file size is 1.3GB containing one
> double-precision number per row and 100M rows. Because I
> don't have enough memory to handle the 1.3GB of data, I've
> split the ascii file up into 10 files of 10M rows each.
>
> Wow, loading binary files goes 250x faster!? How can I
> convert my ascii file to binary? Any routines to do this
> that can be called from Matlab?

I have some vague memories from the time when it was
regarded as wasteful to read formatted text files as "free
text". It was much faster to read using an exact format
string, e.g. %10.4f rather than %f.

Some years ago I was surprised by the results of some
experiments I did with Matlab: "%10.4f" was not faster
than "%f". Conclusion: Matlab doesn't take full advantage
of the information given in the format.

I guess that a carefully written piece of (e.g) fortran in
a MEX-file would solve your problem.

/per

Subject: how to save time loading data from file

From: Paul

Date: 5 Aug, 2008 06:59:02

Message: 12 of 20

"per isakson" <poi.nospam@bimDOTkthDOT.se> wrote in message
<g784qq$o9e$1@fred.mathworks.com>...
> I have some vague memories from the time when it was
> regarded wasteful to read formatted textfiles as "free
> text". It was much faster to read using an exact format-
> string, eg %10.4f rather than with %f.
>
> Some years ago I was surprised by the results of some
> experiments I did with Matlab: "%10.4f" was not faster
> than "%f". Conclusion: Matlab doesn't take full advantage
> of the information given in the format.
>
> I guess that a carefully written piece of (e.g) fortran in
> a MEX-file would solve your problem.
>
> /per
>

I would suggest a simple Fortran program that reads the
ascii formatted data and creates a new version of the data in
binary format, i.e. one simple loop over all the ascii data
(note: the following code is not valid Fortran, only an idea)

10 read(7,'ascii',end=20) x
    write(8,'binary')x
    goto 10
20 end


I would call this executable from Matlab and then just open
and read the new binary data with fread that can handle just
about any binary format.

Subject: how to save time loading data from file

From: NZTideMan

Date: 5 Aug, 2008 07:24:08

Message: 13 of 20

On Aug 5, 6:59 pm, "Paul " <p...@ceri.memphis.edu> wrote:
> I may suggest a simple fortran program that would read the
> ascii formatted data and create a new version of the data in
> binary format, i.e. one simple loop over all the ascii data
> (note: following code is not valid fortran but only a idea)
>
> 10 read(7,'ascii',end=20) x
>     write(8,'binary')x
>     goto 10
> 20 end
>
> I would call this executable from Matlab and then just open
> and read the new binary data with fread that can handle just
> about any binary format.

Well, that's the sort of spaghetti code that gives Fortran a bad name.
I didn't know anyone wrote such code any more.
Here's a modern version of the same thing:
do
   read(7,*,iostat=ierr) x
   if (ierr .ne. 0) exit
   write(8) x
end do

But even better would be to read in several lines of ASCII data and
write out blocks of binary.
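
The same block-wise idea can also be sketched directly in MATLAB rather than Fortran, so that no external executable is needed. This is only an illustration under assumptions: the demo file, the file names, and blockSize are made up, and the output is a plain stream of 8-byte doubles in the machine's native byte order.

```matlab
% Demo input; in practice inFile would be the oscilloscope text file.
inFile  = [tempname '.txt'];
outFile = [tempname '.raw'];
d = randn(2500, 1);
save(inFile, 'd', '-ascii', '-double');

blockSize = 1000;                % rows per block (assumed; tune to memory)
fin  = fopen(inFile,  'r');
fout = fopen(outFile, 'w');
while ~feof(fin)
    c = textscan(fin, '%f', blockSize);   % parse at most blockSize numbers
    if isempty(c{1}), break, end          % nothing left to convert
    fwrite(fout, c{1}, 'double');         % append the block as raw doubles
end
fclose(fin);
fclose(fout);

% Read the raw file back in one call.
fid = fopen(outFile, 'r');
x = fread(fid, Inf, 'double');
fclose(fid);
```

Because only one block is held in memory at a time, the text file does not have to be split up beforehand.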

Subject: how to save time loading data from file

From: Paul

Date: 5 Aug, 2008 07:43:06

Message: 14 of 20

NZTideMan <mulgor@gmail.com> wrote in message
<eab1c3ca-3e73-4e43-8325-b49adb4dcec0@p31g2000prf.googlegroups.com>...
> Well, that's the sort of spaghetti code that gives Fortran a bad name.
> I didn't know anyone wrote such code any more.
> Here's a modern version of the same thing:
> do
> read(7,*,iostat=ierr)x
> if(ierr .ne. 0)exit
> write(8)x
> end do
>
> But even better would be to read in several lines of ASCII data and
> write out blocks of binary.

Hey ... there are some spaghetti hounds around that can
still recall punch card machines, tape setups for plots and
getting your output from a wide slot in a wall!

And I am a card carrying member of the GOTO society of
America since the version of fortran I used did not have an
ENDDO.

Subject: how to save time loading data from file

From: NZTideMan

Date: 5 Aug, 2008 11:19:39

Message: 15 of 20

On Aug 5, 7:43 pm, "Paul " <p...@ceri.memphis.edu> wrote:
> Hey ... there are some spaghetti hounds around that can
> still recall punch card machines, tape setups for plots and
> getting your output from a wide slot in a wall!
>
> And I am a card carrying member of the GOTO society of
> America since the version of fortran I used did not have an
> ENDDO.

Yes, I was a spaghetti hound myself until I had to debug someone else's
spaghetti code one day, and it was like my personal road to Damascus.
I vowed never to write code like that again.
And I also remember Hollerith cards and JCL and getting print-plots on
fanfold paper 132 columns wide......

Subject: how to save time loading data from file

From: Rune Allnor

Date: 5 Aug, 2008 15:41:18

Message: 16 of 20

On 5 Aug, 00:10, "ggk " <ggkm...@comcast.net> wrote:
> > The only thing that will speed things up *significantly*
> > is to store the data in binary format. Not too long ago
> > I sped up the loading of 90 MB of ASCII files from ~45 s
> > to ~0.2 s by changing the storage format to binary. Not
> > only was the loading some 250x faster, I also saved some
> > 20% disk space by storing the data in binary format.
>
> > Rune
>
> Fortunately the one thing I know exactly for my
> application is the type of data, formatting and total #
> rows per ascii file. My file size is 1.3GB containing one
> double-precision number per row and 100M rows. Because I
> don't have enough memory to handle the 1.3GB of data, I've
> split the ascii file up into 10 files of 10M rows each.
>
> Wow, loading binary files goes 250x faster!? How can I
> convert my ascii file to binary? Any routines to do this
> that can be called from Matlab?

Well, if you can't configure the oscilloscope to store
directly to binary format, the damage has already been
done - once the data are stored to ASCII format one
necessarily needs to convert them back to binary.

In that case, do as was already suggested by somebody
else and read the files *once* from ASCII and store
the data e.g. to .mat files. You can do that from
within matlab. Remember, you don't have to sit idle
in front of the computer while it does its work.
Set up a script which converts the files and start it
before you go home and leave the computer to work
overnight.
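
A rough sketch of such a conversion script, assuming all the text files sit in one folder and hold one number per line. The folder, the file names, and the variable name x are made up for the demo; textscan stands in for whichever reader turns out fastest.

```matlab
% Demo folder with two small text files so the sketch runs on its own.
dataDir = tempname;
mkdir(dataDir);
for k = 1:2
    d = k * (1:4)';
    save(fullfile(dataDir, sprintf('scope%d.txt', k)), 'd', '-ascii', '-double');
end

% Convert every .txt file in the folder to a .mat file of the same name.
files = dir(fullfile(dataDir, '*.txt'));
for k = 1:numel(files)
    txtFile = fullfile(dataDir, files(k).name);
    [~, name] = fileparts(files(k).name);
    matFile = fullfile(dataDir, [name '.mat']);

    fid = fopen(txtFile, 'r');
    c = textscan(fid, '%f');          % one double per line
    fclose(fid);

    x = c{1};
    save(matFile, 'x');               % binary copy for fast loading later
    fprintf('Converted %s -> %s\n', txtFile, matFile);
end
```

Afterwards each file can be loaded with load(matFile), which avoids the text parsing entirely.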

If you can get your oscilloscope to store the data to
some 'naive' binary format, FREAD is the function you
will want to use.

Rune

Subject: how to save time loading data from file

From: Rune Allnor

Date: 5 Aug, 2008 17:24:56

Message: 17 of 20

On 5 Aug, 00:10, "ggk " <ggkm...@comcast.net> wrote:
> > The only thing that will speed things up *significantly*
> > is to store the data in binary format. Not too long ago
> > I sped up the loading of 90 MB of ASCII files from ~45 s
> > to ~0.2 s by changing the storage format to binary. Not
> > only was the loading some 250x faster, I also saved some
> > 20% disk space by storing the data in binary format. ...

> Wow, loading binary files goes 250x faster!? How can I
> convert my ascii file to binary? Any routines to do this
> that can be called from Matlab?

Below is a script that demonstrates the timing differences
between ASCII and binary data. Output on my screen (R2006a):

Wrote ASCII data in 2.2344 seconds
Read ASCII data in 4.1719 seconds
Wrote binary data in 0.03125 seconds
Read binary data in 0.03125 seconds

File sizes:

test.txt (ASCII) 17579 kB
test.raw (Binary) 7813 kB

Rune

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
N = 1000000;
d1=randn(N,1);
t1=cputime;
save test.txt d1 -ascii
t2=cputime-t1;
disp(['Wrote ASCII data in ',num2str(t2),' seconds'])

t3=cputime;
d2=load('test.txt','-ascii');
t4=cputime-t3;
disp(['Read ASCII data in ',num2str(t4),' seconds'])

t5=cputime;
fid=fopen('test.raw','w');
fwrite(fid,d1,'double');
fclose(fid);
t6=cputime-t5;
disp(['Wrote binary data in ',num2str(t6),' seconds'])

t7=cputime;
fid=fopen('test.raw','r');
d3=fread(fid,'double');
fclose(fid);
t8=cputime-t7;
disp(['Read binary data in ',num2str(t8),' seconds'])

Subject: how to save time loading data from file

From: Ivan E. Cao-Berg

Date: 5 Aug, 2008 22:23:01

Message: 18 of 20

I am sorry to say that there is a huge variation from system
to system but in general loading data into Matlab from plain
ASCII files is slow.

I develop software for MacOSX, Windows and Linux using
Matlab and let me say, I feel your pain.

If loading your data takes a lot of time, you may want to load
your data once and then save it as a mat file. Loading mat files
tends to be quicker than loading ASCII, but that may depend on
the complexity of the data and the size of your file.

Ivan

"ggk " <ggkmath@comcast.net> wrote in message
<g75evu$eg$1@fred.mathworks.com>...
> Thanks, the file is only loaded once and used.
> Unfortunately I have many such files (all different data)
> I need to process. This data comes from an oscilloscope
> that outputs in ascii format (I'll check but don't believe
> it can export data in binary).

Subject: how to save time loading data from file

From: ggk

Date: 6 Aug, 2008 03:04:02

Message: 19 of 20

Thanks everyone for your replies and humor above. I've
tried textscan and it's twice as fast as fscanf.

So what I've learned for reading one 130MB file of ascii
double-precision (one number per row and 10M rows) using
Pentium 4 3GHz 1GB RAM Windows XP Home (with fresh boot --
unfortunately very important) is:

load ('filename.txt'); --> 16 minutes
fscanf using %g --> 8 minutes
textscan with %f64 --> 4 minutes

Since I only read and process each file once, I don't have
the option of loading it into matlab and saving it as binary
to process later (the load time is already spent by then, so
there is no need to convert to binary any more). I do have
many ascii files from the scope to process (hundreds), so the
most efficient method looks to be the (spaghetti-less) code
above, or similar in some other language such as C.

Thanks Rune for your raw numbers for comparison. The
conclusion seems pretty clear from your data.

Subject: how to save time loading data from file

From: Rune Allnor

Date: 6 Aug, 2008 07:25:27

Message: 20 of 20

On 6 Aug, 05:04, "ggk " <ggkm...@comcast.net> wrote:
> Thanks everyone for your replies and humor above. I've
> tried textscan and it's twice as fast as fscanf.
>
> So what I've learned for reading one 130MB file of ascii
> double-precision (one number per row and 10M rows) using
> Pentium 4 3GHz 1GB RAM Windows XP Home (with fresh boot --
> unfortunately very important) is:
>
> load ('filename.txt'); --> 16 minutes
> fscanf using %g --> 8 minutes
> textscan with %f64 --> 4 minutes
>
> Since I only read and process each file once, I don't have
> the option of loading it into matlab and saving it as binary
> to process later (the load time is already spent by then, so
> there is no need to convert to binary any more). I do have
> many ascii files from the scope to process (hundreds), so the
> most efficient method looks to be the (spaghetti-less) code
> above, or similar in some other language such as C.

Even if you only use each file once, you have so many files
to process that I would have the computer run overnight
to convert the files to .mat, so they are ready for you to use
in the morning. That way the time-consuming tasks are done
in your free time, leaving you free to spend your
time at work on the interesting stuff.

Rune
