
Thread Subject:
"streaming" data

Subject: "streaming" data

From: Dominic

Date: 2 Jun, 2009 13:35:03

Message: 1 of 14

Hi,
I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
Thank you for your help!

Subject: "streaming" data

From: dpb

Date: 2 Jun, 2009 13:53:26

Message: 2 of 14

Dominic wrote:
> Hi, I am working with some very large data files and was wondering if
> there was any way I could, instead of loading them all at once,
> stream them into MATLAB in groups of say 720,000 rows of the file.
> The files are all different lengths, but most are upwards of
> 5,000,000 points long. Thank you for your help!

Look at the optional SIZE argument in the low-level i/o functions such
as fread or fscanf depending on data format.
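
For illustration, a minimal sketch of that approach for a flat binary file of double pairs (the file name and chunk size are hypothetical):

    fid = fopen('data.bin', 'r');
    while ~feof(fid)
        % fread fills column-wise, so request a 2-by-N block and transpose
        % to get up to 720,000 rows of [x y] per pass.
        block = fread(fid, [2, 720000], 'double')';
        if isempty(block), break; end
        % ... process this block, then let it be overwritten ...
    end
    fclose(fid);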

--

Subject: "streaming" data

From: us

Date: 2 Jun, 2009 13:57:02

Message: 3 of 14

"Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> Hi,
> I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> Thank you for your help!

a hint:

     help memmapfile;

us
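
For reference, memmapfile maps a file's raw bytes into an array without reading the whole file at once, and it is best suited to binary data. A minimal sketch, assuming a flat binary file of double pairs (file name and block shape are hypothetical):

    % Each element of m.Data is one 720,000-by-2 block of doubles.
    m = memmapfile('data.bin', 'Format', {'double', [720000 2], 'block'});
    firstBlock = m.Data(1).block;   % only this block is pulled from disk

Note that mapping a comma-separated text file this way returns raw uint8 byte values (the default Format), not parsed numbers, so text data still needs fscanf/textscan or a one-off conversion to binary.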

Subject: "streaming" data

From: Dominic

Date: 2 Jun, 2009 15:05:04

Message: 4 of 14

"us " <us@neurol.unizh.ch> wrote in message <h03b3e$slo$1@fred.mathworks.com>...
> "Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> > Hi,
> > I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> > Thank you for your help!
>
> a hint:
>
> help memmapfile;
>
> us

How do I use memmapfile to load only part of the file?

Subject: "streaming" data

From: us

Date: 2 Jun, 2009 15:13:01

Message: 5 of 14

"Dominic " <dcg48@cornell.edu> wrote in message <h03f30$k9s$1@fred.mathworks.com>...
> "us " <us@neurol.unizh.ch> wrote in message <h03b3e$slo$1@fred.mathworks.com>...
> > "Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> > > Hi,
> > > I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> > > Thank you for your help!
> >
> > a hint:
> >
> > help memmapfile;
> >
> > us
>
> How do I use memmapfile to load only part of the file?

well, ...

     help memmapfile;
     doc memmapfile;

shall give you plenty of answers...

us

Subject: "streaming" data

From: Dominic

Date: 2 Jun, 2009 16:36:01

Message: 6 of 14

"us " <us@neurol.unizh.ch> wrote in message <h03fht$l86$1@fred.mathworks.com>...
> "Dominic " <dcg48@cornell.edu> wrote in message <h03f30$k9s$1@fred.mathworks.com>...
> > "us " <us@neurol.unizh.ch> wrote in message <h03b3e$slo$1@fred.mathworks.com>...
> > > "Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> > > > Hi,
> > > > I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> > > > Thank you for your help!
> > >
> > > a hint:
> > >
> > > help memmapfile;
> > >
> > > us
> >
> > How do I use memmapfile to load only part of the file?
>
> well, ...
>
> help memmapfile;
> doc memmapfile;
>
> shall give you plenty of answers...
>
> us

It gives me answers on what it does, but I am still unclear as to how to use it for my task.

Subject: "streaming" data

From: us

Date: 2 Jun, 2009 16:46:01

Message: 7 of 14

"Dominic " <dcg48@cornell.edu> wrote in message <h03kdh$glv$1@fred.mathworks.com>...
> "us " <us@neurol.unizh.ch> wrote in message <h03fht$l86$1@fred.mathworks.com>...
> > "Dominic " <dcg48@cornell.edu> wrote in message <h03f30$k9s$1@fred.mathworks.com>...
> > > "us " <us@neurol.unizh.ch> wrote in message <h03b3e$slo$1@fred.mathworks.com>...
> > > > "Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> > > > > Hi,
> > > > > I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> > > > > Thank you for your help!
> > > >
> > > > a hint:
> > > >
> > > > help memmapfile;
> > > >
> > > > us
> > >
> > > How do I use memmapfile to load only part of the file?
> >
> > well, ...
> >
> > help memmapfile;
> > doc memmapfile;
> >
> > shall give you plenty of answers...
> >
> > us
>
> It gives me answers on what it does, but I am still unclear as to how to use it for my task.

well, ...

1) you might also want to look at the exhaustive example section in the doc (it happens to be at the bottom)...
2) you might want to be a bit more specific about your needs, which is typically achieved by showing a SMALL (yet complete) example of your data...

us

Subject: "streaming" data

From: Dominic

Date: 2 Jun, 2009 19:14:01

Message: 8 of 14

"us " <us@neurol.unizh.ch> wrote in message <h03l09$prt$1@fred.mathworks.com>...
> "Dominic " <dcg48@cornell.edu> wrote in message <h03kdh$glv$1@fred.mathworks.com>...
> > "us " <us@neurol.unizh.ch> wrote in message <h03fht$l86$1@fred.mathworks.com>...
> > > "Dominic " <dcg48@cornell.edu> wrote in message <h03f30$k9s$1@fred.mathworks.com>...
> > > > "us " <us@neurol.unizh.ch> wrote in message <h03b3e$slo$1@fred.mathworks.com>...
> > > > > "Dominic " <dcg48@cornell.edu> wrote in message <h039q7$6g2$1@fred.mathworks.com>...
> > > > > > Hi,
> > > > > > I am working with some very large data files and was wondering if there was any way I could, instead of loading them all at once, stream them into MATLAB in groups of say 720,000 rows of the file. The files are all different lengths, but most are upwards of 5,000,000 points long.
> > > > > > Thank you for your help!
> > > > >
> > > > > a hint:
> > > > >
> > > > > help memmapfile;
> > > > >
> > > > > us
> > > >
> > > > How do I use memmapfile to load only part of the file?
> > >
> > > well, ...
> > >
> > > help memmapfile;
> > > doc memmapfile;
> > >
> > > shall give you plenty of answers...
> > >
> > > us
> >
> > It gives me answers on what it does, but I am still unclear as to how to use it for my task.
>
> well, ...
>
> 1) you might also want to look at the exhaustive example section in the doc (it happens to be at the bottom)...
> 2) you might want to be a bit more specific about your needs, which is typically achieved by showing a SMALL (yet complete) example of your data...
>
> us

A small sample of my data would be:
-19,-16
1,-27
-7,-39
-14,-27
-12,-38
-7,-26
2,-22
13,-45
16,-42
26,-29
48,-21
38,20
47,56
77,83
99,122
90,117

This is an excerpt from a file of 10,000 such rows. I want to load all these rows into a matrix and run them through a function which isolates certain data points and creates a new matrix. I am only interested in the final matrix, so the intermediate one can be overwritten each time I load a part of my data file. I tried using memmapfile, but the data did not load correctly: all the numbers were decreased by a significant factor, although the matrix size was correct for what I wanted.
Thank you

Subject: "streaming" data

From: tristram.scott@ntlworld.com (Tristram Scott)

Date: 3 Jun, 2009 08:45:29

Message: 9 of 14

Dominic <dcg48@cornell.edu> wrote:
[snip]
>
> A small sample of my data would be:
> -19,-16
> 1,-27
> -7,-39
> -14,-27
> -12,-38
> -7,-26
> 2,-22
> 13,-45
> 16,-42
> 26,-29
> 48,-21
> 38,20
> 47,56
> 77,83
> 99,122
> 90,117
>
> This is an excerpt from a file of 10,000 such rows. I want to load all
> these rows into a matrix and run them through a function which isolates
> certain data points and creates a new matrix. I am only interested in the
> final matrix, so the intermediate one can be overwritten each time I load a
> part of my data file. I tried using memmapfile, but the data did not load
> correctly: all the numbers were decreased by a significant factor, although
> the matrix size was correct for what I wanted.

Perhaps you could chop them up with a text editor before bringing them to
MATLAB. No skill required.

Alternatively, if you can't work it with memmapfile, just read the data one
line at a time (or 720,000 rows at a time) using fopen and fgetl.

Fastest would probably be to preprocess the data into the required number
of rows per file, and at the same time remove the commas. Then you could
read them using the load command, which is almost always the fastest
method.
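
A minimal sketch of the fopen/fgetl route (file name hypothetical; each line parsed with sscanf):

    fid = fopen('data.txt', 'r');
    tline = fgetl(fid);
    while ischar(tline)
        xy = sscanf(tline, '%f,%f');   % [x; y] for this line
        % ... accumulate or process xy here ...
        tline = fgetl(fid);
    end
    fclose(fid);

If line-at-a-time proves slow, textscan can read a fixed number of rows per call, e.g. textscan(fid, '%f%f', 720000, 'Delimiter', ','), and it keeps the file position between calls.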



--
Dr Tristram J. Scott
Energy Consultant

Subject: "streaming" data

From: dpb

Date: 3 Jun, 2009 12:40:28

Message: 10 of 14

Tristram Scott wrote:
> Dominic <dcg48@cornell.edu> wrote:
> [snip]
>> A small sample of my data would be:
>> -19,-16
>> 1,-27
...snip...
>>
>> This is an excerpt from a file of 10,000 such rows. I want to load all
> these rows into a matrix ...
>
> Perhaps you could chop them up with a text editor before bringing them to
> MATLAB. No skill required.
>
> Alternatively, if you can't work it with memmapfile, just read the data one
> line at a time (or 720,000 rows at a time) using fopen and fgetl.
>
> Fastest would probably be to preprocess the data into the required number
> of rows per file, and at the same time remove the commas. Then you could
> read them using the load command, which is almost always the fastest
> method.

Alternatively, 2x10000 isn't all that large (~160 kb double array) so
textread() or one of its companions and processing the file in one pass
is probably simplest overall. You can even CLEAR the initial array when
done w/ it if desired but unless using some other really large memory
data structures probably unnecessary.
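
For instance, a one-pass read with textread might look like this (file name hypothetical):

    % Whole-file read; returns two column vectors.
    [x, y] = textread('data.txt', '%f%f', 'delimiter', ',');
    xy = [x y];                      % e.g. 10,000-by-2
    % ... run the point-isolating function on xy, then CLEAR xy if needed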

--

Subject: "streaming" data

From: Dominic

Date: 3 Jun, 2009 13:10:17

Message: 11 of 14

dpb <none@non.net> wrote in message <h05r5k$g90$1@aioe.org>...
> [snip]
>
> Alternatively, 2x10000 isn't all that large (~160 kb double array) so
> textread() or one of its companions and processing the file in one pass
> is probably simplest overall. You can even CLEAR the initial array when
> done w/ it if desired but unless using some other really large memory
> data structures probably unnecessary.
>
So there is no way for MATLAB to pre-process the files? I have thousands of them so that would take too long. The files are not 10,000 points long, but 10,000,000. Sorry that was a typo on my part.
Thank you

Subject: "streaming" data

From: dpb

Date: 3 Jun, 2009 13:23:51

Message: 12 of 14

Dominic wrote:
...
> So there is no way for MATLAB to pre-process the files? I have
> thousands of them so that would take too long. The files are not
> 10,000 points long, but 10,000,000. Sorry that was a typo on my
> part. Thank you

What do you mean by "pre-process"?

OK, so 2x10M is too large to do in memory.

I don't have a late-enough version of ML to help w/ the memory-mapped file
route, but as noted before you could use the optional SIZE argument in
fscanf() to read sections of the file in sequence.
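
A minimal sketch of that fscanf approach on the comma-separated files (file name and chunk size hypothetical):

    fid = fopen('data.txt', 'r');
    while ~feof(fid)
        % fscanf reapplies the format and fills column-wise, so request a
        % 2-by-720000 block and transpose to get rows of [x y].
        block = fscanf(fid, '%f,%f', [2, 720000])';
        if isempty(block), break; end
        % ... isolate the points of interest from this block ...
    end
    fclose(fid);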

--

Subject: "streaming" data

From: tristram.scott@ntlworld.com (Tristram Scott)

Date: 3 Jun, 2009 13:54:43

Message: 13 of 14

Dominic <dcg48@cornell.edu> wrote:
> dpb <none@non.net> wrote in message <h05r5k$g90$1@aioe.org>...
[snip]

>>
>> Alternatively, 2x10000 isn't all that large (~160 kb double array) so
>> textread() or one of its companions and processing the file in one pass
>> is probably simplest overall. You can even CLEAR the initial array when
>> done w/ it if desired but unless using some other really large memory
>> data structures probably unnecessary.
>>
> So there is no way for MATLAB to pre-process the files? I have thousands
> of them so that would take too long. The files are not 10,000 points long,
> but 10,000,000. Sorry that was a typo on my part.

Yes, you can use MATLAB to preprocess the files, but that might not be the
best thing to do.

What platform are you working on? Under Unix you could tackle this all
with awk or perl, or even sed and head / tail.

But, 10,000,000 lines only leads to a variable of size ~160 MB. That
should fit in memory, especially if you are not already doing lots of other
things. Load the data using textscan or whatever else you
find convenient, chop it into appropriately sized chunks, then save it all to
disk as a .mat file. Once you have done that for all of your data files,
come back and process the data. You can load from your .mat file one bit
at a time.
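
A sketch of that preprocessing idea, assuming each file fits in memory and using hypothetical file and variable names:

    % Read the whole file once, then save fixed-size chunks under separate
    % variable names so each can be loaded back individually later.
    fid = fopen('data.txt', 'r');
    C = textscan(fid, '%f%f', 'Delimiter', ',');
    fclose(fid);
    xy = [C{1} C{2}];

    chunkRows = 720000;
    nChunks = ceil(size(xy,1) / chunkRows);
    for k = 1:nChunks
        rows = (k-1)*chunkRows + 1 : min(k*chunkRows, size(xy,1));
        S.(sprintf('chunk%d', k)) = xy(rows, :);
    end
    save('data_chunks.mat', '-struct', 'S');

    % Later, pull back just one piece:
    part = load('data_chunks.mat', 'chunk3');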

--
Dr Tristram J. Scott
Energy Consultant

Subject: "streaming" data

From: dpb

Date: 3 Jun, 2009 14:54:44

Message: 14 of 14

dpb wrote:
> Dominic wrote:
> ...
>> So there is no way for MATLAB to pre-process the files? I have
>> thousands of them so that would take too long. The files are not
>> 10,000 points long, but 10,000,000. Sorry that was a typo on my
>> part. Thank you
>
> What do you mean by "pre-process"?
>
> OK, so 2x10M is too large to do in memory.
...

Actually, on reflection, w/ many (most/all?) of today's machines, 160MB
isn't all that much, either.

I've a moderately old 1Gig machine and

x=rand(10E6,2);

took only barely over a second to complete.

Looks like simply reading and processing would be the thing until you
actually do run into memory problems.

--
