Thread Subject:
processing extremely long data file sequentially?

Subject: processing extremely long data file sequentially?

From: huhua

Date: 1 Mar, 2008 01:54:05

Message: 1 of 4

Hi all,

Let's say a CSV file has tens of millions of lines and each line has many
columns.

I want to go through it line by line (except the first line,
which is the header),

and I need to cut most of the lines and columns out, and only use a few
lines and columns.

I estimate that out of these tens of millions of lines, I only need to
retain tens of thousands.

But I need to process every line to decide which ones to cut.

Even Excel 2007 refused to load the file. MATLAB crashed several times when
I tried to load it.

What do I do?

Is there a way to use "textread", "textscan", or "csvread" to read the file
line by line, sequentially?

I think it is important for the program to keep its position in the
CSV file so that after each line is read and processed, we can move to the
next line.

And I just need to sequentially write the filtered lines to a separate
output file.

Of course the benefit of "textread", "textscan", and "csvread" is that they can
parse formatted strings, including both text and numbers... that's
important...

Any ideas?

Thanks

Subject: processing extremely long data file sequentially?

From: Paul

Date: 1 Mar, 2008 03:30:20

Message: 2 of 4

"huhua" <lunamoonmoon@gmail.com> wrote in message
<fqacv8$et8$1@news.Stanford.EDU>...
> Is there a "textread", "textscan", "csvread" file that can read it line by
> line and sequentially?

help fgetl
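
A minimal sketch of the fgetl approach, in case it helps. The sample data, the column choice (`keepCols`) and the filter rule (`threshold`) are made up for illustration, and the comma split is naive: it assumes no quoted fields containing commas. Only one line is held in memory at a time, and the file position advances on every fgetl call, so no explicit pointer bookkeeping is needed.

```matlab
% Build a tiny sample file so the sketch runs on its own (hypothetical data).
inName  = [tempname '.csv'];
outName = [tempname '.csv'];
fid = fopen(inName, 'w');
fprintf(fid, 'id,name,value\n1,a,5\n2,b,250\n3,c,70\n4,d,900\n');
fclose(fid);

keepCols  = [1 3];   % columns to retain (assumed)
threshold = 100;     % keep rows whose 3rd column exceeds this (assumed)

fin  = fopen(inName, 'r');
fout = fopen(outName, 'w');

% The first line is the header; copy only the retained column names.
header    = fgetl(fin);
hdrFields = regexp(header, ',', 'split');
fprintf(fout, '%s\n', strjoin(hdrFields(keepCols), ','));

nKept = 0;
tline = fgetl(fin);                        % fgetl returns -1 at end of file
while ischar(tline)
    fields = regexp(tline, ',', 'split');  % naive split: no quoted commas
    if numel(fields) >= 3 && str2double(fields{3}) > threshold
        fprintf(fout, '%s\n', strjoin(fields(keepCols), ','));
        nKept = nKept + 1;
    end
    tline = fgetl(fin);                    % next call continues where the last stopped
end

fclose(fin);
fclose(fout);
```

For a real file, replace the sample-file section with the actual path and adjust the test inside the loop to whatever decides that a line is useful.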

Subject: processing extremely long data file sequentially?

From: Andres Toennesmann

Date: 9 Mar, 2008 15:11:03

Message: 3 of 4

"huhua" <lunamoonmoon@gmail.com> wrote in message
<fqacv8$et8$1@news.Stanford.EDU>...
> Hi all,
>
> Let's say a CSV file has tens of millions lines and each line has many
> columns.
>
> I actually wanted to browse through it line by line (except the first line,
> which is the headline),
>
> and I need to cut most of the lines and columns out, and only use a few
> lines and columns.
>
> []

If the CSV contains mainly numeric data below the header
line, you may try txt2mat from the File Exchange with its
'RowRange' and 'FilePos' arguments (see its help, esp. Example
5). This should be vastly quicker than fgetl.
HTH
Andres
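
For comparison, the same block-at-a-time idea can be sketched with the built-in textscan. This is not txt2mat, whose exact call syntax is not shown in this thread. The sample data, `blockSize` and `threshold` are assumptions for illustration. Passing a repeat count to textscan reads that many rows and leaves the file position where it stopped, so the next call continues from there.

```matlab
% Build a small sample file so the sketch runs on its own (hypothetical data).
inName = [tempname '.csv'];
fid = fopen(inName, 'w');
fprintf(fid, 'id,name,value\n');
fprintf(fid, '%d,x,%d\n', [1:10; 50:50:500]);   % rows: 1,x,50 ... 10,x,500
fclose(fid);

blockSize = 4;     % rows per read; something like 1e5 for a real file (assumed)
threshold = 300;   % hypothetical filter on the value column

fid = fopen(inName, 'r');
fgetl(fid);                    % skip the header line
kept = zeros(0, 2);            % retained [id value] rows
while ~feof(fid)
    % %*s parses the text column but discards it; only the numbers are returned
    C = textscan(fid, '%f %*s %f', blockSize, 'Delimiter', ',');
    if isempty(C{1})
        break;                 % nothing more could be parsed
    end
    block = [C{1} C{2}];
    kept  = [kept; block(block(:, 2) > threshold, :)]; %#ok<AGROW>
end
fclose(fid);
```

Growing `kept` inside the loop is tolerable here only because few rows survive the filter. If many rows were retained, writing each filtered block to an output file would be the safer choice.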

Subject: processing extremely long data file sequentially?

From: NZTideMan

Date: 10 Mar, 2008 04:43:40

Message: 4 of 4

On Mar 10, 4:11 am, "Andres Toennesmann" <rant...@werb.de> wrote:
>
> If the csv contains mainly numeric data below the header
> line, you may try txt2mat from the file exchange with its
> 'RowRange' and 'FilePos' arguments (see Help, esp. Example
> 5). This should be vastly quicker than fgetl.
> Hth
> Andres

I'd use Fortran, not MATLAB, for this job.
Fortran was developed back in the days of Hollerith cards, when
you loaded one card of data at a time, so it can handle this kind of
sequential, record-by-record problem easily and very fast.
