From: "huhua" <lunamoonmoon@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: processing extremely long data file sequentially?
Date: Fri, 29 Feb 2008 17:54:05 -0800
Message-ID: <fqacv8$et8$1@news.Stanford.EDU>



Hi all,

Let's say a CSV file has tens of millions of lines, and each line has many 
columns.

I want to go through it line by line (skipping the first line, which is the 
header), discard most of the lines and columns, and keep only a few of each.

I am estimating that out of these tens of millions of lines, I only need to 
retain tens of thousands of lines.

But I have to examine every line to decide which ones to discard.

Even Excel 2007 refused to load the file. MATLAB crashed several times when 
I tried to load it.

What do I do?

Is there a way to make "textread", "textscan", or "csvread" read the file 
sequentially, one line at a time?

I think it is important for the program to keep a relative pointer in the 
CSV file so that after each line is read and processed, we can move to the 
next line.

And I just need to sequentially write out another output file to take the 
filtered lines.
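
To make the idea concrete, here is a rough sketch of the loop I have in 
mind. It uses fopen/fgetl, which keep the file position between calls, 
instead of the functions named above. The function name filter_csv_lines, 
the keepCols index list, and the keepFcn test are placeholders I made up 
for illustration, and the comma split is naive (it assumes no quoted fields 
containing commas).

```matlab
function nKept = filter_csv_lines(inFile, outFile, keepCols, keepFcn)
% Stream a large CSV one line at a time and write the filtered lines out.
%   keepCols : column indices to retain (hypothetical parameter)
%   keepFcn  : function handle taking the cell array of fields for one
%              line and returning true to keep it (hypothetical parameter)

fin = fopen(inFile, 'r');
if fin == -1
    error('Cannot open input file %s', inFile);
end
fout = fopen(outFile, 'w');
if fout == -1
    fclose(fin);
    error('Cannot open output file %s', outFile);
end

fgetl(fin);                 % read and discard the header line
nKept = 0;
tline = fgetl(fin);         % the file position advances with every call
while ischar(tline)         % fgetl returns -1 (not a char) at end of file
    % Naive split on commas; empty fields are preserved.
    fields = regexp(tline, ',', 'split');
    % Skip short or blank lines, then apply the row test.
    if numel(fields) >= max(keepCols) && keepFcn(fields)
        fprintf(fout, '%s\n', strjoin(fields(keepCols), ','));
        nKept = nKept + 1;
    end
    tline = fgetl(fin);
end

fclose(fin);
fclose(fout);
end
```

For example, filter_csv_lines('big.csv', 'small.csv', [1 3], @(f) 
str2double(f{3}) >= 20) would keep columns 1 and 3 of every line whose 
third field is at least 20. Only one line is held in memory at a time, but 
this gives up the formatted parsing that textscan offers.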

Of course, the benefit of "textread", "textscan", and "csvread" is that they 
can parse formatted strings containing both text and numbers, and that is 
important here.

Any ideas?

Thanks