Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Fastest way to get the number of lines
Date: Mon, 25 Aug 2008 20:41:03 +0000 (UTC)
Organization: AIR Worldwide Corp
Lines: 17
Message-ID: <g8v5cv$1us$1@fred.mathworks.com>
Reply-To: <HIDDEN>



I have a very large .csv file (about 7-9 GB) containing
about 6.5 million lines of numbers.  Each row holds about
15,000 comma-delimited values.

Currently I am using TEXTSCAN to extract only the first
column in order to determine the number of lines in the file.
It took 4-5 hours on a 3 GHz Pentium 4.  Is there a faster
way to get just the number of lines?

I already skip the other columns when reading:
col = textscan( fid, ['%f' repmat('%*f',1,14999)], -1, ...
                'delimiter', ',');
numLines = length(col{1});   % col is a 1x1 cell; count the entries inside it
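
One alternative that might be worth trying is to skip number parsing
altogether and count newline bytes while reading the file in raw chunks.
The following is only a minimal sketch, not a measured solution: the
function name countLines and the 16 MB chunk size are hypothetical
choices, and it assumes every line ends in LF (byte value 10), which
holds for both Unix and Windows line endings.

```matlab
function numLines = countLines(filename)
% countLines  Count lines by counting LF bytes (value 10), chunk by chunk.
% Hypothetical helper: the name and the chunk size are assumptions.
chunkSize = 16 * 1024 * 1024;               % bytes read per fread call
fid = fopen(filename, 'r');
if fid == -1
    error('countLines:openFailed', 'Could not open %s', filename);
end
numLines = 0;
lastByte = 10;                              % an empty file has no partial line
while true
    data = fread(fid, chunkSize, '*uint8'); % raw bytes, no number parsing
    if isempty(data)
        break;
    end
    numLines = numLines + sum(data == 10);  % count LF characters in this chunk
    lastByte = data(end);
end
fclose(fid);
if lastByte ~= 10
    numLines = numLines + 1;                % last line lacks a trailing newline
end
end
```

Called as numLines = countLines('data.csv'), where 'data.csv' is a
placeholder file name.  Memory use stays near the chunk size, so the
7-9 GB file never has to fit in memory at once.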

Thanks a lot in advance.