From: Walter Roberson <roberson@hushmail.com>
Organization: Canada Eat The Cookie Foundation
Newsgroups: comp.soft-sys.matlab
Subject: Re: Fastest way to get the number of lines
Message-ID: <plyAk.10596$Il.10480@newsfe09.iad>
Date: Thu, 18 Sep 2008 14:54:08 -0500


Pete sherer wrote:

> I am sorry for not putting the request all at once.  Would it be possible
> to match 2 or more numbers?  Like I would like to find the rows with the
> first column matching 2 and 678 numbers.  

> I can simply call the program twice, but for a huge file size, I think it's
> probably faster to do it inside the perl.

In findvallines.pl put these two lines:

$fn = pop @ARGV; $" = '|'; $targetpat = qr/^(?:@ARGV),/o; @ARGV = ($fn);
while (<>) { /$targetpat/ && do { print $.,"\n" } }


Example invocation:

>> linenum = str2num( perl('findvallines.pl', '23', '678', 'XYZ.csv') )

linenum =

     2
     4


If you wanted niceties such as printing out which of the lines matched what, you
should have specified.

Note that this code could be improved, because it no longer stops when it finds
a match. It doesn't even stop when it has found as many matches as there were
original numbers. (You didn't promise that all of the lines started with unique
values.) If the first column is unique, then stopping once every number has
been matched would be reasonably fast; if the first column is not unique but you
only want to report the first matching line for each number, then the code would
have to be more complicated and would slow down.
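
For what it's worth, here is a rough sketch of the "first matching line for
each number, then stop" variant. It is not a drop-in replacement for the
two-liner: the helper name first_match_lines is made up for illustration, and
it swaps the regex for a hash lookup on the first comma-separated field
(compared as an exact string), which is an assumption on my part about what
you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_match_lines($fh, @targets): hypothetical helper, not from the post above.
# Returns, in file order, the line number of the first row whose first
# comma-separated field equals each target. Stops reading once every
# target has been seen.
sub first_match_lines {
    my ($fh, @targets) = @_;
    my %pending = map { $_ => 1 } @targets;   # targets not yet found
    my @found;
    return @found unless %pending;            # nothing to look for
    while (my $line = <$fh>) {
        # Grab the first field; skip lines that contain no comma at all.
        my ($first) = $line =~ /^([^,]*),/ or next;
        # delete returns true only the first time a target is encountered.
        next unless delete $pending{$first};
        push @found, $.;                      # $. is the current line number
        last unless %pending;                 # every target matched: stop early
    }
    return @found;
}

# Same argument order as the two-liner: values first, file name last.
if (@ARGV >= 2) {
    my $fn = pop @ARGV;
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    print "$_\n" for first_match_lines($fh, @ARGV);
    close $fh;
}
```

It would be called from MATLAB the same way as before, with whatever file
name you save it under. Whether the per-line hash lookup ends up slower than
the compiled regex on a huge file is something you would have to time.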