<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126</link>
    <title>MATLAB Central Newsreader - Fastest way to get the number of lines</title>
    <description>Feed for thread: Fastest way to get the number of lines</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Tue, 16 Sep 2008 21:55:03 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#600630</link>
      <author>Pete sherer</author>
      <description>Similarly instead of counting the number of lines in the file, can the PERL code be modified to find the first row (of the first column only) that contains a specified value?&lt;br&gt;
&lt;br&gt;
For example, if the file looks like&lt;br&gt;
2,45,56,7767,76,565.5,...&lt;br&gt;
23,454,556,74767,476,5465.5,...&lt;br&gt;
56,15,16,1767,176,1565.5,...&lt;br&gt;
678,45,5,67,6,0.5,...&lt;br&gt;
845,11,22,45,32,2.5,...&lt;br&gt;
...&lt;br&gt;
&lt;br&gt;
For example, I want to know the line number that the first column contains 678, then the line number should be 4.&lt;br&gt;
&lt;br&gt;
Thanks so much in advance.&lt;br&gt;
Pete</description>
    </item>
    <item>
      <pubDate>Tue, 16 Sep 2008 22:17:05 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#600632</link>
      <author>Walter Roberson</author>
      <description>Pete sherer wrote:&lt;br&gt;
&amp;gt; Similarly instead of counting the number of lines in the file, can the PERL &lt;br&gt;
&amp;gt; code be modified to find the first row (of the first column only) that &lt;br&gt;
&amp;gt; contains a specified value?&lt;br&gt;
&lt;br&gt;
&amp;gt; For example, if the file looks like&lt;br&gt;
&amp;gt; 2,45,56,7767,76,565.5,...&lt;br&gt;
&amp;gt; 23,454,556,74767,476,5465.5,...&lt;br&gt;
&amp;gt; 56,15,16,1767,176,1565.5,...&lt;br&gt;
&amp;gt; 678,45,5,67,6,0.5,...&lt;br&gt;
&amp;gt; 845,11,22,45,32,2.5,...&lt;br&gt;
&amp;gt; ...&lt;br&gt;
&lt;br&gt;
&amp;gt; For example, I want to know the line number that the first column contains 678, &lt;br&gt;
&amp;gt; then the line number should be 4.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Yes. For example, store the below two lines in findvalline.pl&lt;br&gt;
&lt;br&gt;
$targetval = shift @ARGV;&lt;br&gt;
while (&amp;lt;&amp;gt;) { /^$targetval,/ &amp;&amp; do { print $.,&quot;\n&quot;; last } }&lt;br&gt;
&lt;br&gt;
Then to make a matlab call to find the value N in file XYZ.csv&lt;br&gt;
&lt;br&gt;
linenum = str2num( perl('findvalline.pl', num2str(N), 'XYZ.csv') );&lt;br&gt;
if isempty(linenum); error('no match'); end&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
But be careful if your target is not an integer: num2str() will&lt;br&gt;
not necessarily round or truncate the same way as is in the file.&lt;br&gt;
You may wish to use a more sophisticated way of determining&lt;br&gt;
the matching string than using num2str().&lt;br&gt;
&lt;br&gt;
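A minimal sketch of one such way, assuming (hypothetically) that the file writes its&lt;br&gt;
first column with exactly two decimals; the value and the format are illustrative only:&lt;br&gt;
&lt;br&gt;
N = 565.5;&lt;br&gt;
target = sprintf('%.2f', N); % explicit format gives '565.50' rather than whatever num2str() picks&lt;br&gt;
if exist('findvalline.pl', 'file') &amp;&amp; exist('XYZ.csv', 'file') % only call perl when both files are present&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;linenum = str2num( perl('findvalline.pl', target, 'XYZ.csv') );&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
Because the script interpolates the target into a regular expression, the '.' in the&lt;br&gt;
target matches any character, which is rarely a problem for purely numeric data.&lt;br&gt;
&lt;br&gt;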
-- &lt;br&gt;
Q = quotation(rand);&lt;br&gt;
if isempty(Q); error('Quotation server filesystem problems')&lt;br&gt;
else sprintf('%s',Q), end</description>
    </item>
    <item>
      <pubDate>Wed, 17 Sep 2008 11:28:02 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#600678</link>
      <author>E </author>
      <description>Here's my take :&lt;br&gt;
&lt;br&gt;
fh = fopen(filename, 'r');&lt;br&gt;
chunksize = 1e6; % read chunks of 1 MB at a time&lt;br&gt;
numlines = 0;&lt;br&gt;
while ~feof(fh)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ch = fread(fh, chunksize, '*uchar');&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if isempty(ch)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;break&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;end&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numlines = numlines + sum(ch == sprintf('\n'));&lt;br&gt;
end&lt;br&gt;
fclose(fh);</description>
    </item>
    <item>
      <pubDate>Wed, 17 Sep 2008 15:50:18 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#600720</link>
      <author>Pete sherer</author>
      <description>&lt;br&gt;
Thank you so much Walter for your help to answer my request.&lt;br&gt;
&lt;br&gt;
I am sorry for not putting the request all at once.  Would it be possible to match 2 or more numbers?  Like I would like to find the rows with the first column matching 2 and 678 numbers.  &lt;br&gt;
&lt;br&gt;
I can simply call the program twice, but for a huge file size, I think it's probably faster to do it inside the perl.&lt;br&gt;
&lt;br&gt;
Thanks a lot in advance.&lt;br&gt;
Pete</description>
    </item>
    <item>
      <pubDate>Thu, 18 Sep 2008 19:54:08 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#600912</link>
      <author>Walter Roberson</author>
      <description>Pete sherer wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; I am sorry for not putting the request all at once.  Would it be possible&lt;br&gt;
&amp;gt; to match 2 or more numbers?  Like I would like to find the rows with the&lt;br&gt;
&amp;gt; first column matching 2 and 678 numbers.  &lt;br&gt;
&lt;br&gt;
&amp;gt; I can simply call the program twice, but for a huge file size, I think it's&lt;br&gt;
&amp;gt; probably faster to do it inside the perl.&lt;br&gt;
&lt;br&gt;
In findvallines.pl put these two lines:&lt;br&gt;
&lt;br&gt;
$fn = pop @ARGV; $&quot; = '|'; $targetpat = qr/^(?:@ARGV),/o; @ARGV = ($fn);&lt;br&gt;
while (&amp;lt;&amp;gt;) { /$targetpat/ &amp;&amp; do { print $.,&quot;\n&quot; } }&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Example invocation:&lt;br&gt;
&lt;br&gt;
&amp;gt;&amp;gt; linenum = str2num( perl('findvallines.pl', '23', '678', 'XYZ.csv') )&lt;br&gt;
&lt;br&gt;
linenum =&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;2&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;4&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
If you wanted niceties such as printing out which of the lines matched what, you&lt;br&gt;
should have specified.&lt;br&gt;
&lt;br&gt;
Note that this code could be improved, because it no longer stops when it finds&lt;br&gt;
a match. It doesn't even stop when it has found as many matches as there were&lt;br&gt;
original numbers. (You didn't promise that all of the lines started with unique&lt;br&gt;
values.) If the first column is unique, then stopping upon the last match would&lt;br&gt;
be reasonably fast; if the first column is not unique but you only want to&lt;br&gt;
report the first matching line for each pattern, then the code would have&lt;br&gt;
to be more complicated and would slow down.</description>
    </item>
    <item>
      <pubDate>Mon, 25 Aug 2008 20:41:03 -0400</pubDate>
      <title>Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597285</link>
      <author>Pete sherer</author>
      <description>I have a gigantic .csv file (about 7-9GB), which contains&lt;br&gt;
about 6.5 million lines of numbers.  Each row contains about&lt;br&gt;
15,000 data in comma delimiter format.&lt;br&gt;
&lt;br&gt;
Currently I am using TEXTSCAN to extract only the first&lt;br&gt;
column to determine the number of lines in the file.  It&lt;br&gt;
took 4-5 hours on a 3GHz Pentium IV.  Is there any better&lt;br&gt;
solution to just get the number of lines?  Thanks a lot.&lt;br&gt;
&lt;br&gt;
I already skip the other columns when reading:&lt;br&gt;
col = textscan( fid, ['%f' repmat('%*f',1,14999)], -1,&lt;br&gt;
'delimiter', ',');&lt;br&gt;
numLines = length(col{1});&lt;br&gt;
&lt;br&gt;
Thanks a lot in advance.</description>
    </item>
    <item>
      <pubDate>Mon, 25 Aug 2008 20:49:02 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597287</link>
      <author>Pete sherer</author>
      <description>Sorry - the file size is actually 37.0 GB!!&lt;br&gt;
&lt;br&gt;
Thanks a lot in advance</description>
    </item>
    <item>
      <pubDate>Mon, 25 Aug 2008 22:14:17 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597302</link>
      <author>Walter Roberson</author>
      <description>Pete sherer wrote:&lt;br&gt;
&amp;gt; I have a gigantic .csv file (about 7-9GB), which contains&lt;br&gt;
&amp;gt; about 6.5 million lines of numbers.  Each row contains about&lt;br&gt;
&amp;gt; 15,000 data in comma delimiter format.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Currently I am using TEXTSCAN to extract only the first&lt;br&gt;
&amp;gt; column to determine the number of lines in the file.  It&lt;br&gt;
&amp;gt; took 4-5 hours on 3GHz pentium IV.  Are there any better&lt;br&gt;
&amp;gt; solution to just get the number of lines?&lt;br&gt;
&lt;br&gt;
If you are using Windows OS, use Matlab's perl interface: the perl&lt;br&gt;
code is trivial and should be fairly fast:&lt;br&gt;
&lt;br&gt;
For example, store the below two lines in countlines.pl&lt;br&gt;
&lt;br&gt;
while (&amp;lt;&amp;gt;) {};&lt;br&gt;
print $.,&quot;\n&quot;;&lt;br&gt;
&lt;br&gt;
Then to make a matlab call to count the lines for file XYZ.csv&lt;br&gt;
&lt;br&gt;
numlines = str2num( perl('countlines.pl', 'XYZ.csv') );&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
If you are using any other OS, then you can skip the perl and use&lt;br&gt;
&lt;br&gt;
[status, result] = system( ['wc -l &amp;lt; ', 'XYZ.csv'] );&lt;br&gt;
numlines = str2num( result );&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Note: I don't promise that wc -l will work properly if there are more&lt;br&gt;
than 2^31-1 lines in the file... not something I ever looked into.&lt;br&gt;
The perl version should be able to handle up to 2^52-1 lines per file;&lt;br&gt;
if you have more than that, then it becomes more difficult to get&lt;br&gt;
an accurate line count though (but it should be possible with&lt;br&gt;
some trickery.)&lt;br&gt;
&lt;br&gt;
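The two routes above can be combined in one sketch, assuming countlines.pl is reachable&lt;br&gt;
from the current folder and that wc is installed on non-Windows systems. The file is&lt;br&gt;
redirected into wc so that only the count is printed, without the file name:&lt;br&gt;
&lt;br&gt;
fname = 'XYZ.csv'; % file name from the example; substitute your own&lt;br&gt;
numlines = NaN; % stays NaN if the file is missing or the count fails&lt;br&gt;
if exist(fname, 'file')&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if ispc&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;numlines = str2double( strtrim( perl('countlines.pl', fname) ) );&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[status, result] = system( ['wc -l &amp;lt; &quot;' fname '&quot;'] );&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if status == 0, numlines = sscanf(result, '%d'); end&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;end&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;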
-- &lt;br&gt;
Q = quotation(rand);&lt;br&gt;
if isempty(Q); error('Quotation server filesystem problems')&lt;br&gt;
else sprintf('%s',Q), end</description>
    </item>
    <item>
      <pubDate>Tue, 26 Aug 2008 05:10:03 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597341</link>
      <author>Pete sherer</author>
      <description>Thanks so much for the suggestion.&lt;br&gt;
&lt;br&gt;
The perl code took about 3 hrs, while textscan TOOK 25 times&lt;br&gt;
longer - 3+ DAYS!!!&lt;br&gt;
&lt;br&gt;
Thanks so much.</description>
    </item>
    <item>
      <pubDate>Tue, 26 Aug 2008 08:21:01 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597353</link>
      <author>us</author>
      <description>&quot;Pete sherer&quot;:&lt;br&gt;
&amp;lt;SNIP strange...&lt;br&gt;
&lt;br&gt;
&amp;gt; Sorry - the file size is actually 37.0 GB...&lt;br&gt;
&lt;br&gt;
how can that be?&lt;br&gt;
&lt;br&gt;
% using your numbers from the first post&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;37*2^30/6500000&lt;br&gt;
% ans = 6112.1 % &amp;lt;- bytes/line&lt;br&gt;
% but you also tell CSSM that&lt;br&gt;
&lt;br&gt;
Each row contains about&lt;br&gt;
15,000 data in comma delimiter format.&lt;br&gt;
&lt;br&gt;
???&lt;br&gt;
your file should be MUCH bigger than 37g - even if a line &lt;br&gt;
looked like this&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;n,n,n,n,...&lt;br&gt;
&lt;br&gt;
can you be more specific on what you mean by a 15k comma &lt;br&gt;
delimited data line looks like?&lt;br&gt;
&lt;br&gt;
us</description>
    </item>
    <item>
      <pubDate>Tue, 26 Aug 2008 09:55:03 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#597371</link>
      <author>pisz_na.mirek@dionizos.zind.ikem.pwr.wroc.pl</author>
      <description>Pete sherer &amp;lt;tsh@abg.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; Thanks so much for the suggestion.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; The perl code took about 3 hrs, while textscan TOOK 25 times&lt;br&gt;
&amp;gt; longer - 3+ DAYS!!!&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Thanks so much.&lt;br&gt;
&lt;br&gt;
The wc command on Linux on my old Athlon 1600XP scans such a file &lt;br&gt;
at about 20 seconds/GB with about 20% CPU usage.</description>
    </item>
    <item>
      <pubDate>Wed, 13 Apr 2011 10:45:07 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#830857</link>
      <author>James McCloskey</author>
      <description>Hey Walter,&lt;br&gt;
&lt;br&gt;
The countlines script here seems to work great for me but I'm trying to use the findvalue script and keep getting NaN as a result using your code (I think perl is returning an empty field).&lt;br&gt;
&lt;br&gt;
I'm looking for a line in my datafile where the first (and only column in this particular line) contains the string &quot;CommentsData&quot;. Should this work with your script?&lt;br&gt;
&lt;br&gt;
I'm trying to export data from a text file up until the point where the comments are appended, and this would be a fast way to find the line number and initialize my variables (and dimensions etc) before beginning exporting data.&lt;br&gt;
&lt;br&gt;
Thanks&lt;br&gt;
&lt;br&gt;
Jim &lt;br&gt;
&lt;br&gt;
Walter Roberson &amp;lt;roberson@hushmail.com&amp;gt; wrote in message &amp;lt;plyAk.10596$Il.10480@newsfe09.iad&amp;gt;...&lt;br&gt;
&amp;gt; Pete sherer wrote:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; I am sorry for not putting the request all at once.  Would it be possible&lt;br&gt;
&amp;gt; &amp;gt; to match 2 or more numbers?  Like I would like to find the rows with the&lt;br&gt;
&amp;gt; &amp;gt; first column matching 2 and 678 numbers.  &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; I can simply call the program twice, but for a huge file size, I think it's&lt;br&gt;
&amp;gt; &amp;gt; probably faster to do it inside the perl.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; In findvallines.pl put these two lines:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; $fn = pop @ARGV; $&quot; = '|'; $targetpat = qr/^(?:@ARGV),/o; @ARGV = ($fn);&lt;br&gt;
&amp;gt; while (&amp;lt;&amp;gt;) { /$targetpat/ &amp;&amp; do { print $.,&quot;\n&quot; } }&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Example invocation:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &amp;gt;&amp;gt; linenum = str2num( perl('findvallines.pl', '23', '678', 'XYZ.csv') )&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; linenum =&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt;      2&lt;br&gt;
&amp;gt;      4&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; If you wanted niceties such as printing out which of the lines matched what, you&lt;br&gt;
&amp;gt; should have specified.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Note that this code could be improved, because it no longer stops when it finds&lt;br&gt;
&amp;gt; a match. It doesn't even stop when it has found as many matches as there were&lt;br&gt;
&amp;gt; original numbers. (You didn't promise that all of the lines started with unique&lt;br&gt;
&amp;gt; values.) If the first column is unique, then stopping upon the last match would&lt;br&gt;
&amp;gt; be reasonably fast; if the first column is not unique but you only want to&lt;br&gt;
&amp;gt; report the first matching line for each pattern, then the code would have&lt;br&gt;
&amp;gt; to be more complicated and would slow down.</description>
    </item>
    <item>
      <pubDate>Wed, 13 Apr 2011 14:45:10 -0400</pubDate>
      <title>Re: Fastest way to get the number of lines</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/235126#830919</link>
      <author>James McCloskey</author>
      <description>Just figured it out... the regular experssion /^$targetval,/ was appending a comma to the end of the string...&lt;br&gt;
&lt;br&gt;
Works now&lt;br&gt;
&lt;br&gt;
Thanks!</description>
    </item>
  </channel>
</rss>

