<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308</link>
    <title>MATLAB Central Newsreader - really big data files</title>
    <description>Feed for thread: really big data files</description>
    <language>en-us</language>
    <copyright>© 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Sun, 08 Nov 2009 19:24:02 -0500</pubDate>
      <title>really big data files</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308#693060</link>
      <author>Jon Shultz</author>
      <description>I'm trying to read in a datafile that's really big (&amp;gt;2GB) in sections that are a couple hundred thousand lines long each.  I need to know how many lines are in the parent file first.  &lt;br&gt;
&lt;br&gt;
I have a routine now that does it like this:&lt;br&gt;
totlines=0;&lt;br&gt;
while ~feof(fid)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line=fgetl(fid);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;totlines=totlines+1;&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
This does well with the memory part, but takes forever.  There has got to be a more efficient way to do this, but I'm stuck.&lt;br&gt;
&lt;br&gt;
Thanks!</description>
    </item>
    <item>
      <pubDate>Sun, 08 Nov 2009 19:31:35 -0500</pubDate>
      <title>Re: really big data files</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308#693061</link>
      <author>Rune Allnor</author>
      <description>On 8 Nov, 20:24, &quot;Jon Shultz&quot; &amp;lt;jjddshu...@yahoo.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; I'm trying to read in a datafile that's really big (&amp;gt;2GB) in sections that are a couple hundred thousand lines long each. &#160;I need to know how many lines are in the parent file first. &#160;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I have a routine now that does it like this:&lt;br&gt;
&amp;gt; totlines=0;&lt;br&gt;
&amp;gt; while ~feof(fid)&lt;br&gt;
&amp;gt; &#160; &#160; line=fgetl(fid);&lt;br&gt;
&amp;gt; &#160; &#160; totlines=totlines+1;&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; This does well with the memory part, but takes forever. &#160;There has got to be a more efficient way to do this, but I'm stuck.&lt;br&gt;
&lt;br&gt;
Read the file in larger batches than a single line.&lt;br&gt;
&lt;br&gt;
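One way to act on this, as a minimal sketch rather than code from this thread: read raw bytes in large chunks with fread and count the line-feed characters. The file name, the 16 MB chunk size, and the assumption that every line ends in a line feed (byte value 10, true for LF and CRLF files) are illustrative choices, not details from the original post.&lt;br&gt;
&lt;br&gt;
```matlab
% Sketch: count lines by reading raw bytes in large chunks, not line by line.
% Assumptions (not from the thread): the file name, the chunk size, and that
% every line ends in a line feed (byte value 10), as in LF or CRLF text files.
filename  = 'bigdata.txt';        % hypothetical file name
chunkSize = 16 * 1024 * 1024;     % bytes per read; tune to available memory

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end

totlines = 0;
lastByte = 10;                    % an empty file has no partial last line
while true
    buf = fread(fid, chunkSize, '*uint8');   % column vector of raw bytes
    if isempty(buf)
        break;                               % end of file reached
    end
    totlines = totlines + sum(buf == 10);    % line feeds in this chunk
    lastByte = buf(end);
end
fclose(fid);

% Count a final line that has no trailing line feed.
if lastByte ~= 10
    totlines = totlines + 1;
end
```
&lt;br&gt;
Only one chunk is held in memory at a time, so memory use should stay bounded regardless of file size.&lt;br&gt;
&lt;br&gt;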
Rune</description>
    </item>
    <item>
      <pubDate>Mon, 09 Nov 2009 00:00:19 -0500</pubDate>
      <title>Re: really big data files</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308#693107</link>
      <author>Jon Shultz</author>
      <description>Rune Allnor &amp;lt;allnor@tele.ntnu.no&amp;gt; wrote in message &amp;lt;dfca570a-9e21-4622-bdea-69768c9d26b4@p8g2000yqb.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt; On 8 Nov, 20:24, &quot;Jon Shultz&quot; &amp;lt;jjddshu...@yahoo.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; I'm trying to read in a datafile that's really big (&amp;gt;2GB) in sections that are a couple hundred thousand lines long each.  I need to know how many lines are in the parent file first.&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; I have a routine now that does it like this:&lt;br&gt;
&amp;gt; &amp;gt; totlines=0;&lt;br&gt;
&amp;gt; &amp;gt; while ~feof(fid)&lt;br&gt;
&amp;gt; &amp;gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line=fgetl(fid);&lt;br&gt;
&amp;gt; &amp;gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;totlines=totlines+1;&lt;br&gt;
&amp;gt; &amp;gt; end&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; This does well with the memory part, but takes forever.  There has got to be a more efficient way to do this, but I'm stuck.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Read the file in larger batches than a single line.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Rune&lt;br&gt;
&lt;br&gt;
Thank you.  I am using textscan to get the data blocks in the code which follows what I have written above.  Let me restate my question.  Is there a way to determine the number of lines in a large file without reading in the data (which will crash Matlab)?&lt;br&gt;
&lt;br&gt;
I want to use the total number of lines to determine the best way to segment the files.  &lt;br&gt;
&lt;br&gt;
Jon</description>
    </item>
    <item>
      <pubDate>Mon, 09 Nov 2009 01:31:37 -0500</pubDate>
      <title>Re: really big data files</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308#693113</link>
      <author>TideMan</author>
      <description>On Nov 9, 1:00&#160;pm, &quot;Jon Shultz&quot; &amp;lt;jjddshu...@yahoo.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; Rune Allnor &amp;lt;all...@tele.ntnu.no&amp;gt; wrote in message &amp;lt;dfca570a-9e21-4622-bdea-69768c9d2...@p8g2000yqb.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; On 8 Nov, 20:24, &quot;Jon Shultz&quot; &amp;lt;jjddshu...@yahoo.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I'm trying to read in a datafile that's really big (&amp;gt;2GB) in sections that are a couple hundred thousand lines long each.  I need to know how many lines are in the parent file first.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I have a routine now that does it like this:&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; totlines=0;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; while ~feof(fid)&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line=fgetl(fid);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;totlines=totlines+1;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; This does well with the memory part, but takes forever.  There has got to be a more efficient way to do this, but I'm stuck.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Read the file in larger batches than a single line.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Rune&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Thank you. &#160;I am using textscan to get the data blocks in the code which follows what I have written above. &#160;Let me restate my question. &#160;Is there a way to determine the number of lines in a large file without reading in the data (which will crash Matlab)?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I want to use the total number of lines to determine the best way to segment the files. &#160;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Jon&lt;br&gt;
&lt;br&gt;
Copy and paste these lines into a new file called CountLines.pl in&lt;br&gt;
Matlab's path:&lt;br&gt;
while (&amp;lt;&amp;gt;) {};&lt;br&gt;
print $.,&quot;\n&quot;;&lt;br&gt;
&lt;br&gt;
Now, run it in Matlab like this:&lt;br&gt;
perl('CountLines.pl',filename)&lt;br&gt;
where filename is your file name.</description>
    </item>
    <item>
      <pubDate>Tue, 17 Nov 2009 15:21:04 -0500</pubDate>
      <title>Re: really big data files</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/265308#695448</link>
      <author>Jon Shultz</author>
      <description>Excerpt from above:&lt;br&gt;
&lt;br&gt;
&amp;gt; &amp;gt; Let me restate my question.  Is there a way to determine the number of lines in a large file without reading in the data (which will crash Matlab)?&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; I want to use the total number of lines to determine the best way to segment the files.&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Jon&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Copy and paste these lines into a new file called CountLines.pl in&lt;br&gt;
&amp;gt; Matlab's path:&lt;br&gt;
&amp;gt; while (&amp;lt;&amp;gt;) {};&lt;br&gt;
&amp;gt; print $.,&quot;\n&quot;;&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Now, run it in Matlab like this:&lt;br&gt;
&amp;gt; perl('CountLines.pl',filename)&lt;br&gt;
&amp;gt; where filename is your file name.&lt;br&gt;
&lt;br&gt;
TideMan, that worked great (and is about 100 times faster than the code I had above)... until I tried to access a file at a network location (\\abc-def-45\data...).  Is there a Perl command that will allow UNC file locations to be recognized?&lt;br&gt;
&lt;br&gt;
Jon</description>
    </item>
  </channel>
</rss>

