<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/164887</link>
    <title>MATLAB Central Newsreader - processing extremely long data file sequentially?</title>
    <description>Feed for thread: processing extremely long data file sequentially?</description>
    <language>en-us</language>
    <copyright>&#169; 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Sat, 01 Mar 2008 01:54:05 -0500</pubDate>
      <title>processing extremely long data file sequentially?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/164887#418383</link>
      <author>huhua</author>
      <description>Hi all,&lt;br&gt;
&lt;br&gt;
Let's say a CSV file has tens of millions of lines and each line has many &lt;br&gt;
columns.&lt;br&gt;
&lt;br&gt;
I actually wanted to browse through it line by line (except the first line, &lt;br&gt;
which is the header line),&lt;br&gt;
&lt;br&gt;
and I need to cut most of the lines and columns out, and only use a few &lt;br&gt;
lines and columns.&lt;br&gt;
&lt;br&gt;
I am estimating that out of these tens of millions of lines, I only need to &lt;br&gt;
retain tens of thousands of lines.&lt;br&gt;
&lt;br&gt;
But I need to process them and cut the non-useful lines out.&lt;br&gt;
&lt;br&gt;
Even Excel 2007 refused to load the file. Matlab crashed several times when &lt;br&gt;
I tried to load.&lt;br&gt;
&lt;br&gt;
What do I do?&lt;br&gt;
&lt;br&gt;
Is there a function like &quot;textread&quot;, &quot;textscan&quot; or &quot;csvread&quot; that can read the &lt;br&gt;
file sequentially, line by line?&lt;br&gt;
&lt;br&gt;
I think it is important for the program to keep a relative pointer in the &lt;br&gt;
CSV file so that after each line is read and processed, we can move to the &lt;br&gt;
next line.&lt;br&gt;
&lt;br&gt;
And I just need to sequentially write out another output file to take the &lt;br&gt;
filtered lines.&lt;br&gt;
&lt;br&gt;
Of course the benefit of &quot;textread&quot;, &quot;textscan&quot;, &quot;csvread&quot; is that they can &lt;br&gt;
parse formatted strings, including both text and numbers... that's &lt;br&gt;
important...&lt;br&gt;
&lt;br&gt;
Any ideas?&lt;br&gt;
&lt;br&gt;
Thanks</description>
    </item>
    <item>
      <pubDate>Sat, 01 Mar 2008 03:30:20 -0500</pubDate>
      <title>Re: processing extremely long data file sequentially?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/164887#418387</link>
      <author>Paul</author>
      <description>&quot;huhua&quot; &amp;lt;lunamoonmoon@gmail.com&amp;gt; wrote in message&lt;br&gt;
&amp;lt;fqacv8$et8$1@news.Stanford.EDU&amp;gt;...&lt;br&gt;
&amp;gt; Hi all,&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Let's say a CSV file has tens of millions lines and each line has many&lt;br&gt;
&amp;gt; columns.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I actually wanted to browse through it line by line (except the first line,&lt;br&gt;
&amp;gt; which is the headline),&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; and I need to cut most of the lines and columns out, and only use a few&lt;br&gt;
&amp;gt; lines and columns.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I am estimating that out of these tens of millions of lines, I only need to&lt;br&gt;
&amp;gt; retain tens of thousands of lines.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; But I need to process them and cut the non-useful lines out.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Even Excel 2007 refused to load the file. Matlab crashed several times when&lt;br&gt;
&amp;gt; I tried to load.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; What do I do?&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Is there a &quot;textread&quot;, &quot;textscan&quot;, &quot;csvread&quot; file that can read it line by&lt;br&gt;
&amp;gt; line and sequentially?&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I think it is important for the program to keep a relative pointer in the&lt;br&gt;
&amp;gt; CSV file so that after each line is read and processed, we can move to the&lt;br&gt;
&amp;gt; next line.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; And I just need to sequentially write out another output file to take the&lt;br&gt;
&amp;gt; filtered lines.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Of course the benefit of &quot;textread&quot;, &quot;textscan&quot;, &quot;csvread&quot; is that they can&lt;br&gt;
&amp;gt; parse formated strings, including both text and numbers... that's&lt;br&gt;
&amp;gt; important...&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Any ideas?&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Thanks&lt;br&gt;
&lt;br&gt;
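A minimal sketch of the line-by-line approach with fgetl, assuming a small&lt;br&gt;
four-column file where the last column decides whether a line is kept.&lt;br&gt;
The file names, the column layout and the keep rule are all made up for&lt;br&gt;
illustration, and the comma split is naive (no quoted fields). fgetl keeps&lt;br&gt;
the file position itself, so only one line is in memory at a time.&lt;br&gt;
&lt;br&gt;

```matlab
% Sketch only: file names, column layout and the keep rule are assumptions.
inName  = 'demo_in.csv';     % hypothetical input file
outName = 'demo_out.csv';    % hypothetical output file

% Write a tiny stand-in for the huge CSV so the sketch runs end to end.
fid = fopen(inName, 'w');
fprintf(fid, 'id,name,value,flag\n');
fprintf(fid, '1,aaa,10.5,keep\n');
fprintf(fid, '2,bbb,20.0,drop\n');
fprintf(fid, '3,ccc,30.25,keep\n');
fclose(fid);

fin  = fopen(inName, 'r');
fout = fopen(outName, 'w');

header = fgetl(fin);         % first line is the header; read it and move on
nKept  = 0;
tline  = fgetl(fin);         % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    % Naive split on commas; assumes every data line has four fields.
    fields = regexp(tline, ',', 'split');
    if strcmp(fields{4}, 'keep')                     % assumed filter on column 4
        value = str2double(fields{3});               % parse a numeric column
        fprintf(fout, '%s,%g\n', fields{1}, value);  % write columns 1 and 3 only
        nKept = nKept + 1;
    end
    tline = fgetl(fin);      % each call advances the file position by one line
end

fclose(fin);
fclose(fout);
fprintf('kept %d lines\n', nKept);
```

&lt;br&gt;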
help fgetl</description>
    </item>
    <item>
      <pubDate>Sun, 09 Mar 2008 15:11:03 -0400</pubDate>
      <title>Re: processing extremely long data file sequentially?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/164887#419858</link>
      <author>Andres Toennesmann</author>
      <description>&quot;huhua&quot; &amp;lt;lunamoonmoon@gmail.com&amp;gt; wrote in message&lt;br&gt;
&amp;lt;fqacv8$et8$1@news.Stanford.EDU&amp;gt;...&lt;br&gt;
&amp;gt; Hi all,&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Let's say a CSV file has tens of millions lines and each line has many&lt;br&gt;
&amp;gt; columns.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I actually wanted to browse through it line by line (except the first line,&lt;br&gt;
&amp;gt; which is the headline),&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; and I need to cut most of the lines and columns out, and only use a few&lt;br&gt;
&amp;gt; lines and columns.&lt;br&gt;
&lt;br&gt;
&amp;gt; []&lt;br&gt;
&lt;br&gt;
If the CSV contains mainly numeric data below the header&lt;br&gt;
line, you could try txt2mat from the File Exchange with its&lt;br&gt;
'RowRange' and 'FilePos' arguments (see its help, esp. Example&lt;br&gt;
5). This should be vastly quicker than fgetl.&lt;br&gt;
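&lt;br&gt;
The same block-wise idea can also be sketched with the built-in textscan&lt;br&gt;
instead of txt2mat (a swapped-in technique, not the txt2mat call itself):&lt;br&gt;
read a fixed number of rows per call, filter them, and let the file&lt;br&gt;
position carry over to the next call. The file name, the three numeric&lt;br&gt;
columns, the block size and the filter rule below are all assumptions.&lt;br&gt;
&lt;br&gt;

```matlab
% Sketch only: file name, column layout, block size and filter are assumptions.
inName = 'demo_numeric.csv';          % hypothetical file
fid = fopen(inName, 'w');
fprintf(fid, 't,x,flag\n');
for k = 1:10
    fprintf(fid, '%d,%g,%d\n', k, k*0.5, mod(k, 2));   % flag is 1 on odd rows
end
fclose(fid);

blockSize = 4;                        % rows per call; far larger for real data
kept = zeros(0, 2);                   % retained [t x] rows

fid = fopen(inName, 'r');
fgetl(fid);                           % skip the header line once
while ~feof(fid)
    % Read at most blockSize rows; the next call resumes where this one stopped.
    C = textscan(fid, '%f %f %f', blockSize, 'Delimiter', ',');
    if isempty(C{1})
        break;                        % nothing more could be parsed
    end
    keep = (C{3} == 1);               % assumed filter: flag column equals 1
    kept = [kept; C{1}(keep), C{2}(keep)];   % grows only by the retained rows
end
fclose(fid);
disp(size(kept, 1));                  % number of retained rows
```

&lt;br&gt;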
Hth&lt;br&gt;
Andres</description>
    </item>
    <item>
      <pubDate>Mon, 10 Mar 2008 04:43:40 -0400</pubDate>
      <title>Re: processing extremely long data file sequentially?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/164887#419938</link>
      <author>NZTideMan</author>
      <description>On Mar 10, 4:11 am, &quot;Andres Toennesmann&quot; &amp;lt;rant...@werb.de&amp;gt; wrote:&lt;br&gt;
&amp;gt; &quot;huhua&quot; &amp;lt;lunamoonm...@gmail.com&amp;gt; wrote in message&lt;br&gt;
&amp;gt; &amp;lt;fqacv8$et...@news.Stanford.EDU&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; Hi all,&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Let's say a CSV file has tens of millions lines and each line has many&lt;br&gt;
&amp;gt; &amp;gt; columns.&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; I actually wanted to browse through it line by line (except the first line,&lt;br&gt;
&amp;gt; &amp;gt; which is the headline),&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; and I need to cut most of the lines and columns out, and only use a few&lt;br&gt;
&amp;gt; &amp;gt; lines and columns.&lt;br&gt;
&amp;gt; &amp;gt; []&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; If the csv contains mainly numeric data below the header&lt;br&gt;
&amp;gt; line, you may try txt2mat from the file exchange with its&lt;br&gt;
&amp;gt; 'RowRange' and 'FilePos' arguments (see Help, esp. Example&lt;br&gt;
&amp;gt; 5). This should be vastly quicker than fgetl.&lt;br&gt;
&amp;gt; Hth&lt;br&gt;
&amp;gt; Andres&lt;br&gt;
&lt;br&gt;
I'd use Fortran, not Matlab, for this job.&lt;br&gt;
Fortran was developed back in the days of Hollerith cards, when you&lt;br&gt;
loaded one card of data at a time, so it can handle such a problem&lt;br&gt;
easily and very fast.</description>
    </item>
  </channel>
</rss>

