<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662</link>
    <title>MATLAB Central Newsreader - Script will take far too long..</title>
    <description>Feed for thread: Script will take far too long..</description>
    <language>en-us</language>
    <copyright>&#169; 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Tue, 21 Jul 2009 19:28:02 -0400</pubDate>
      <title>Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667095</link>
      <author>David Kunik</author>
      <description>Hey,&lt;br&gt;
&lt;br&gt;
I have a very large data set (1.9 GB), so I am unable to load it into MATLAB all at once (this computer only has 1 GB of RAM).  I need to do a couple of simple regexprep calls (A_A -&amp;gt; 1), which is easy enough, but each row of data is about 1.5 MB and it takes bloody forever for regexprep to go through it (there are a lot of A_A's).  By &quot;forever&quot; I mean ~41 minutes for 1 of the 624 rows.&lt;br&gt;
&lt;br&gt;
I can show you some of the code I have and you can laugh all you like.  I've just started using MATLAB for my new job, so I'm just glad the code works.&lt;br&gt;
&lt;br&gt;
function readlines()&lt;br&gt;
fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for i = 1:624&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tic;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line = fgets(fid);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;disp('Debug: Read line')&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;disp('Debug: Replaced A_A') % These 3 parts take about 15 minutes EACH.&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;disp('Debug: Replaced A_B')&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;disp('Debug: Replaced B_B')&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fwrite(fid2, line);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;toc;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;disp(line)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;end&lt;br&gt;
fclose(fid);&lt;br&gt;
fclose(fid2);&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
So, theoretically, there is nothing wrong with my code, as it works the way I want it to (everything is successfully replaced and written), but 40 minutes for 1 line is ridiculous.  Any help optimizing this would be greatly appreciated.</description>
    </item>
    <item>
      <pubDate>Tue, 21 Jul 2009 19:41:02 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667101</link>
      <author>Shanmugam Kannappan</author>
      <description>&quot;David Kunik&quot; &amp;lt;kunik@ualberta.ca&amp;gt; wrote in message &amp;lt;h454s2$k0u$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; Hey,&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in to Matlab for modification that way (this computer only has 1gb of ram).  I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is easy enough but each row of data is like 1.5MB and it takes bloody forever for regexprep to go through (there is allot of A_A's..).  By &quot;forever&quot; i mean ~41 minutes.  For 1/624 rows.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I can show you some of the code i have and you can laugh all you like.  I've just started matlab for my new job so i'm just glad the code works.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; function readlines()&lt;br&gt;
&amp;gt; fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
&amp;gt; fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
&amp;gt; line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;gt;     for i = 1:624&lt;br&gt;
&amp;gt;         tic;&lt;br&gt;
&amp;gt;         line = fgets(fid);&lt;br&gt;
&amp;gt;                                  disp('Debug: Read line')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_A') % These 3 parts take about 15 minutes EACH.&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_B')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced B_B')&lt;br&gt;
&amp;gt;         fwrite(fid2, line);&lt;br&gt;
&amp;gt;         toc;&lt;br&gt;
&amp;gt;         disp(line)&lt;br&gt;
&amp;gt;     end&lt;br&gt;
&amp;gt; fclose(fid);&lt;br&gt;
&amp;gt; fclose(fid2);&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; So, theoretically, there is nothing wrong with my code as it works as i want it to (everything is successfully replaced/outputted/written) but 40 minutes for 1 line is ridiculous.  Any help optimizing this would be greatly appreciated.&lt;br&gt;
&lt;br&gt;
Hi!&lt;br&gt;
&lt;br&gt;
Your explanation is not entirely clear to me, but&lt;br&gt;
from the code it looks like you are replacing some text and writing the result to another file.&lt;br&gt;
Why don't you try fread instead of fgets?&lt;br&gt;
fread can read the whole file into a single variable, which you can then process with regexprep.&lt;br&gt;
&lt;br&gt;
Shan....</description>
    </item>
    <item>
      <pubDate>Tue, 21 Jul 2009 20:14:02 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667115</link>
      <author>David Kunik</author>
      <description>&quot;Shanmugam Kannappan&quot; &amp;lt;shanmugambe@gmail.com&amp;gt; wrote in message &amp;lt;h455ke$ab2$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &quot;David Kunik&quot; &amp;lt;kunik@ualberta.ca&amp;gt; wrote in message &amp;lt;h454s2$k0u$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; Hey,&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in to Matlab for modification that way (this computer only has 1gb of ram).  I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is easy enough but each row of data is like 1.5MB and it takes bloody forever for regexprep to go through (there is allot of A_A's..).  By &quot;forever&quot; i mean ~41 minutes.  For 1/624 rows.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; I can show you some of the code i have and you can laugh all you like.  I've just started matlab for my new job so i'm just glad the code works.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; function readlines()&lt;br&gt;
&amp;gt; &amp;gt; fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
&amp;gt; &amp;gt; fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
&amp;gt; &amp;gt; line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;gt; &amp;gt;     for i = 1:624&lt;br&gt;
&amp;gt; &amp;gt;         tic;&lt;br&gt;
&amp;gt; &amp;gt;         line = fgets(fid);&lt;br&gt;
&amp;gt; &amp;gt;                                  disp('Debug: Read line')&lt;br&gt;
&amp;gt; &amp;gt;         line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;gt; &amp;gt;                                  disp('Debug: Replaced A_A') % These 3 parts take about 15 minutes EACH.&lt;br&gt;
&amp;gt; &amp;gt;         line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;gt; &amp;gt;                                  disp('Debug: Replaced A_B')&lt;br&gt;
&amp;gt; &amp;gt;         line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;gt; &amp;gt;                                  disp('Debug: Replaced B_B')&lt;br&gt;
&amp;gt; &amp;gt;         fwrite(fid2, line);&lt;br&gt;
&amp;gt; &amp;gt;         toc;&lt;br&gt;
&amp;gt; &amp;gt;         disp(line)&lt;br&gt;
&amp;gt; &amp;gt;     end&lt;br&gt;
&amp;gt; &amp;gt; fclose(fid);&lt;br&gt;
&amp;gt; &amp;gt; fclose(fid2);&lt;br&gt;
&amp;gt; &amp;gt; end&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; So, theoretically, there is nothing wrong with my code as it works as i want it to (everything is successfully replaced/outputted/written) but 40 minutes for 1 line is ridiculous.  Any help optimizing this would be greatly appreciated.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Hi!&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I am not really clear with your explanation but&lt;br&gt;
&amp;gt; from the code it seems like replacing something &amp; writing it to other file,&lt;br&gt;
&amp;gt; why dont you try fread instead of fgets.&lt;br&gt;
&amp;gt; fread will read all the strings from the file to a single variable &amp; replace using regexprep.....&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Shan....&lt;br&gt;
Good idea, but reading 2 GB of data with fread does not help me much, as this computer doesn't have enough memory.  Obviously it's time for a computer upgrade, but surely there is a way to make this faster?  Maybe I just need to accept that this is how long it takes to modify 1.5 MB text strings on the fly.</description>
    </item>
    <item>
      <pubDate>Tue, 21 Jul 2009 20:30:01 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667120</link>
      <author>Ashish Uthama</author>
      <description>On Tue, 21 Jul 2009 15:28:02 -0400, David Kunik &amp;lt;kunik@ualberta.ca&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; Hey,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in  &lt;br&gt;
&amp;gt; to Matlab for modification that way (this computer only has 1gb of  &lt;br&gt;
&amp;gt; ram).  I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is  &lt;br&gt;
&amp;gt; easy enough but each row of data is like 1.5MB and it takes bloody  &lt;br&gt;
&amp;gt; forever for regexprep to go through (there is allot of A_A's..).  By  &lt;br&gt;
&amp;gt; &quot;forever&quot; i mean ~41 minutes.  For 1/624 rows.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I can show you some of the code i have and you can laugh all you like.   &lt;br&gt;
&amp;gt; I've just started matlab for my new job so i'm just glad the code works.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; function readlines()&lt;br&gt;
&amp;gt; fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment  &lt;br&gt;
&amp;gt; 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
&amp;gt; fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
&amp;gt; line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;gt;     for i = 1:624&lt;br&gt;
&amp;gt;         tic;&lt;br&gt;
&amp;gt;         line = fgets(fid);&lt;br&gt;
&amp;gt;                                  disp('Debug: Read line')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_A') % These 3  &lt;br&gt;
&amp;gt; parts take about 15 minutes EACH.&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_B')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced B_B')&lt;br&gt;
&amp;gt;         fwrite(fid2, line);&lt;br&gt;
&amp;gt;         toc;&lt;br&gt;
&amp;gt;         disp(line)&lt;br&gt;
&amp;gt;     end&lt;br&gt;
&amp;gt; fclose(fid);&lt;br&gt;
&amp;gt; fclose(fid2);&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; So, theoretically, there is nothing wrong with my code as it works as i  &lt;br&gt;
&amp;gt; want it to (everything is successfully replaced/outputted/written) but  &lt;br&gt;
&amp;gt; 40 minutes for 1 line is ridiculous.  Any help optimizing this would be  &lt;br&gt;
&amp;gt; greatly appreciated.&lt;br&gt;
&lt;br&gt;
For one, are you sure you don't need the 'rt' mode in FOPEN?&lt;br&gt;
See the help on FGETS:&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;FGETS is intended for use with files that contain newline characters.&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Given a file with no newline characters, FGETS may take a long time to&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;execute.&lt;br&gt;
&lt;br&gt;
So if you open the file in 'r' (binary) mode, FGETS might not recognize the newlines and  &lt;br&gt;
could read in the full file anyway (in which case you are processing the full file in  &lt;br&gt;
*each* loop iteration). An easy way to check this would be to single-step  &lt;br&gt;
(debug) your code and check the size of 'line'.</description>
    </item>
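    <!--
The check suggested above (inspecting the size of 'line') can also be scripted. This is a minimal sketch, assuming a small throwaway demo file; the name demoFile and its contents are hypothetical stand-ins for the real CSV, which is not available here.

```matlab
% Build a tiny demo CSV (hypothetical data, not the file from the thread).
demoFile = [tempname, '.csv'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'id,s1,s2\nrow1,A_A,A_B\nrow2,B_B,A_A\n');
fclose(fid);

% Record how many characters each fgets call returns.
fid = fopen(demoFile, 'rt');
lens = [];
line = fgets(fid);
while ischar(line)            % fgets returns -1 at end of file
    lens(end+1) = numel(line);
    line = fgets(fid);
end
fclose(fid);
delete(demoFile);
disp(lens)
```

If the reported lengths are close to the expected row size, FGETS is splitting the file into lines as intended; a single length close to the full file size would instead point to a line-ending problem.
    -->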
    <item>
      <pubDate>Tue, 21 Jul 2009 20:30:19 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667121</link>
      <author>Alan B</author>
      <description>&quot;David Kunik&quot; &amp;lt;kunik@ualberta.ca&amp;gt; wrote in message &amp;lt;h457ia$edm$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &quot;Shanmugam Kannappan&quot; &amp;lt;shanmugambe@gmail.com&amp;gt; wrote in message &amp;lt;h455ke$ab2$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; &quot;David Kunik&quot; &amp;lt;kunik@ualberta.ca&amp;gt; wrote in message &amp;lt;h454s2$k0u$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; Hey,&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in to Matlab for modification that way (this computer only has 1gb of ram).  I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is easy enough but each row of data is like 1.5MB and it takes bloody forever for regexprep to go through (there is allot of A_A's..).  By &quot;forever&quot; i mean ~41 minutes.  For 1/624 rows.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I can show you some of the code i have and you can laugh all you like.  I've just started matlab for my new job so i'm just glad the code works.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; function readlines()&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;     for i = 1:624&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         tic;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         line = fgets(fid);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;                                  disp('Debug: Read line')&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;                                  disp('Debug: Replaced A_A') % These 3 parts take about 15 minutes EACH.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;                                  disp('Debug: Replaced A_B')&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;                                  disp('Debug: Replaced B_B')&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         fwrite(fid2, line);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         toc;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;         disp(line)&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;     end&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; fclose(fid);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; fclose(fid2);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; end&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; So, theoretically, there is nothing wrong with my code as it works as i want it to (everything is successfully replaced/outputted/written) but 40 minutes for 1 line is ridiculous.  Any help optimizing this would be greatly appreciated.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Hi!&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; I am not really clear with your explanation but&lt;br&gt;
&amp;gt; &amp;gt; from the code it seems like replacing something &amp; writing it to other file,&lt;br&gt;
&amp;gt; &amp;gt; why dont you try fread instead of fgets.&lt;br&gt;
&amp;gt; &amp;gt; fread will read all the strings from the file to a single variable &amp; replace using regexprep.....&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Shan....&lt;br&gt;
&amp;gt; Good idea, however loading 2GB of data in to fread does not help me that much as this computer doesn't have enough memory.  Obviously it's time for a computer upgrade, but there must be a way to make this faster?  Maybe i just need to accept that this is how long it takes to modify 1.5mb text strings on the fly..&lt;br&gt;
&lt;br&gt;
strrep might be faster than regexprep, unless regexprep is doing a check for trivial cases. I'm not sure how much that would help.</description>
    </item>
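    <!--
The strrep suggestion above could look roughly like the sketch below. The function name convertGenotypes and its two arguments are hypothetical; copying the header line through unchanged is also an assumption, since the original script reads the header but does not say what should happen to it.

```matlab
function convertGenotypes(inFile, outFile)
% Hypothetical helper: copy inFile to outFile line by line,
% replacing A_A, A_B and B_B with 1, 2 and 3 using strrep.
fin = fopen(inFile, 'r');
if fin == -1
    error('Could not open input file: %s', inFile);
end
fout = fopen(outFile, 'w');
if fout == -1
    fclose(fin);
    error('Could not open output file: %s', outFile);
end

header = fgets(fin);          % assumption: the header is copied unchanged
if ischar(header)
    fwrite(fout, header);
end

line = fgets(fin);
while ischar(line)            % fgets returns -1 at end of file
    line = strrep(line, 'A_A', '1');
    line = strrep(line, 'A_B', '2');
    line = strrep(line, 'B_B', '3');
    fwrite(fout, line);
    line = fgets(fin);
end

fclose(fin);
fclose(fout);
end
```

Only one row is held in memory at a time, so the 1 GB machine described in the thread should not need to load the whole 1.9 GB file. Looping until fgets stops returning text also avoids hard-coding the row count of 624.
    -->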
    <item>
      <pubDate>Tue, 21 Jul 2009 20:47:23 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667128</link>
      <author>Ashish Uthama</author>
      <description>On Tue, 21 Jul 2009 15:28:02 -0400, David Kunik &amp;lt;kunik@ualberta.ca&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; Hey,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in  &lt;br&gt;
&amp;gt; to Matlab for modification that way (this computer only has 1gb of  &lt;br&gt;
&amp;gt; ram).  I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is  &lt;br&gt;
&amp;gt; easy enough but each row of data is like 1.5MB and it takes bloody  &lt;br&gt;
&amp;gt; forever for regexprep to go through (there is allot of A_A's..).  By  &lt;br&gt;
&amp;gt; &quot;forever&quot; i mean ~41 minutes.  For 1/624 rows.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I can show you some of the code i have and you can laugh all you like.   &lt;br&gt;
&amp;gt; I've just started matlab for my new job so i'm just glad the code works.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; function readlines()&lt;br&gt;
&amp;gt; fid = fopen('C:\Documents and Settings\xxx\xxx\Assignment  &lt;br&gt;
&amp;gt; 03\BreastCancerDataset_SharedData.csv','r');&lt;br&gt;
&amp;gt; fid2 = fopen('C:\Documents and Settings\xxx\xxx\SNPDataConv.csv', 'w');&lt;br&gt;
&amp;gt; line = fgets(fid); % Get headers first.  Yes, cheap hack.&lt;br&gt;
&amp;gt;     for i = 1:624&lt;br&gt;
&amp;gt;         tic;&lt;br&gt;
&amp;gt;         line = fgets(fid);&lt;br&gt;
&amp;gt;                                  disp('Debug: Read line')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_A', '1');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_A') % These 3  &lt;br&gt;
&amp;gt; parts take about 15 minutes EACH.&lt;br&gt;
&amp;gt;         line = regexprep(line, 'A_B', '2');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced A_B')&lt;br&gt;
&amp;gt;         line = regexprep(line, 'B_B', '3');&lt;br&gt;
&amp;gt;                                  disp('Debug: Replaced B_B')&lt;br&gt;
&amp;gt;         fwrite(fid2, line);&lt;br&gt;
&amp;gt;         toc;&lt;br&gt;
&amp;gt;         disp(line)&lt;br&gt;
&amp;gt;     end&lt;br&gt;
&amp;gt; fclose(fid);&lt;br&gt;
&amp;gt; fclose(fid2);&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; So, theoretically, there is nothing wrong with my code as it works as i  &lt;br&gt;
&amp;gt; want it to (everything is successfully replaced/outputted/written) but  &lt;br&gt;
&amp;gt; 40 minutes for 1 line is ridiculous.  Any help optimizing this would be  &lt;br&gt;
&amp;gt; greatly appreciated.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Just for kicks, give this a try. I would be curious to know its  &lt;br&gt;
performance.&lt;br&gt;
&lt;br&gt;
Copy the text below into a text file called 'replace.pl'.&lt;br&gt;
In MATLAB, ensure that replace.pl is on the MATLAB path, and invoke it as  &lt;br&gt;
shown:&lt;br&gt;
&lt;br&gt;
--perl code below--&lt;br&gt;
&lt;br&gt;
#Syntax: perl('replace.pl','input.csv','output.csv');&lt;br&gt;
&lt;br&gt;
#Given an input and an output file&lt;br&gt;
#replace A_A with 1, A_B with 2 and B_B with 3.&lt;br&gt;
&lt;br&gt;
$inFile =shift @ARGV;&lt;br&gt;
$outFile=shift @ARGV;&lt;br&gt;
&lt;br&gt;
open(IFILE,$inFile) or die &quot;Could not open input file&quot;;&lt;br&gt;
open(OFILE,&quot;&amp;gt;$outFile&quot;) or die &quot;Could not open output file&quot;;&lt;br&gt;
&lt;br&gt;
while(&amp;lt;IFILE&amp;gt;){&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s/A_A/1/g;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s/A_B/2/g;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;s/B_B/3/g;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print OFILE;&lt;br&gt;
}&lt;br&gt;
&lt;br&gt;
close(IFILE);&lt;br&gt;
close(OFILE);</description>
    </item>
    <item>
      <pubDate>Tue, 21 Jul 2009 23:08:02 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667165</link>
      <author>Jan Simon</author>
      <description>Dear David Kunik!&lt;br&gt;
&lt;br&gt;
As Alan wrote already, STRREP is much faster than REGEXPREP.&lt;br&gt;
In some tests with 1.5MB strings and 6000 occurrences of 'A_A', STRREP takes less than 0.1 sec, while REGEXPREP needs 36 sec.&lt;br&gt;
&lt;br&gt;
I'm interested in time measurements of the perl method also!&lt;br&gt;
&lt;br&gt;
Good luck, Jan</description>
    </item>
    <item>
      <pubDate>Tue, 21 Jul 2009 23:56:19 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667179</link>
      <author>Rune Allnor</author>
      <description>On 21 Jul, 21:28, &quot;David Kunik&quot; &amp;lt;ku...@ualberta.ca&amp;gt; wrote:&lt;br&gt;
&amp;gt; Hey,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; So i have a very large data set (1.9GB) so i am unable to load this in to Matlab for modification that way (this computer only has 1gb of ram). &#160;I need to do a couple simple regexprep calls (A_A -&amp;gt; 1) which is easy enough but each row of data is like 1.5MB and it takes bloody forever for regexprep to go through (there is allot of A_A's..). &#160;By &quot;forever&quot; i mean ~41 minutes. &#160;For 1/624 rows.&lt;br&gt;
&lt;br&gt;
There is something wrong. A 1GB computer should have no problems&lt;br&gt;
whatsoever with handling lines of 1.5 MB.&lt;br&gt;
&lt;br&gt;
Most likely, the file was generated by a different type of&lt;br&gt;
computer than your - presumably - PC. If so, the lines are&lt;br&gt;
ended by different characters than FGETS or FGETL look for.&lt;br&gt;
&lt;br&gt;
I don't know how to configure FGETS or FGETL to change&lt;br&gt;
End-of-Line characters, so the second best is if you know&lt;br&gt;
how many characters there are in the line.&lt;br&gt;
&lt;br&gt;
Or use some other computer and re-format the file.&lt;br&gt;
If this is a text file, open it in MSWordPad and&lt;br&gt;
then store it as a .txt file. That way, End-of-Line&lt;br&gt;
characters are changed to what matlab can recognize.&lt;br&gt;
&lt;br&gt;
Rune</description>
    </item>
    <item>
      <pubDate>Wed, 22 Jul 2009 00:24:01 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667181</link>
      <author>Jan Simon</author>
      <description>Dear Rune Allnor!&lt;br&gt;
&lt;br&gt;
&amp;gt; There is something wrong. A 1GB computer should have no problems&lt;br&gt;
&amp;gt; whatsoever with handling lines of 1.5 MB.&lt;br&gt;
&lt;br&gt;
Waiting 40 min for regexprep is not a &quot;problem&quot;, but just slow.&lt;br&gt;
MATLAB 6.5 takes 10 min to replace 125,000 occurrences of 'A_A' with '1' in a 1.5 MB string on my 1500 MHz Pentium M, without any file access!&lt;br&gt;
&lt;br&gt;
&amp;gt; Most likely, the file was generated by a different type of&lt;br&gt;
&amp;gt; computer than your - presumably - PC. If so, the lines are&lt;br&gt;
&amp;gt; ended by different characters than FGETS or FGETL look for.&lt;br&gt;
&lt;br&gt;
FGETL calls FGETS, and the latter can handle all PC/MacOS9/Unix line breaks without problems. Even the choice between FOPEN 'rb' and 'rt' does not matter, because FGETL cuts off whatever line break (of any style) FGETS has found.&lt;br&gt;
Therefore the text file does not need a conversion.&lt;br&gt;
&lt;br&gt;
Example:&lt;br&gt;
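% Write three lines ending in LF, CR+LF and CR, followed by an unterminated 'END'.&lt;br&gt;
fid = fopen('test.txt', 'wb');&lt;br&gt;
fwrite(fid, ['Line1', 10, 'Line2', 13, 10, 'Line3', 13, 'END'], 'uchar');&lt;br&gt;
fclose(fid);&lt;br&gt;
% Read the lines back in binary mode.&lt;br&gt;
fid = fopen('test.txt', 'rb');&lt;br&gt;
fgetl(fid), fgetl(fid), fgetl(fid)&lt;br&gt;
fclose(fid);&lt;br&gt;
% Read the lines back in text mode.&lt;br&gt;
fid = fopen('test.txt', 'rt');&lt;br&gt;
fgetl(fid), fgetl(fid), fgetl(fid)&lt;br&gt;
fclose(fid);&lt;br&gt;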
&lt;br&gt;
So switching from REGEXPREP to STRREP, or using the nice Perl trick, should give enough speed.&lt;br&gt;
&lt;br&gt;
Good night, Jan</description>
    </item>
    <item>
      <pubDate>Wed, 22 Jul 2009 15:04:02 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667397</link>
      <author>David Kunik</author>
      <description>Thank you all for your help.  My script is scooting along as we speak, using strrep.  I have not been able to test the perl script on my data as this computer does not have perl installed and it is not mine.&lt;br&gt;
&lt;br&gt;
Thanks again,&lt;br&gt;
Dave.</description>
    </item>
    <item>
      <pubDate>Wed, 22 Jul 2009 16:04:37 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667412</link>
      <author>Rune Allnor</author>
      <description>On 22 Jul, 02:24, &quot;Jan Simon&quot; &amp;lt;matlab.THIS_Y...@nMINUSsimon.de&amp;gt; wrote:&lt;br&gt;
&amp;gt; Dear Rune Allnor!&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; There is something wrong. A 1GB computer should have no problems&lt;br&gt;
&amp;gt; &amp;gt; whatsoever with handling lines of 1.5 MB.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Waiting 40 min for regexprep is not a &quot;problem&quot;, but just slow.&lt;br&gt;
&lt;br&gt;
The time is a problem for several reasons:&lt;br&gt;
&lt;br&gt;
1) It takes several orders of magnitude longer&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;than it needs to (see below)&lt;br&gt;
2) The time is the difference between the job&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;getting done at all, or not.&lt;br&gt;
&lt;br&gt;
&amp;gt; Matlab 6.5 takes 10 min for replacing 125.000 'A_A' with '1' in a 1.5MB string on my 1500MHz PentiumM -- without file access!&lt;br&gt;
&lt;br&gt;
On R2006a:&lt;br&gt;
&lt;br&gt;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%&lt;br&gt;
N = 1500000;&lt;br&gt;
s = char('B'*ones(1,N));&lt;br&gt;
Naa = 200000;&lt;br&gt;
&lt;br&gt;
for n=1:5:5*Naa&lt;br&gt;
s(n:n+2)='A_A';&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
rexp = 'A_A';&lt;br&gt;
tic&lt;br&gt;
regexprep(s,rexp,'1');&lt;br&gt;
toc&lt;br&gt;
&lt;br&gt;
tic&lt;br&gt;
strrep(s,'A_A','1');&lt;br&gt;
toc&lt;br&gt;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%&lt;br&gt;
&lt;br&gt;
Elapsed time is 65.978407 seconds.&lt;br&gt;
Elapsed time is 0.018985 seconds.&lt;br&gt;
&lt;br&gt;
So the OP should be able to do the whole job in a&lt;br&gt;
couple of seconds. With his present computer.&lt;br&gt;
&lt;br&gt;
Rune</description>
    </item>
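    <!--
The timing comparison above measures speed only. Before switching functions it may also be worth confirming that both give the same output on a representative sample. This is a minimal sketch; the sample string s is hypothetical and assumes the genotype codes are separated by commas, so that matches never overlap.

```matlab
% Hypothetical sample row; the real rows are about 1.5 MB long.
s = 'id7,A_A,A_B,B_B,A_A,B_B';

% Replacement with regexprep, as in the original script.
viaRegexp = regexprep(s, 'A_A', '1');
viaRegexp = regexprep(viaRegexp, 'A_B', '2');
viaRegexp = regexprep(viaRegexp, 'B_B', '3');

% The same replacement with strrep.
viaStrrep = strrep(s, 'A_A', '1');
viaStrrep = strrep(viaStrrep, 'A_B', '2');
viaStrrep = strrep(viaStrrep, 'B_B', '3');

% Compare the two results.
sameResult = isequal(viaRegexp, viaStrrep);
disp(sameResult)
```

One caveat: STRREP also replaces overlapping occurrences, which REGEXPREP does not, so the two can differ on input such as 'A_A_A'. With delimiter-separated codes this should not arise.
    -->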
    <item>
      <pubDate>Wed, 22 Jul 2009 16:37:18 -0400</pubDate>
      <title>Re: Script will take far too long..</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/256662#667425</link>
      <author>Ashish Uthama</author>
      <description>On Wed, 22 Jul 2009 11:04:02 -0400, David Kunik &amp;lt;kunik@ualberta.ca&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; Thank you all for your help.  My script is scooting along as we speak,  &lt;br&gt;
&amp;gt; using strrep.  I have not been able to test the perl script on my data  &lt;br&gt;
&amp;gt; as this computer does not have perl installed and it is not mine.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Thanks again,&lt;br&gt;
&amp;gt; Dave.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
STRREP should help.&lt;br&gt;
&lt;br&gt;
Side note: You don't need to install Perl, MATLAB comes with it!</description>
    </item>
  </channel>
</rss>

