From: Walter Roberson <roberson@hushmail.com>
Organization: Canada Eat The Cookie Foundation
Newsgroups: comp.soft-sys.matlab
Subject: Re: How to remove unwanted text from a .txt file?
References: <gbbqun$qo5$1@fred.mathworks.com> <ryeCk.36464$QF5.28064@newsfe08.iad> <gbbubd$2n7$1@fred.mathworks.com> <VGfCk.562$Cl1.66@newsfe01.iad> <gbc1vn$8k6$1@fred.mathworks.com> <gbc2m6$de7$1@fred.mathworks.com> <gbc4i6$qf5$1@fred.mathworks.com> <gbcjal$h84$1@fred.mathworks.com>
In-Reply-To: <gbcjal$h84$1@fred.mathworks.com>
Message-ID: <qukCk.14892$Il.5652@newsfe09.iad>
Date: Wed, 24 Sep 2008 00:46:45 -0500


Cy abd wrote:
> The following code does what I need to do but is very slow since the .txt file is about 500,000 lines and the 3rd. line after else takes up 85% of the total time! How can that line be optimized please?
> 
> // the line consuming the most time 85%.
> B(r,:) = [data1 data2 data3 data4 data5 data6 data7];

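That line is taking most of the time because you are not pre-allocating the
matrix, so MATLAB has to re-size B for every line read. Each re-size allocates
a larger block and copies all of the existing rows into it, so the total
copying grows roughly with the square of the number of rows.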

Two days ago, Steve Lord and I each explained some pre-allocation strategies
that can be used when the file size is not fixed. The thread was
"How to create a variable array n*2"
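
For reference, here is a minimal sketch of one such strategy (grow the matrix
by doubling, then trim at the end). It is not necessarily what was posted in
that thread. The cell array 'lines' is a made-up stand-in for the lines you
would read with fgetl, the starting capacity of 1000 rows is an arbitrary
guess, and the isempty() guard is my addition. Note that r here counts the
rows filled so far, rather than pointing at the next free row as in your code.

```matlab
% Sketch only: 'lines' stands in for the lines you would get from fgetl.
lines = {'header text', '1,2,3,4,5,6,7', 'more text', '8,9,10,11,12,13,14'};

ncols = 7;
B = zeros(1000, ncols);   % initial capacity (arbitrary guess)
r = 0;                    % number of rows actually filled so far

for k = 1:numel(lines)
  tline = lines{k};
  % Skip empty lines and lines that start with a letter
  if ~isempty(tline) && ~isletter(tline(1))
    A = textscan(tline, '%f %f %f %f %f %f %f', 'delimiter', ',');
    r = r + 1;
    if r > size(B, 1)
      % Out of room: double the capacity with a single assignment,
      % which zero-fills the new rows
      B(2*size(B,1), ncols) = 0;
    end
    B(r,:) = [A{:}];
  end
end

B = B(1:r,:);   % trim the unused rows
```

With doubling, a 500,000 line file triggers only a handful of re-sizes instead
of one per line. If you know an upper bound on the number of lines, allocating
that many rows up front and trimming afterwards is simpler still.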

>     if isletter(tline(1))==1 ;
>     else
>        A = textscan(tline,'%f %f %f %f %f %f %f','delimiter',',');
>        [data1 data2 data3 data4 data5 data6 data7] = A{:};
>        B(r,:) = [data1 data2 data3 data4 data5 data6 data7];
>        r=r+1;
>     end

That can be optimized slightly to

  if ~isletter(tline(1))
    A = textscan(tline, '%f %f %f %f %f %f %f','delimiter',',');
    B(r,:) = [A{:}];
    r = r + 1;
  end

Removing the ==1 and eliminating the empty branch will likely speed up
execution of the routine measurably. The other change, assigning [A{:}]
directly instead of going through seven temporary variables, will speed up
the code measurably for sure (though with your current code, the time for
that line is being overwhelmed by the re-allocations you are doing).
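
If you want to convince yourself that the shortened assignment builds the same
row as the seven-variable version, a quick check along these lines should do.
The sample line is made up for illustration.

```matlab
tline = '1.5,2,3,4,5,6,7';   % made-up sample line
A = textscan(tline, '%f %f %f %f %f %f %f', 'delimiter', ',');

% Original form: seven temporaries, then concatenation
[data1 data2 data3 data4 data5 data6 data7] = A{:};
rowLong = [data1 data2 data3 data4 data5 data6 data7];

% Shortened form: concatenate the cell contents directly
rowShort = [A{:}];

same = isequal(rowLong, rowShort);   % true when both forms agree
```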