Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Textread
Date: Wed, 1 Jul 2009 22:29:02 +0000 (UTC)
Organization: Universität Heidelberg
Lines: 44
Message-ID: <h2gnve$d8t$1@fred.mathworks.com>
References: <h2favd$6b6$1@fred.mathworks.com> <op.uwd76lf8a5ziv5@uthamaa.dhcp.mathworks.com> <b6ceba46-a033-4fb4-a70f-967c776bf8d7@h8g2000yqm.googlegroups.com> <h2gcut$ktu$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1246487342 13597 172.30.248.37 (1 Jul 2009 22:29:02 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 1 Jul 2009 22:29:02 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 869888
Xref: news.mathworks.com comp.soft-sys.matlab:552224


Dear Jesper Lauridsen!

> f1id = fopen('data_in.dat');
> f2id = fopen('repairedfile.dat','w');
> 
> while (~feof(f1id))
>    s = fgetl(f1id);
>    sidx = find(s==',');
>    s(sidx) = '.';
>    fprintf(f2id,'%s\n',s);
> end
> 
> fclose(f1id);
> fclose(f2id);

Just some ideas:
  sidx=find(s==',');
  s(sidx) = '.';
can be written faster as:
  s(findstr(s, ',')) = '.';
This is usually even faster than:
  s(s==',') = '.'
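
A quick way to convince yourself that these variants are interchangeable is to run them on one sample line. This is only a sketch: the sample string is made up, and STRFIND is used in place of FINDSTR because newer MATLAB releases discourage FINDSTR.

```matlab
% Sample line with decimal commas (made-up data for illustration).
s = '1,5 2,75 3,0';

% Variant 1: FIND on a logical comparison.
a = s;  a(find(a == ',')) = '.';

% Variant 2: STRFIND returns the start indices of the pattern.
b = s;  b(strfind(b, ',')) = '.';

% Variant 3: logical indexing.
c = s;  c(c == ',') = '.';

% Variant 4: STRREP does the replacement in a single call.
d = strrep(s, ',', '.');

% All four results should be identical.
same = isequal(a, b, c, d);
disp(same);
```

Which variant is fastest depends on the MATLAB version and the line length, so it is worth timing them with TIC/TOC on your own data.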

Nevertheless, FGETL and FPRINTF have a considerable overhead. FGETS and FWRITE can be noticeably faster:
  fid1 = fopen('data_in.dat'); fid2 = fopen('repairedfile.dat','w');
  while 1
    s = fgets(fid1);
    if ~ischar(s), break; end  % FGETS returns -1 at end of file; this replaces the FEOF test
    fwrite(fid2, strrep(s, ',', '.'), 'uchar');
  end
  fclose(fid1); fclose(fid2);

While we are at it, we can drop the WHILE loop and overwrite the original file in place:
  fid = fopen('data_in.dat', 'rb+');
  s = fread(fid, inf, 'uchar=>char')';  % read as a char row vector so that STRREP can be applied
  fseek(fid, 0, -1);
  fwrite(fid, strrep(s, ',', '.'), 'uchar');
  fclose(fid);
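The 'b' mode of FOPEN is needed to keep the original line breaks. Overwriting in place works here because ',' and '.' have the same length, so the file size does not change.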

Here a surprising problem can appear in FSEEK: For effective multithreading, the operating system can stop FSEEK before it reaches the desired location. This is more likely under heavy system load and on slow (network) drives, but it is really rare and never reproducible. (NOTE: This is not a Matlab problem.)
Therefore it is safer to check the return value of FSEEK (-1 on failure), or to FCLOSE and FOPEN the file again.
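
A minimal sketch of such a check, assuming a small temporary test file with made-up content. The fallback simply closes and reopens the file, which positions it at the beginning again; the variable names are only for illustration.

```matlab
% Create a small test file (made-up content) so the sketch is self-contained.
fname = [tempname, '.dat'];
fid = fopen(fname, 'w');
fwrite(fid, sprintf('1,5;2,5\r\n3,25;4,0\r\n'), 'uchar');
fclose(fid);

% Read the whole file as a row vector of characters.
fid = fopen(fname, 'r+');
s = fread(fid, inf, 'uchar=>char')';

% Rewind and check the status of FSEEK (0 on success, -1 on failure).
status = fseek(fid, 0, 'bof');
if status ~= 0 || ftell(fid) ~= 0
    % Fallback: close and reopen the file to start at the beginning.
    fclose(fid);
    fid = fopen(fname, 'r+');
    if fid < 0
        error('Cannot reopen file: %s', fname);
    end
end

% Same-length replacement, so the file size does not change.
fwrite(fid, strrep(s, ',', '.'), 'uchar');
fclose(fid);

% Read the file back to verify the result, then clean up.
fid = fopen(fname, 'r');
result = fread(fid, inf, 'uchar=>char')';
fclose(fid);
delete(fname);
```

Checking FTELL in addition to the status is a belt-and-braces choice; the status alone may already be sufficient.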

Good luck, Jan