Thread Subject: read ascii file

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 09:53:02

Message: 1 of 13

Hello all. I need to read in an ascii file, with mixed char and numeric data, and I'm reading fairly big files so I would like it to be fast. The files look something like this:

{{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }

It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.

I can manage a for-loop "hack all the braces out" approach, but is there a better way for something with this simple format?

-DS

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 12:08:02

Message: 2 of 13

"DS" <null@null.com> wrote in message <gf909u$4n9$1@fred.mathworks.com>...
> [..] The files look something like this:
>
> {{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }
>
> It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.
[..]
.
.
Hi,
if the curly braces are your only char data, i.e. inside the braces there are just numbers, you could do the trick with txt2mat (file exchange):
.
A = txt2mat('file.txt',...
    'ReplaceExpr',{{'},{',char([13 10])}},...
    'ReplaceChar',{'{}, '});
.
I assumed that
- you don't know the size of the matrix beforehand (knowing it would help to speed things up)
- the rows are reliably separated by '},{'
.
I checked this on a sample file containing the only line
{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}
Of course,
.
B = txt2mat('file.txt','ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';
.
would work as well.
Ok, this is kind of hacking the braces out, but it should be quite fast.
Hth
Andres
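
For readers without the File Exchange function, the same "replace the braces, then parse" idea can be sketched with built-in functions only. This is a minimal sketch, not how txt2mat works internally; the scratch file name is made up, and the sample line is the one quoted above.

```matlab
% Write the sample line to a scratch file (hypothetical file name).
fname = fullfile(tempdir, 'braces_sample.txt');
fid = fopen(fname, 'w');
fprintf(fid, '{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}');
fclose(fid);

% Read the whole file into one char row vector.
txt = fileread(fname);

% Blank out braces and commas with a logical mask (no regular expressions).
txt(txt == '{' | txt == '}' | txt == ',') = ' ';

% Parse all whitespace-separated numbers in a single call.
v = sscanf(txt, '%f');

% The matrix is square and stored row by row, so reshape and transpose.
n = sqrt(numel(v));
A = reshape(v, n, n).';
```

The transpose is needed because reshape fills column by column while the file lists the values row by row.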

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 12:12:50

Message: 3 of 13

On 10 Nov, 10:53, "DS" <n...@null.com> wrote:
> Hello all. I need to read in an ascii file, with mixed char and numeric data, and I'm reading fairly big files so I would like it to be fast.

"Text data" and "fast access" are a contradiction in terms.
Expect a delay of 2-5 s per 10 MB of text data in the file.


> The files look something like this:
>
> {{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }
>
> It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.
>
> I can manage a for-loop "hack all the braces out" approach, but is there a better way for something with this simple format?

Regular expressions are the obvious first try.

Rune
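
A regular-expression version might look like the sketch below. It assumes the numbers are plain decimals (optional sign, optional fractional part, no exponent notation); the pattern and variable names are illustrative, not something specified in this thread.

```matlab
% Sample line in the format described by the original poster.
txt = '{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}';

% Pull out every numeric token; braces and commas are simply not matched.
tokens = regexp(txt, '[-+]?\d+\.?\d*', 'match');

% Convert the cell array of strings to a numeric row vector.
v = str2double(tokens);

% Square matrix stored row by row: reshape, then transpose.
n = sqrt(numel(v));
A = reshape(v, n, n).';
```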

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 12:57:02

Message: 4 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <e2d1e726-82c5-4d32-865d-ab0700e0f092@r36g2000prf.googlegroups.com>...

> Regular expressions are the obvious first try.
>
> Rune
.
Hi Rune,
imho regular expressions are quite slow, and this could be noticeable for large files. If I had the choice, I'd just replace the braces with spaces.
Regards
Andres

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 13:44:03

Message: 5 of 13

On 10 Nov, 13:57, "Andres" <rant...@werb.deNoRs> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <e2d1e726-82c5-4d32-865d-ab0700e0f...@r36g2000prf.googlegroups.com>...
> > Regular expressions are the obvious first try.
>
> > Rune
>
> .
> Hi Rune,
> imho regular expressions are quite slow, and this could be noticeable for large files. If I had the choice, I'd just replace the braces with spaces.

The TXT2MAT function you suggested earlier uses
a syntax which is deceptively similar to a regular
expression. I can't find any documentation for the
function, though, so I don't know how it is implemented.

As for text files, it's very time-consuming to mess
with them. The only *real* time saving is to use a
binary format. This was discussed here not too long
ago:

http://groups.google.no/group/comp.soft-sys.matlab/msg/d49639538f61a0dc?hl=no

Rune
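
To illustrate the binary route: once the matrix has been parsed a single time, it could be cached with fwrite and reloaded with fread. This is only a sketch; the file name and the layout (two uint32 dimensions followed by the doubles) are assumptions made for the example.

```matlab
A = magic(4);                                    % stand-in for the parsed matrix
fname = fullfile(tempdir, 'matrix_sample.bin');  % hypothetical file name

% Write the dimensions first, then the data in MATLAB's column-major order.
fid = fopen(fname, 'w');
fwrite(fid, size(A), 'uint32');
fwrite(fid, A, 'double');
fclose(fid);

% Read back: the dimensions tell fread how to shape the result.
fid = fopen(fname, 'r');
dims = fread(fid, [1 2], 'uint32');
B = fread(fid, dims, 'double');
fclose(fid);
```

No text parsing happens on the read side, which is where the time saving comes from.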

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 13:53:01

Message: 6 of 13

on the speed...
.
for a 1000x1000 matrix file counting from 1 to 1e6 ('{{1,2,3,...', ~6.7 MB),
.
tic
B = txt2mat('ds_1000.txt',0,-1,'ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';
toc
.
takes about one second. (The "0,-1" args switch off the file layout detection which is necessary for lines >64kB)

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 14:03:02

Message: 7 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <008a086e-9fd0-45d6-b4ef-b3aef5c7755d@a17g2000prm.googlegroups.com>...
>
> The TXT2MAT function you suggested earlier uses
> a syntax which is deceptively similar to a regular
> expression. I can't find any documentation for the
> function, though, so I don't know how it is implemented.
.
there's quite a lengthy doc
.
> As for text files, it's very time-consuming to mess
> with them. The only *real* time saving is to use a
> binary format. [..]
.
I fully agree. But often enough, you don't have any choice of the format of the data that is given to you.
.
(sorry for the "."-lines - empty lines are not displayed here)

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 14:19:13

Message: 8 of 13

On 10 Nov, 15:03, "Andres" <rant...@werb.deNoRs> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <008a086e-9fd0-45d6-b4ef-b3aef5c77...@a17g2000prm.googlegroups.com>...
>
> > The TXT2MAT function you suggested earlier uses
> > a syntax which is deceptively similar to a regular
> > expression. I can't find any documentation for the
> > function, though, so I don't know how it is implemented.
>
> .
> there's quite a lengthy doc

Where? I can't find it in my R2006a release, and
I can't find it among the mathworks list of functions.

> I fully agree. But often enough, you don't have any choice of the format of the data that is given to you.

Fair enough. My point is: Don't complain about speed
when you deal with text files. If speed really is a
concern, use a binary format. If text files is what you
have, don't discard regular expressions on account
of speed.

Rune

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 15:08:03

Message: 9 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <4077b49b-c131-4bdf-8401-2ba9e5698b39@a17g2000prm.googlegroups.com>...
[..]
> Where? I can't find it in my R2006a release, and
> I can't find it among the mathworks list of functions.
>
As I noted, it can be found on the file exchange. I'm the author.
.
> [..] If text files is what you
> have, don't discard regular expressions on account
> of speed.
.
I don't want to discard them in general, I just thought they were not necessary here. In my experience, the replacement process is slowed down by a factor of ~5 with regular expressions, which might be important to the OP who "would like it to be fast", e.g. if he has many files to import - regardless of how much faster a binary import would be.
Regards
Andres
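
The factor quoted above is one poster's experience and will vary with machine and MATLAB release. One way to measure it yourself is sketched below; the 300x300 test size and the variable names are arbitrary choices for the example.

```matlab
% Build a test string in the '{{1,2,...},{...}}' format.
n = 300;
rows = cell(1, n);
for k = 1:n
    rows{k} = ['{' sprintf('%d,', 1:n)];
    rows{k}(end) = '}';            % replace the trailing comma
end
txt = ['{' sprintf('%s,', rows{:})];
txt(end) = '}';                    % replace the trailing comma

% Plain string replacement.
tic
a = strrep(strrep(strrep(txt, '{', ' '), '}', ' '), ',', ' ');
tPlain = toc;

% Regular-expression replacement of the same characters.
tRegex = 0;
tic
b = regexprep(txt, '[{},]', ' ');
tRegex = toc;

fprintf('strrep: %.4f s, regexprep: %.4f s\n', tPlain, tRegex);
```

Both calls produce the same cleaned string, so only the timing differs.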

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 16:23:02

Message: 10 of 13

Rune and Andres -- Thank you both for the helpful input.

I got Andres' file exchange code TXT2MAT working as per the following:

B = txt2mat(file,'ReadMode','block','NumColumns',1248,'ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';

I'd rather not have to throw in the magic number there (1248), but apparently the long single line gets read in as ~25 lines and the data gets all twisted when I let TXT2MAT try to figure it out. At any rate, it's faster and cleaner than my hack and slash approach:

%-----------------------------------
%read entire file as cell array of strings (one cell per comma-separated token)
a = textread('file.txt','%s','delimiter',',');
%search for first '}' character (indicates the end of the first row)
for count=1:length(a)
    if ~isempty(strfind(a{count},'}'))
        break
    end
end
ncols = count;
nrows = length(a)/ncols;
%clean braces '{' and '}' from data
a = strrep(a,'{','');
a = strrep(a,'}','');
%convert cell array of strings to numeric
a = str2double(a);
%reshape matrix; reshape fills column-wise, so transpose to restore the file's row order
a = reshape(a,ncols,nrows).';
%-----------------------------------
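
As a possible way around a hard-coded column count, the number of columns can be read from the text itself by counting the commas up to the first closing brace. The sketch below uses a small non-square sample string and built-in functions only; it is not a txt2mat option.

```matlab
txt = '{{1,2,3},{4,5,6}}';                    % 2-by-3 example, not square

% The first '}' marks the end of the first row.
firstClose = find(txt == '}', 1);

% Number of columns = commas inside the first row + 1.
ncols = sum(txt(1:firstClose) == ',') + 1;

% Blank out the delimiters and parse everything in one call.
txt(txt == '{' | txt == '}' | txt == ',') = ' ';
v = sscanf(txt, '%f');

% reshape fills column-wise, so build ncols-by-nrows and transpose.
A = reshape(v, ncols, []).';
```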

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 20:05:06

Message: 11 of 13

If it works - it's fine.
.
But I'm a bit puzzled by the need for the magic number, too, which is not even square. Did you try my latter code which I tested on the one million numbers file?
Just if you like, contact me via the file exchange author page, I'd be curious to look into detail.
Regards
Andres

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 21:22:02

Message: 12 of 13

"Andres" <rantore@werb.deNoRs> wrote in message <gfa45i$kk7$1@fred.mathworks.com>...
> Did you try my latter code which I tested on the one million numbers file?
---
I tried your latter code, and I have the same trouble. I'm sure it would work fine if the data were well formatted; the data is a continuous block of characters with no line-feeds to delimit the rows. I think this is giving TXT2MAT the wrong idea about how the data should be parsed.
.
I'll try to send you a sample file to play with if you're curious.
-DS

Subject: read ascii file

From: Andres

Date: 12 Nov, 2008 12:08:01

Message: 13 of 13

"DS" <null@null.com> wrote in message <gfa8lq$olu$1@fred.mathworks.com>...
> "Andres" <rantore@werb.deNoRs> wrote in message <gfa45i$kk7$1@fred.mathworks.com>...
> I'll try to send you a sample file to play with if you're curious.
> -DS

That would be nice, thanks. I hope you can decipher my e-mail address (leave out any 'r', end with .de). Btw. I wonder where the 'Contact Author' button in the file exchange has gone...
