Thread Subject:
read ascii file

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 09:53:02

Message: 1 of 13

Hello all. I need to read in an ascii file, with mixed char and numeric data, and I'm reading fairly big files so I would like it to be fast. The files look something like this:

{{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }

It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.

I can manage a for-loop "hack all the braces out" approach, but is there a better way for something with this simple format?

-DS

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 12:08:02

Message: 2 of 13

"DS" <null@null.com> wrote in message <gf909u$4n9$1@fred.mathworks.com>...
> [..] The files look something like this:
>
> {{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }
>
> It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.
[..]
.
.
Hi,
if the curly braces are your only char data, i.e. inside the braces there are just numbers, you could do the trick with txt2mat (file exchange):
.
A = txt2mat('file.txt',...
    'ReplaceExpr',{{'},{',char([13 10])}},...
    'ReplaceChar',{'{}, '});
.
I assumed that
- you don't know the size of the matrix beforehand (knowing it would help to speed things up)
- the rows are reliably separated by '},{'
.
I checked this on a sample file containing the only line
{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}
Of course,
.
B = txt2mat('file.txt','ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';
.
would work as well.
Ok, this is kind of hacking the braces out, but it should be quite fast.
Hth
Andres

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 12:12:50

Message: 3 of 13

On 10 Nov, 10:53, "DS" <n...@null.com> wrote:
> Hello all. I need to read in an ascii file, with mixed char and numeric data, and I'm reading fairly big files so I would like it to be fast.

"Text data" and "fast access" are contradictions in terms.
Expect 2-5s delay per 10 MByte of text data in the file.


> The files look something like this:
>
> {{0,0,0, ... lots of numbers ... ,0,0,0},{0,0,0, ... lots of numbers ... ,0,0,0}, etc }
>
> It's a square matrix all on a single line, comma delimited, with each row and the entire matrix enclosed in curly braces.
>
> I can manage a for-loop "hack all the braces out" approach, but is there a better way for something with this simple format?

Regular expressions are the obvious first try.

Rune
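
For readers who want to try the regular-expression route with base MATLAB only, a minimal sketch might look like the following. It is not code from the thread: the variable names are made up for illustration, and the sample string stands in for the contents of the file (for a real file one would presumably load it with fileread first). It assumes exactly two levels of braces and nothing but numbers and commas inside the inner ones.

```matlab
% Sketch: pull each inner {...} group out with a regular expression,
% then convert the groups to numeric rows one at a time.
txt = '{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}';

% One token per inner group, e.g. '1,2,3,4' (assumes no deeper nesting).
tok = regexp(txt, '\{([^{}]*)\}', 'tokens');

% sscanf with '%f,' reads comma-separated numbers into a column vector;
% transpose so that every group becomes one row.
rowVecs = cellfun(@(t) sscanf(t{1}, '%f,').', tok, 'UniformOutput', false);

% Stack the rows; this fails if the rows differ in length.
A = vertcat(rowVecs{:});
```

Because every row is parsed separately, this variant does not need to know the matrix size in advance and would also handle non-square data.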

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 12:57:02

Message: 4 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <e2d1e726-82c5-4d32-865d-ab0700e0f092@r36g2000prf.googlegroups.com>...

> Regular expressions are the obvious first try.
>
> Rune
.
Hi Rune,
imho regular expressions are quite slow, and this could be noticeable for large files. If I had the choice, I'd just replace the braces with spaces.
Regards
Andres
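
The "replace the braces with spaces" idea can also be sketched without any File Exchange code. The snippet below is only an illustration under stated assumptions: the names are hypothetical, the sample string stands in for the file contents, and the row count is inferred from the number of closing braces (one per row plus one for the outer pair).

```matlab
% Sketch: blank out braces and commas, then let sscanf read the numbers.
txt = '{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}';

% Every row ends with '}', and the whole matrix adds one more.
nrows = sum(txt == '}') - 1;

% Replace all structural characters with spaces in a single pass.
txt(ismember(txt, '{},')) = ' ';

% sscanf returns the numbers in file order, i.e. row by row.
vals = sscanf(txt, '%f');

% Fill column-wise with one file row per column, then transpose.
A = reshape(vals, [], nrows).';
```

Since the row count comes from the text itself, no hard-coded dimension is needed, and the matrix does not have to be square.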

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 13:44:03

Message: 5 of 13

On 10 Nov, 13:57, "Andres" <rant...@werb.deNoRs> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <e2d1e726-82c5-4d32-865d-ab0700e0f...@r36g2000prf.googlegroups.com>...
> > Regular expressions are the obvious first try.
>
> > Rune
>
> .
> Hi Rune,
> imho regular expressions are quite slow, and this could be noticeable for large files. If I had the choice, I'd just replace the braces with spaces.

The TXT2MAT function you suggested earlier uses
a syntax which is deceptively similar to a regular
expression. I can't find any documentation for the
function, though, so I don't know how it is implemented.

As for text files, it's very time-consuming to mess
with them. The only *real* time saving is to use a
binary format. This was discussed here not too long
ago:

http://groups.google.no/group/comp.soft-sys.matlab/msg/d49639538f61a0dc?hl=no

Rune
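
To make the binary-format suggestion concrete, here is a small round-trip sketch with fwrite and fread. The file layout (two uint32 values for the size, followed by the data as doubles) is an assumption chosen for this example, not a format mentioned in the thread, and the file name is a temporary one.

```matlab
% Sketch: store a matrix in a simple binary layout and read it back.
A = magic(4);
fname = [tempname '.bin'];          % temporary file, hypothetical name

% Write a tiny header (rows, cols) followed by the data in column order.
fid = fopen(fname, 'w');
fwrite(fid, size(A), 'uint32');
fwrite(fid, A, 'double');
fclose(fid);

% Read the header first, then let fread shape the data accordingly.
fid = fopen(fname, 'r');
sz = fread(fid, 2, 'uint32').';
B = fread(fid, sz, 'double');
fclose(fid);
delete(fname);
```

No text parsing is involved on the way back in, which is where the time saving would come from.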

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 13:53:01

Message: 6 of 13

on the speed...
.
for a 1000x1000 matrix file counting from 1 to 1e6 ('{{1,2,3,...', ~6.7 MB),
.
tic
B = txt2mat('ds_1000.txt',0,-1,'ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';
toc
.
takes about one second. (The "0,-1" arguments switch off the file layout detection; switching it off is necessary for lines longer than 64 kB.)
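
A test file of the kind described above could be generated roughly as follows. This is a guess at how such a file might be produced, not the code actually used for the measurement; the file is written to the temporary directory, and the variable names are illustrative.

```matlab
% Sketch: write an n-by-n matrix counting 1..n^2 as one brace-delimited line.
n = 1000;

% Format for one row: '{%d,%d,...,%d},' with n numbers.
rowFmt = ['{' repmat('%d,', 1, n-1) '%d},'];

% sprintf recycles the format, consuming n values per row.
body = sprintf(rowFmt, 1:n^2);

% Drop the trailing comma and add the outer braces.
txt = ['{' body(1:end-1) '}'];

% Write everything as a single line without a line feed.
fname = fullfile(tempdir, 'ds_1000.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%s', txt);
fclose(fid);
```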

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 14:03:02

Message: 7 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <008a086e-9fd0-45d6-b4ef-b3aef5c7755d@a17g2000prm.googlegroups.com>...
>
> The TXT2MAT function you suggested earlier uses
> a syntax which is deceptively similar to a regular
> expression. I can't find any documentation for the
> function, though, so I don't know how it is implemented.
.
there's quite a lengthy doc
.
> As for text files, it's very time-consuming to mess
> with them. The only *real* time saving is to use a
> binary format. [..]
.
I fully agree. But often enough, you don't have any choice of the format of the data that is given to you.
.
(sorry for the "."-lines - empty lines are not displayed here)

Subject: read ascii file

From: Rune Allnor

Date: 10 Nov, 2008 14:19:13

Message: 8 of 13

On 10 Nov, 15:03, "Andres" <rant...@werb.deNoRs> wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <008a086e-9fd0-45d6-b4ef-b3aef5c77...@a17g2000prm.googlegroups.com>...
>
> > The TXT2MAT function you suggested earlier uses
> > a syntax which is deceptively similar to a regular
> > expression. I can't find any documentation for the
> > function, though, so I don't know how it is implemented.
>
> .
> there's quite a lengthy doc

Where? I can't find it in my R2006a release, and
I can't find it among the mathworks list of functions.

> I fully agree. But often enough, you don't have any choice of the format of the data that is given to you.

Fair enough. My point is: Don't complain about speed
when you deal with text files. If speed really is a
concern, use a binary format. If text files are what you
have, don't discard regular expressions on account
of speed.

Rune

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 15:08:03

Message: 9 of 13

Rune Allnor <allnor@tele.ntnu.no> wrote in message <4077b49b-c131-4bdf-8401-2ba9e5698b39@a17g2000prm.googlegroups.com>...
[..]
> Where? I can't find it in my R2006a release, and
> I can't find it among the mathworks list of functions.
>
As I noted, it can be found on the file exchange. I'm the author.
.
> [..] If text files are what you
> have, don't discard regular expressions on account
> of speed.
.
I don't want to discard them in general, I just thought they were not necessary here. In my experience, the replacement process is slowed down by a factor of ~5 with regular expressions, which might be important to the OP who "would like it to be fast", e.g. if he has many files to import - regardless of how much faster a binary import would be.
Regards
Andres
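
Anyone who wants to check the speed difference on their own machine could use a rough comparison like the one below. It only times the character replacement step on a synthetic string (about 2.3 MB, made up for this example) and says nothing about how txt2mat works internally; actual timings will depend on the MATLAB release and hardware.

```matlab
% Sketch: compare regexprep against logical indexing for blanking out
% braces and commas in a long string.
txt = repmat('{1,2,3,4,5,6,7,8,9,10},', 1, 1e5);   % synthetic input

% Variant 1: regular expression with a character class.
tic
t1 = regexprep(txt, '[{},]', ' ');
tRegex = toc;

% Variant 2: plain comparisons and logical indexing.
tic
t2 = txt;
t2(t2 == '{' | t2 == '}' | t2 == ',') = ' ';
tIndex = toc;

fprintf('regexprep: %.3f s, logical indexing: %.3f s\n', tRegex, tIndex);
```

Both variants produce the same string, so the choice between them is purely a matter of speed and taste.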

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 16:23:02

Message: 10 of 13

Rune and Andres -- Thank you both for the helpful input.

I got Andres' file exchange code TXT2MAT working as per the following:

B = txt2mat(file,'ReadMode','block','NumColumns',1248,'ReplaceChar',{'{}, '});
n = sqrt(numel(B));
B = reshape(B,n,n).';

I'd rather not have to throw in the magic number there (1248), but apparently the long single line gets read in as ~25 lines and the data gets all twisted when I let TXT2MAT try to figure it out. At any rate, it's faster and cleaner than my hack and slash approach:

%-----------------------------------
%read entire file as a cell array of comma-separated tokens
a = textread('file.txt','%s','delimiter',',');
%search for the first token containing '}' (marks the end of the first row)
for count=1:length(a)
    if ~isempty(strfind(a{count},'}'))
        break
    end
end
ncols = count;
nrows = length(a)/ncols;
%clean braces '{' and '}' from data
a = strrep(a,'{','');
a = strrep(a,'}','');
%convert cell array of strings to numeric
a = str2double(a);
%reshape; tokens arrive row by row, so fill columns first and then transpose
a = reshape(a,ncols,nrows).';
%-----------------------------------
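
The same token-based idea can be written without the explicit loop. The sketch below is an illustration only: it works on a short sample string instead of a file, the names are hypothetical, and it assumes that the first token containing '}' marks the end of the first row.

```matlab
% Sketch: loop-free variant of the token-based approach.
txt = '{{1,2,3},{4,5,6}}';               % small non-square sample

% Split at commas: '{{1', '2', '3}', '{4', '5', '6}}'.
a = regexp(txt, ',', 'split');

% Index of the first token with a closing brace = number of columns.
ncols = find(~cellfun('isempty', strfind(a, '}')), 1);

% Strip the braces and convert all tokens at once.
vals = str2double(strrep(strrep(a, '{', ''), '}', ''));

% Tokens are in row order, so reshape column-wise and transpose.
A = reshape(vals, ncols, []).';
```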

Subject: read ascii file

From: Andres

Date: 10 Nov, 2008 20:05:06

Message: 11 of 13

If it works - it's fine.
.
But I'm a bit puzzled by the need for the magic number, too, which is not even square. Did you try my latter code which I tested on the one million numbers file?
If you like, contact me via the file exchange author page; I'd be curious to look into the details.
Regards
Andres

Subject: read ascii file

From: DS

Date: 10 Nov, 2008 21:22:02

Message: 12 of 13

"Andres" <rantore@werb.deNoRs> wrote in message <gfa45i$kk7$1@fred.mathworks.com>...
> Did you try my latter code which I tested on the one million numbers file?
---
I tried your latter code, and I have the same trouble. I'm sure it would work fine if the data were well formatted; the data is a continuous block of characters with no line-feeds to delimit the rows. I think this is giving TXT2MAT the wrong idea about how the data should be parsed.
.
I'll try to send you a sample file to play with if you're curious.
-DS

Subject: read ascii file

From: Andres

Date: 12 Nov, 2008 12:08:01

Message: 13 of 13

"DS" <null@null.com> wrote in message <gfa8lq$olu$1@fred.mathworks.com>...
> "Andres" <rantore@werb.deNoRs> wrote in message <gfa45i$kk7$1@fred.mathworks.com>...
> I'll try to send you a sample file to play with if you're curious.
> -DS

That would be nice, thanks. I hope you can decipher my e-mail address (leave out any 'r', end with .de). Btw. I wonder where the 'Contact Author' button in the file exchange has gone...
