Thread Subject:
Using textscan on difficult data

From: JMvanwessem

Date: 14 Sep, 2010 22:03:04

Message: 1 of 44

Hello,

I am stuck reading my .txt file the right way. I don't know how to get textscan to properly skip some columns... Well, that doesn't make much sense, but here is an example of the data I want to read:

A B C D E
1 3 4 6
2 5 10
4 8 10 12 13

I want to read all columns, including the empty spots, and have those read as NaN, resulting in 5 columns like:
[1;2;4] [3;5;8] [4;NaN;10] etc.
But when I use something like:

data = textscan(fid, '%f%f%f%f%f', 'headerLines', 1);

it only treats the entries that actually contain a number, and puts them in the wrong order:

[1;2;4] [3;5;8] [4;10;10] [6;NaN;12] [NaN;NaN;13]

I hope you understand the problem! Could somebody help me out? :)

Another question, of lesser importance, but if you are at it anyway you might answer it too:
How can I textscan the first column and have it automatically add leading zeros, so that it results in a number (or string) like [0001;0002;0004]?

Well thanks in advance,

M.

From: Iain Robinson

Date: 14 Sep, 2010 22:24:04

Message: 2 of 44

Dear M.,
for your first question I suppose you could read the file one line at a time, then rearrange your data once you've read the entire file.

For your second question, sprintf might do the job. Typing

  sprintf('%.4d', 5)

produces the result

  ans = 0005

Yours,
Iain.
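A minimal MATLAB sketch of the same zero-padding applied to a whole column of IDs; the vector ids is made-up sample data. The value has to be passed as a number: MATLAB converts a character argument for %d to its character code, so the character '5' (code 53) would come out as '0053' rather than '0005'.

```matlab
% Made-up sample IDs, standing in for the first column of the file.
ids = [1; 2; 4];

% '%04d' pads with zeros to a width of 4; for non-negative integers
% '%.4d' gives the same result.
padded = arrayfun(@(x) sprintf('%04d', x), ids, 'UniformOutput', false);

% The same strings as a 3x4 char matrix, if that is more convenient.
paddedChar = char(padded);
disp(paddedChar)
```

The result is text, not a number: leading zeros only exist in a string representation.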

From: JMvanwessem

Date: 14 Sep, 2010 22:31:05

Message: 3 of 44

"Iain Robinson" <iain@physics.org> wrote in message <i6osm4$c64$1@fred.mathworks.com>...
> Dear M.,
> for your first question I suppose you could read the file one line at a time, then rearrange your data once you've read the entire file.
> ...
Well thanks,

But I want to do this on very big files, so that probably won't work as efficiently as I would like. Or I don't really understand what you are saying :)

And thanks for the answer to the second question. It worked :)

From: TideMan

Date: 14 Sep, 2010 22:41:37

Message: 4 of 44

On Sep 15, 10:31 am, "JMvanwessem " <jmvanwes...@gmail.com> wrote:
> "Iain Robinson" <i...@physics.org> wrote in message <i6osm4$c6...@fred.mathworks.com>...
> ...

In the example that you showed, there appear to be an arbitrary number
of spaces between the numbers.
Is this true for the actual files you are dealing with?
For example, for the 2nd row, how do you know which column 10 should
be in?
From my observation, it should be column 6, but there are only 5
columns in the header.
With such inconsistent rules, I cannot see any way that you can solve
the problem.

Where did the file come from?
Maybe you can get whoever wrote it to put some sort of delimiter
between data.

From: JMvanwessem

Date: 14 Sep, 2010 22:58:04

Message: 5 of 44

Hi Tideman,

Well, in the text file they are properly aligned in the columns, and I guess the amount of spaces is not 100% arbitrary. But I understand it might cause all sorts of inconvenient problems.
But we are 100% sure which numbers belong to which column; we just want the text file to be converted to a more readable format..
I'll try to copy a sample of the data below, but it will probably get screwed up on this forum... Maybe if you click "show original format" it will show more clearly what it is.

DAY[CYD] TIME[HMS] TIDN Z[FT] SPD[KTS] DIR[DEG] PSPD[KTS] PDIR[DEG] DDEV[DEG] T[F] TD[F] RH PRE[MB]
--------- --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
  2008232 0 1 6 78.69
  2008232 0 1 12 4.1 148 6.9947 168 18
  2008232 0 1 54 8.9 143 11.9909 137 8 78.69
  2008232 0 3 6 78.80
  2008232 0 3 12 6.0 158 8.9932 157 13
  2008232 0 3 54 9.9 157 10.9916 153 5 78.69
  2008232 0 19 6 83.30 77.70 83

From: dpb

Date: 14 Sep, 2010 23:25:23

Message: 6 of 44

JMvanwessem wrote:
...
> Well, in the text file they are properly aligned in the colums and I
> guess the amount of spaces is not 100% arbitrary. ...

That would imply it is tab-delimited or at least tab-spaced.

If the former, that's good, look at the 'delimiter' optional keyword in
textscan() and friends.

If it's simply a case of using tabs for alignment, you've still got the
problem of needing to parse a fixed width field w/o fixed width (so to
speak :) ).

In that case you really have no choice but to read it as a text line,
expand the tabs then take the substrings according to the field width.
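A minimal MATLAB sketch of that approach for a single record. The tab-stop spacing (every 4 columns) and the field widths ([4 4 4]) are assumptions chosen for the example; the real file's values would have to be determined first. Blank fields become NaN because str2double returns NaN for an all-blank string.

```matlab
tabstop = 4;                   % assumed tab-stop spacing
widths  = [4 4 4];             % assumed fixed field widths
rec     = sprintf('1\t\t7');   % made-up record: middle field empty

% 1) Expand each tab to blanks up to the next tab stop.
expanded = '';
for ch = rec
    if ch == sprintf('\t')
        nsp = tabstop - mod(numel(expanded), tabstop);
        expanded = [expanded, blanks(nsp)]; %#ok<AGROW>
    else
        expanded = [expanded, ch]; %#ok<AGROW>
    end
end

% 2) Pad to the full record width, then cut out each field by position.
total    = sum(widths);
expanded = [expanded, blanks(max(0, total - numel(expanded)))];
stops    = cumsum(widths);
starts   = stops - widths + 1;
values   = zeros(1, numel(widths));
for k = 1:numel(widths)
    % str2double returns NaN for an all-blank field.
    values(k) = str2double(expanded(starts(k):stops(k)));
end
disp(values)
```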

It's another major weakness in C i/o that xscanf and friends don't have a
clue how to handle a fixed-width field (or at least Matlab's incarnation
doesn't; I really shouldn't comment on the underlying C behavior, as I
don't know enough C to say what the standard says, but I'm assuming TMW
implemented similar behavior)...

A perl script to preprocess, or even to do the work, might be a
better/simpler choice than Matlab in this case.

--

From: TideMan

Date: 14 Sep, 2010 23:36:04

Message: 7 of 44

On Sep 15, 11:25 am, dpb <n...@non.net> wrote:
> ...
> A perl script to preprocess or even to to the work might be
> better/simpler choice than Matlab in this case.
>
> --

Or a one-liner in Fortran, of course:
read(lun,'(12f10.0)')data
(spoken with trepidation in the expectation that a flame war will
ensue)

From: Walter Roberson

Date: 14 Sep, 2010 23:42:24

Message: 8 of 44

On 14/09/10 6:36 PM, TideMan wrote:

> Or a one-liner in Fortran, of course:
> read(lun,'(12f10.0)')data

But.... you can do it with only 8 classes and 19 templates in C++ !

Purge yourself of that obsolete Fortran knowledge, it is only holding
you back from learning REAL programming!

From: JMvanwessem

Date: 14 Sep, 2010 23:53:04

Message: 9 of 44

@dpb

It doesn't seem like it's a fixed tabbing, I am afraid... I don't really understand your ways of treating the problem, though. I find it difficult to interpret how Matlab handles whitespace and tabs... Could you by any chance give me a programming example of how to do it?

Or should I just go and try to read it with FORTRAN, haha? It should be so simple to read these data, as they are actually already ordered quite well; I just want to restructure the data for easier analysis...

Well thanks for the help anyway!

From: dpb

Date: 15 Sep, 2010 02:13:16

Message: 10 of 44

JMvanwessem wrote:
> @dpb
>
> It doesn't seem like it's a fixed tabbing I am afraid...I don't really
> understand your ways of treating the problem though. I find it difficult
> to interpret how matlab handles white spaces and tabs... Could you by
> any chance give me a programming example of how to do it?

Specifically, I would need to see the actual content of the file to
determine what is in it. Is there a tab between each field?

If so, using

'delimiter','\t','whitespace',' \b'

in textscan() or textread() has a good chance.
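For the tab-delimited case, a minimal sketch using an in-memory string instead of a file (the two records are made up). With the tab as the only delimiter, two adjacent tabs mark an empty field, which textscan returns as its 'EmptyValue' (NaN by default).

```matlab
% Two made-up tab-delimited records; adjacent tabs mark empty fields.
str = sprintf('1\t3\t\t6\n2\t\t10\t7\n');

% Tab is the only delimiter; blanks and backspaces are whitespace.
c = textscan(str, '%f%f%f%f', 'Delimiter', '\t', 'Whitespace', ' \b');

% Collect the four columns into a 2x4 matrix; empty fields are NaN.
M = [c{:}];
disp(M)
```

This only helps if the file really contains tab characters between fields; for blank-padded fixed-width columns it does not apply.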

I can't say I can fully interpret Matlab's handling of whitespace,
either; there have been questions similar to this that I've played w/
before where I never could follow the logic of what actually happens.

As near as I can tell, the best rule is that xscanf() "eats" multiple
whitespace characters irrespective of what is provided regarding field
widths, etc. So, where we're used to an I5 in Fortran reading a field
that is a fixed 5 characters wide, Matlab given %5i will, if the field
is ' 1 ', terminate at the first blank after the '1' and immediately
scan for the next field. It keeps throwing away whitespace until it
finds either a delimiter or another character. That behavior is, of
course, utterly useless for fixed-field data w/ empty fields. AFAICT,
there's no way to change this behavior w/ the natively supplied i/o
routines unless you can find an actual delimiter character in the
record; hence the above question.

Failing that, your only choice is to read the line as a character
string and parse it w/ the various string functions or regular
expressions. Regular expressions are another area I don't know enough
about to recommend a solution, but they might help, I don't know.
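As a minimal sketch of the regular-expression route for one record: the widths (one 9-character field, then 10-character fields) are assumed from the dashed separator line in the sample, and the record itself is made up. Each capture group takes exactly its column range, blanks included, and str2double turns an all-blank token into NaN.

```matlab
% Made-up record: a 9-char field, a 10-char field, an empty 10-char field.
rec = ['  2008232', blanks(9), '0', blanks(10)];

% Pad short lines so the fixed-width pattern always matches.
rec = [rec, blanks(max(0, 29 - numel(rec)))];

% One capture group per fixed-width column.
tok = regexp(rec, '^(.{9})(.{10})(.{10})', 'tokens', 'once');

% Convert each captured field; an all-blank field becomes NaN.
vals = str2double(tok);
disp(vals)
```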

> Or should I just go try read it with FORTRAN haha? It should be so
> simple to read these data, as actually it is already ordened quite well,
> I just want to restructure the data for easier analysis...

That is quite possible; altho the tab isn't in the Fortran character set
either, most (all current?) compilers will accept it in list-directed i/o.

--

From: dpb

Date: 15 Sep, 2010 02:15:32

Message: 11 of 44

TideMan wrote:
...

> Or a one-liner in Fortran, of course:
> read(lun,'(12f10.0)')data
...

As long as the file is actually fixed-width columns. What OP posted
indicates it uses tabs to get the apparent spacing or we wouldn't have
the garbled columns...

--

From: TideMan

Date: 15 Sep, 2010 04:27:40

Message: 12 of 44

On Sep 15, 2:15 pm, dpb <n...@non.net> wrote:
> TideMan wrote:
>
> ...
>
> > Or a one-liner in Fortran, of course:
> > read(lun,'(12f10.0)')data
>
> ...
>
> As long as the file is actually fixed-width columns.  What OP posted
> indicates it uses tabs to get the apparent spacing or we wouldn't have
> the garbled columns...
>
> --

Yes, his original posting did look like that, but his second posting
of the file looked like fixed-width to me when I hit Show Original.

Seems like an awful failing of C and Matlab that it's not easy to read
fixed-width data.
After all, in the old days (when I was a student) the medium for data
was Hollerith cards and they were always fixed-width.

From: dpb

Date: 15 Sep, 2010 04:50:37

Message: 13 of 44

TideMan wrote:
...

> Yes, his original posting did look like that, but his second posting
> of the file looked like fixed-width to me when I hit Show Original.

I don't know what "show original" is or does, but sounds like it
interprets tabs...

> Seems like an awful failing of C and Matlab that it's not easy to read
> fixed-width data.

Near's as I can tell, yes... :( I've never found any simple way to deal
w/ problems like the OP's in Matlab; I don't do enough C to know whether
there's really any difference or not, I just gather probably not given
that the ML functions are patterned after C.

> After all, in the old days (when I was a student) the medium for data
> was Hollerith cards and they were always fixed-width.

Guess it depended on the application... the primary applications I used
in those days of yore had a customized front-end processor (written at
Bettis) that used comma delimiting and an implied decimal to get the most
precision in the fewest columns. Thus 333270+0,580307-1 --> 0.33327 and
0.0580307, respectively (two common values from Mark-I reactor fuel
assembly inputs that I can still remember after 30+ years because they
were repeated so often. The first is the fuel volume fraction and the
latter the H volume-averaged number density for a moderator temperature
of 580F at 2250 psia).

--

From: Oleg Komarov

Date: 15 Sep, 2010 07:54:04

Message: 14 of 44

> DAY[CYD] TIME[HMS] TIDN Z[FT] SPD[KTS] DIR[DEG] PSPD[KTS] PDIR[DEG] DDEV[DEG] T[F] TD[F] RH PRE[MB]
> --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
> 2008232 0 1 6 78.69
> 2008232 0 1 12 4.1 148 6.9947 168 18
> 2008232 0 1 54 8.9 143 11.9909 137 8 78.69
> 2008232 0 3 6 78.80
> 2008232 0 3 12 6.0 158 8.9932 157 13
> 2008232 0 3 54 9.9 157 10.9916 153 5 78.69
> 2008232 0 19 6 83.30 77.70 83

JMvanwessem,
the example you posted isn't fixed width because I get no spaces in the TD[F] field for the first record...
Oleg

From: JMvanwessem

Date: 15 Sep, 2010 15:30:17

Message: 15 of 44

Haha.. Well, guess I have some hard work to do. I think I thought of an alternate way to read this data, but thanks for all the help. It's interesting to see you all discussing how stupid some programming languages are...
If I get it fixed I will post how I have done so... if...

From: dpb

Date: 15 Sep, 2010 17:05:25

Message: 16 of 44

JMvanwessem wrote:
> Haha..Well, guess I have some hard work to do. I think I thought of an
> alternate way to read this data, but thanks for all the help. Its
> interesting to see you all discussing about how stupid some programming
> languages are... If I get if fixed I will post how I have done so...if...

You never answered the question earlier about what the actual content of
the record is... can't help w/o the poop; the crystal ball is (once again)
in the shop for repair, its fortune-telling accuracy isn't up to spec.

--

From: dpb

Date: 15 Sep, 2010 17:20:55

Message: 17 of 44

dpb wrote:
> TideMan wrote:
> ...
...
>> Seems like an awful failing of C and Matlab that it's not easy to read
>> fixed-width data.
>
> Near's as I can tell, yes... :( I've never found any simple way to deal
> w/ problems as the OP's w/ Matlab; I don't do enough C to know whether
> there's really any difference or not, I just gather probably not given
> that the ML functions are patterned after C.
...

And I'm totally amazed some ML (Steve, where are you?) or C l^hwizard
hasn't jumped in here to explain how it "really is so easy, all you has
to do is ..." (or to at a minimum explain how dumb I am to not know that
:) ).

--

From: Walter Roberson

Date: 15 Sep, 2010 17:36:02

Message: 18 of 44

On 15/09/10 12:20 PM, dpb wrote:

> And I'm totally amazed some ML (Steve, where are you?) or C l^hwizard
> hasn't jumped in here to explain how it "really is so easy, all you has
> to do is ..." (or to at a minimum explain how dumb I am to not know that
> :) ).

Not enough information yet to know if the fields are separated by a mix
of tabs and spaces, or tabs only, or spaces only.

If the fields are separated by a mix of tabs and spaces, I would
probably start by running the data through a unix untabify program.

From: dpb

Date: 15 Sep, 2010 17:47:05

Message: 19 of 44

Walter Roberson wrote:
> On 15/09/10 12:20 PM, dpb wrote:
>
>> And I'm totally amazed some ML (Steve, where are you?) or C l^hwizard
>> hasn't jumped in here to explain how it "really is so easy, all you has
>> to do is ..." (or to at a minimum explain how dumb I am to not know that
>> :) ).
>
> Not enough information yet to know if the fields are separated by a mix
> of tabs and spaces, or tabs only, or spaces only.

Agreed...I poked a stick in the cage of OP just a few minutes ago on
that point... :)

> If the fields are separated by a mix of tabs and spaces, I would
> probably start by running the data through a unix untabify program.

But if it really is fixed width fields w/ empty fields??? :)

--

From: Doug Schwarz

Date: 15 Sep, 2010 17:59:05

Message: 20 of 44

On 9/15/2010 1:47 PM, dpb wrote:
> Walter Roberson wrote:
>> On 15/09/10 12:20 PM, dpb wrote:
>>
>>> And I'm totally amazed some ML (Steve, where are you?) or C l^hwizard
>>> hasn't jumped in here to explain how it "really is so easy, all you has
>>> to do is ..." (or to at a minimum explain how dumb I am to not know that
>>> :) ).
>>
>> Not enough information yet to know if the fields are separated by a
>> mix of tabs and spaces, or tabs only, or spaces only.
>
> Agreed...I poked a stick in the cage of OP just a few minutes ago on
> that point... :)
>
>> If the fields are separated by a mix of tabs and spaces, I would
>> probably start by running the data through a unix untabify program.
>
> But if it really is fixed width fields w/ empty fields??? :)

If the fields are all the same width (say, 2 characters) then it can be
read with

   c = textscan(fid,'%2n%2n%2n','whitespace','','treatasempty','  ')

where the number of spaces in the treatasempty string is also 2.

I have never figured out a way to do it with textscan when the fields
are different widths (though I have tried). You can do it by reading in
characters (e.g., '%2c%3c'), but then you have to read the columns of
characters in a separate step to get numbers.

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

From: JMvanwessem

Date: 15 Sep, 2010 18:23:05

Message: 21 of 44

dpb <none@non.net> wrote in message <i6qujk$snm$1@news.eternal-september.org>...
> ...
> You never answered the question earlier about what the actual content of
> the record is...can't help w/o the poop; crystal ball is (once again) in
> the shop for repair, it's fortune telling accuracy isn't up to spec.
>
> --

I don't understand what information you would need? I don't know how the data is produced, so I don't know if it is tabbed, spaced or whatever. As far as I know it is spaced, as tabbing isn't able to reproduce the column alignment.

But I am actually managing to read the data the way I want now, and I am starting to get really interested in how it could have been done with just textread :)
I wanted to attach just a sample of the data file, but then discovered I can't attach files here?

From: dpb

Date: 15 Sep, 2010 18:20:34

Message: 22 of 44

Doug Schwarz wrote:
...

> If the fields are all the same width (say, 2 characters) then it can be
> read with
>
> c = textscan(fid,'%2n%2n%2n','whitespace','','treatasempty','  ')
>
> where the number of spaces in the treatasempty string is also 2.

That's a distinct improvement/added feature. R12 that I'm stuck at
doesn't have textscan() nor does textread() have the 'treatasempty'
named option. I'll have to try to remember its existence for future
recommendations.

> I have never figured out a way to do it with textscan when the fields
> are different widths (though I have tried). You can do it by reading in
> characters (e.g., '%2c%3c'), but then you have to read the columns of
> characters in a separate step to get numbers.

Yeah, there's the weakness of the way the field width in the format
descriptor is interpreted (or, more precisely, ignored).

Do you know if Matlab is consistent w/ C Standard requiring that
behavior (and if that is required behavior, any logic at all behind it
one can think of other than that's how it was initially implemented and
thus became de facto standard later formalized)?

It certainly baffled me for quite a long time until I finally kinda' did
figure out what it does about simply throwing away whitespace
irrespective of field widths on input...

--

From: dpb

Date: 15 Sep, 2010 18:47:09

Message: 23 of 44

JMvanwessem wrote:
...

> I don't understand what information you would need? I don't know how the
> data is produced so I don't know if it is tabbed, or spaced or whatever.
> As far as I know it is spaced as tabbing it isnt able to reproduce the
> column fixing.

Need to look at a record or two w/ hex editor or other display tool that
shows the content of the file internally w/o the interference of a
formatting tool that interprets tabs as so many spaces and thus displays
the columns aligned.
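Without a hex editor, the same check can be made from MATLAB itself. A minimal sketch, assuming a small made-up sample file written to a temporary location: it reads the raw bytes with fread (so nothing renders tabs as spaces), counts tab characters, and shows the character codes of the first line, where 9 is a tab and 32 a space.

```matlab
% Write a made-up one-line sample so the sketch runs standalone.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '  2008232\t0\t1\n');
fclose(fid);

% Read the raw bytes as characters, bypassing any tab rendering.
fid = fopen(fname, 'r');
raw = fread(fid, Inf, 'uint8=>char')';
fclose(fid);
delete(fname);

nTabs = sum(raw == sprintf('\t'));         % number of tab characters
firstLine = strtok(raw, sprintf('\n'));    % text up to the first newline
disp(double(firstLine))                    % codes: 9 = tab, 32 = space
```

Pointed at the real file instead of the temporary one, a count of zero tabs would indicate blank-padded fixed-width columns.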

One way to solve the problem (albeit a little painful for lots of files
unless the particular editor has the ability to be controlled
automatically) is to check the option to "convert tabs to spaces" on
saving a file from an editor.

> But I am actually managing reading the data how I want it to be read
> now, but am starting to get really interested in how it could have been
> done with just textread :)

How are you doing it? That might be sufficient clues to tell...then
again might be a good test for the repairs on the crystal ball, too... :)

> I wanted to attach just a sample of the datafile but then discovered I
> can't attach files here?

Despite the appearance of the TMW web interface portal, c.s-s.m is just
a usenet group; hence it's text only.

If you've got a relatively small sample file, I'm dp bozarth atthe
domain swko dot net. No spaces in the name, obviously along w/ the
obvious substitutions. I'd take a look and see what's in it...

--

From: JMvanwessem

Date: 15 Sep, 2010 19:58:20

Message: 24 of 44

dpb <none@non.net> wrote in message <i6r4ii$pf8$1@news.eternal-september.org>...
> ...
> How are you doing it? That might be sufficient clues to tell...then
> again might be a good test for the repairs on the crystal ball, too... :)

Well, at the moment I am just reading all the data that is available, so textscan just spews out columns with all the variables, like I showed in my previous posts. These data are actually tower measurements at certain heights, and not all towers measure the same variables; this is what causes the empty spots to show up.
As I want to have time series for each separate tower at each separate height, I want to restructure the data.

What I am doing now is filtering out the time series for each tower + height separately with something like id = find(tidn == T & z == Z), where T and Z are the tower number and height I want, and applying it to the data vectors returned by textscan... it works... but it is not that convenient, as I have to manually look up which tower has which measurements..

Well, i hope that's somewhat clear...hah
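A minimal sketch of automating that filtering step: group the rows by each unique (TIDN, Z) pair with unique(..., 'rows') and, for each group, keep only the variables that contain at least one non-NaN value. The vectors below are made-up stand-ins for two of the columns textscan returns (SPD and T); the names and the struct layout are illustrative choices, and the real DAY/TIME columns would be carried along the same way.

```matlab
% Made-up stand-ins for columns returned by textscan.
tidn = [1; 1; 3; 3; 1; 3];               % tower ID
z    = [6; 12; 6; 12; 6; 12];            % height
spd  = [NaN; 4.1; NaN; 6.0; NaN; 6.2];   % NaN where not measured
t    = [78.69; NaN; 78.80; NaN; 78.70; NaN];
vars  = [spd t];
names = {'SPD', 'T'};

% One group per unique (tower, height) pair; grp maps rows to groups.
[keys, ~, grp] = unique([tidn z], 'rows');
series = cell(size(keys, 1), 1);
for g = 1:size(keys, 1)
    rows = (grp == g);
    % Keep only the variables this tower/height actually reports.
    present = any(~isnan(vars(rows, :)), 1);
    series{g} = struct('tower', keys(g, 1), 'height', keys(g, 2), ...
                       'names', {names(present)}, ...
                       'data', vars(rows, present));
end
```

This removes the need to look up by hand which tower reports which variables; each cell of series holds one tower/height time series.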

From: dpb

Date: 16 Sep, 2010 12:58:08

Message: 25 of 44

Doug Schwarz wrote:
> On 9/15/2010 1:47 PM, dpb wrote:
...
>> But if it really is fixed width fields w/ empty fields??? :)
>
> If the fields are all the same width (say, 2 characters) then it can be
> read with
>
> c = textscan(fid,'%2n%2n%2n','whitespace','','treatasempty','  ')
>
> where the number of spaces in the treatasempty string is also 2.
...

OP did e-mail me a small sample file--it is fixed-width; I suggested the
following adaptation of the above solution altho w/ R12 I can't test it.
> If I counted correctly there are 13 columns and they're each 9
> chars/column w/ a space between columns or could be considered 10
> chars/column w/ the exception of first column which has no missing
> values. Hence for the purpose of the 'treatasempty' parameter, using a
> count of 10 blanks should suffice.
>
> Try playing around w/ something like
> fmt = ['%9f ' repmat('%10f',1,12)];
> c = textscan(fid,fmt,'whitespace','','treatasempty',blanks(10));
>
> You'll need to add the headerlines option and open the file and such
> details, obviously.

No further response to date...

--

From: JMvanwessem

Date: 16 Sep, 2010 19:18:11

Message: 26 of 44

dpb <none@non.net> wrote in message <i6t4g3$uhq$1@news.eternal-september.org>...
> ...

I am giving it another look, but it doesn't seem to work (yet).. I am trying lots of different things now. I have also imported the .txt into Excel, and it actually reads the data well with the "merged space delimiter" file import... Now I have to see how to read it with xlsread (this would be another way of reading this data)...

But I'll try to get the "straightforward" textscan to work; I am really curious :)

From: JMvanwessem

Date: 16 Sep, 2010 20:15:24

Message: 27 of 44

I really do not get it to work with the code you advised me to use... at the moment this is my code:

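clear all
filename = '081908.txt';
% A string literal cannot span lines, so the path is concatenated.
dataPath = ['/h/mrhome1/jmvanwes/Desktop/COAMPScomp/Coastal_Wind/' ...
    'COAMPS_read_Patrick/Patrick_NewData/'];
fid = fopen([dataPath filename], 'rt');
% Column names: 13 whitespace-separated tokens on the first line.
header1 = textscan(fid, '%s%s%s%s%s%s%s%s%s%s%s%s%s', 1);

vari = textscan(fid, ' %q', ...
    1, 'delimiter', ',', ...
    'treatAsEmpty', {'"'});

% One 9-character field and a literal blank, then twelve 10-character fields.
fmt = ['%9f ' repmat('%10f', 1, 12)];
data = textscan(fid, fmt, 'whitespace', '', 'treatasempty', blanks(10));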

It only gives me all-empty columns except the first one, where it only gives the first entry. I have tried blanks(9), as I figured it might be one space less than assumed, and it gave me more data than before, but still not what I wanted, as it just skips a bunch of columns...

Help! :)

From: per isakson

Date: 16 Sep, 2010 20:40:09

Message: 28 of 44

"JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i6ouls$j12$1@fred.mathworks.com>...
> ...

Excuse me for jumping in without having read the whole thread in detail.

One way (the only one I know of) to read this file (fixed widths and no delimiter) is something like

1. cac = textscan( fid, '%s', 'delimiter', '\n', 'whitespace', '' )

2. buf = char( cac{:} ) converts to a character array <number of lines TIMES length of longest line>

3. D = textscan( transpose( buf(:,1:2) ), '%2u' )

etc.

I used to add a delimiter to buf(:,1:2), e.g. cat( 2, buf(:,1:2), repmat( ',', size(buf,1), 1 ) ), but that might not be necessary.

This works fairly well. However, it takes some memory.

Before step 3 I think the blanks must be replaced; "'TreatAsEmpty', ' '" doesn't seem to work.
>> c = textscan( ['12';'zz';'34']', '%2f', 'EmptyValue', nan, 'TreatAsEmpty', 'zz' )
c =
    [3x1 double]
>> c{:}
ans =
    12
   NaN
    34

/ per
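Per's steps can be sketched end to end in MATLAB as follows. This is an illustration under assumptions, not per's exact code: the sketch writes its own small temporary file, the field widths are made up, and the conversion step uses str2double (which returns NaN for an all-blank field) in place of a second textscan call, which sidesteps the need to replace blanks first.

```matlab
% Write a small made-up fixed-width file so the sketch runs standalone.
fname = [tempname, '.txt'];
lines = {'A    B    C'; '---- ---- ----'; ...
         sprintf('%4d%5d%5d', 1, 3, 4); ...
         [sprintf('%4d', 2), blanks(5), sprintf('%5d', 5)]; ...  % blank middle field
         sprintf('%4d%5d', 4, 8)};                               % short last line
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', lines{:});
fclose(fid);

widths = [4 5 5];   % made-up widths; each includes its leading blanks

% Step 1: read whole lines, keeping every blank (skip 2 header lines).
fid = fopen(fname, 'r');
cac = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '', 'HeaderLines', 2);
fclose(fid);
delete(fname);
txt = cac{1};
txt = txt(~cellfun('isempty', txt));   % drop any empty trailing entry

% Step 2: one char matrix, padded with blanks to the full record width.
buf = char(txt);
total = sum(widths);
if size(buf, 2) < total
    buf(:, end+1:total) = ' ';
end

% Step 3: convert each fixed-width column range; blank fields -> NaN.
stops  = cumsum(widths);
starts = stops - widths + 1;
data   = zeros(size(buf, 1), numel(widths));
for k = 1:numel(widths)
    data(:, k) = str2double(cellstr(buf(:, starts(k):stops(k))));
end
disp(data)
```

Each column range is converted in one call rather than line by line; as per notes, the char buffer costs (number of lines) x (longest line) characters of memory.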

From: JMvanwessem

Date: 16 Sep, 2010 23:29:04

Message: 29 of 44

"per isakson" <poi.nospam@bimDOTkthDOT.se> wrote in message <i6tvb9$av9$1@fred.mathworks.com>...
> "JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i6ouls$j12$1@fred.mathworks.com>...
> ...
Yay, another way of fixing the problem I cannot seem to get to work...

I got a giant 'char' by doing this with the different columns. I want to textscan the data from it now, and I manage that just fine, but it still doesn't treat the empty column entries as empty (NaN); it just ignores them. Care to elaborate?

Subject: Using textscan on difficult data

From: JMvanwessem

Date: 16 Sep, 2010 23:40:08

Message: 30 of 44

"JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i6u980$e6s$1@fred.mathworks.com>...
To complete my previous question:

I now have a column with empty entries and values in it... I want to express it like:

[2;NaN;3;4;5;NaN] and not as [2;3;4;5]! That's the whole problem this entire text file is causing all the time... I cannot get around it, it annoys me... and now I have too many ways of fixing it, none of which work, except my own workaround, but that is not perfect I am afraid :(

Subject: Using textscan on difficult data

From: per isakson

Date: 17 Sep, 2010 11:40:22

Message: 31 of 44

"JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i6u9so$of8$1@fred.mathworks.com>...

I learned two things from this thread: a) "TreatAsEmpty" and b) "blanks(1)". "TreatAsEmpty" has been around for at least four years and I try to pick up new features.

The sample text you provided is distorted. It doesn't make sense to me. If you upload a sample file to Sendspace.com (or similar) and post the link and a requirements specification here, I'll make a perfect ;-) function to read it.

/ per
   

Subject: Using textscan on difficult data

From: dpb

Date: 17 Sep, 2010 13:59:15

Message: 32 of 44

per isakson wrote:
...

> The sample text you provided is distorted. It doesn't make sense to me.
...

That was a sidebar discussion -- jm sent me a sample file.

It is fixed-width columns with missing data randomly scattered around.

The fields are 9 columns wide, separated by single spaces, with no space
preceding the first (date) field.

So, in Fortran it is (ignoring that some are integer, some floating point)

I9,12(1X,I9)

--
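The Fortran format I9,12(1X,I9) translates directly into character positions. A small sketch, assuming 13 fields of width 9 as described above (the variable names are illustrative):

```matlab
% Character positions implied by I9,12(1X,I9): one 9-wide field, then
% 12 repeats of (1 blank + 9-wide field), i.e. 13 fields in 129 characters.
nfields = 13;
width   = 9;
stride  = width + 1;                   % field plus its separating blank
starts  = 1 + stride*(0:nfields-1);    % 1, 11, 21, ..., 121
stops   = starts + width - 1;          % 9, 19, 29, ..., 129
% Field k of every line held in a char matrix buf is buf(:, starts(k):stops(k)).
```

With this layout, field 5 (SPD) starts at character 41, which agrees with the offset `31+ii*10` for `ii=1` in JM's loop later in the thread.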

Subject: Using textscan on difficult data

From: JMvanwessem

Date: 17 Sep, 2010 15:38:19

Message: 33 of 44

"per isakson" <poi.nospam@bimDOTkthDOT.se> wrote in message <i6vk36$bve$1@fred.mathworks.com>...
Well, I got your method to work now; it feels like a clear way to do it, but I think I made the script a lot larger than it should have been. I couldn't properly get it into vectors, so I worked around it with cellstr etc...
I now have something like this, and it works, so I am happy, but reading the empty spaces as NaN or zero was inconvenient...

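cac=textscan( fid, '%s', 'delimiter', sprintf( '%s','\n'), 'whitespace', '','HeaderLines',1);  % one cell per line, blanks kept
f=char(cac{:}) ;                      % char matrix, one row per line
VAR={'SPD' 'DIR' 'PSPD' 'PDIR' 'DDEV' 'TEMP' 'TEMPD' 'rh' 'PRE'};

for ii=1:length(VAR)
    iii=31+ii*10;                     % first character of field ii+4

    ff=f(:,iii:iii+8);                % the 9-character field from every line
    ix=cellfun('isempty',cellstr(ff));   % rows where the field is all blank
    ff(ix)='0';                       % put '0' in the first character of those rows
    ff=str2num(ff);                   % so blank fields come out as 0, not NaN
    % and from now on we'll translate all outputs into our respective variables
end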
 
As you see, I am now reading the string in steps of 10 characters (a 9-character field plus one blank) and outputting that. It outputs the number including the spaces, which is basically what I want, but I feel this is a rather bad way of doing it haha... As I said, the problem with this method for me was getting the empty column entries as zero/NaN; I couldn't find anything other than cellfun to get the 'isempty' spots...
But it works! So thanks..

...but I'd still like to know how to get the other methods to work, though I should be marching on with my work and not waste time just reading it :)
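One way to get NaN instead of 0 for the blank entries, swapped in here as an alternative to the `cellfun`/`str2num` step (it is not JM's code): `cellstr` turns an all-blank field into an empty string, and `str2double` returns NaN for it. The sketch assumes a made-up three-line sample with the same layout (9-character right-justified fields, one blank apart) and converts only SPD and DIR; `rjust`, `pad`, `recs` and `data` are illustrative names.

```matlab
% Hypothetical sample in the same layout; '' marks a missing value.
rjust = @(s) [blanks(9 - numel(s)), s];               % right-justify in 9 chars
pad   = @(c) cellfun(rjust, c, 'UniformOutput', false);
recs  = { {'2008232', '0', '1',  '6', '',    ''}
          {'2008232', '0', '1', '12', '4.1', '148'}
          {'2008232', '0', '3',  '6', '',    ''} };
f = char(cellfun(@(c) strjoin(pad(c), ' '), recs, 'UniformOutput', false));

% Same column offsets as the loop above, but str2double instead of str2num:
% cellstr turns an all-blank field into '', and str2double('') is NaN.
VAR  = {'SPD', 'DIR'};
data = struct();
for ii = 1:numel(VAR)
    c0 = 31 + ii*10;                         % first character of field ii+4
    data.(VAR{ii}) = str2double(cellstr(f(:, c0:c0+8)));
end
```

Here `data.SPD` and `data.DIR` keep one entry per line, with NaN where the field was blank, which is the `[2;NaN;3;...]` shape asked for earlier in the thread.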

Subject: Using textscan on difficult data

From: per isakson

Date: 17 Sep, 2010 16:08:05

Message: 34 of 44

"JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i7021b$ec$1@fred.mathworks.com>...
....
> ...but now I still like to know how to get the other methods to work though, but I should be marching on with my work and not waste time just reading it :)


I have done some experiments with textscan to improve my understanding of how that function works. My result is far from the FORTRAN one-liner. However, my solution works and the NaNs end up in the correct positions.

Contents of readfixedformat.txt (below I have replaced each blank with "-" to avoid automatic reformatting; the file itself contains blanks):

 DAY[CYD] TIME[HMS] TIDN Z[FT] SPD[KTS] DIR[DEG] PSPD[KTS]
--------- --------- --------- --------- --------- ---------
--2008232---1-----3--4.4---5.5
--2008232-0----2-----4.4---5.5
--2008232-0-1-----3--------5.5
--2008232-0-1--2-----4.4------
--2008232-0-1--2--3--------5.5
--2008232-0-1--2--3--4.4------
--2008232-0-1--2--3--4.4---5.5

I have concluded that this file cannot be read with the format string '%*2s%7s%2f%2f%3f%3f%5f%6f' or similar. The reason is that I cannot make 'TreatAsEmpty' recognize blanks.

The best I've come up with is a procedure in several steps.

    frmt2 = '%*2s%7s%2s%2s%3s%3s%5s%6s';
    fid = fopen( 'readfixedformat.txt' );
    cac = textscan( fid, frmt2, 'Delimiter', '', 'Whitespace', '', 'HeaderLines', 2 );
    fclose( fid );
    
cac, <1x7 cell>, is a cell array of cell arrays of strings. Next, for each column ii, where wid(ii) is the width of that column, do the following.

Replace all-blank entries with blanks and a trailing "z" (something printable):
    buf = strrep( cac{ii}, blanks(wid(ii)), [blanks(wid(ii)-1),'z'] );

Convert to a character array and transpose it, because textscan reads the array column by column:
    buf = transpose( char( buf ) );

Read the character array:
    buf = textscan( buf, '%f', 'TreatAsEmpty', 'z', 'EmptyValue', nan );

and peel off the braces:
    num = buf{:};

The first column (the date, e.g. 2008232) requires special treatment:
    buf = textscan( transpose(char(cac{1})), '%4f%3f');
    etc.;

Yes, it's too tricky!

/ per
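per's steps, assembled into one runnable piece as a sketch. Assumptions: the three sample lines are copied from his example with each "-" turned back into a blank, the text is scanned from a character vector instead of the file (so no HeaderLines), `wid` holds the field widths implied by `frmt2`, and each slice is flattened with `reshape` instead of being passed as a 2-D char matrix. It relies on every value having at least one leading blank, which holds for this sample.

```matlab
% Sample lines from per's example, with "-" turned back into blanks
sample = {'--2008232---1-----3--4.4---5.5'
          '--2008232-0----2-----4.4---5.5'
          '--2008232-0-1-----3--------5.5'};
sample = strrep(sample, '-', ' ');
txt = sprintf('%s\n', sample{:});          % scan a string instead of the file

% Fixed-width text fields: skip 2 characters, then 7 fields of the given widths
frmt2 = '%*2s%7s%2s%2s%3s%3s%5s%6s';
cac = textscan(txt, frmt2, 'Delimiter', '', 'Whitespace', '');
wid = [7 2 2 3 3 5 6];                      % width of each field in cac (from frmt2)

num = cell(1, numel(wid));
for ii = 2:numel(wid)
    % All-blank entry -> blanks plus a trailing printable 'z'
    buf = strrep(cac{ii}, blanks(wid(ii)), [blanks(wid(ii)-1), 'z']);
    % Lay the entries end to end, row by row; their leading blanks separate them
    buf = reshape(char(buf).', 1, []);
    % 'z' is read as an empty field and replaced by NaN
    c = textscan(buf, '%f', 'TreatAsEmpty', 'z', 'EmptyValue', NaN);
    num{ii} = c{1};
end

% The date field has no blanks: split 2008232 into year and day of year
c = textscan(reshape(char(cac{1}).', 1, []), '%4f%3f');
num{1} = [c{1}, c{2}];
```

Each `num{ii}` then has one entry per line, with NaN exactly where the field was blank.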

Subject: Using textscan on difficult data

From: dpb

Date: 17 Sep, 2010 18:58:01

Message: 35 of 44

per isakson wrote:
...
> I have done some experiments with textscan to improve my understanding
> of how that function works. My result is far from the FORTRAN one-liner.

...[sample fixed-width file w/ blank fields elided for brevity]...

> I have concluded that this file cannot be read with the formatstring
> '%*2s%7s%2f%2f%3f%3f%5f%6f' or similar. The reason is that I cannot make
> 'TreatAsEmpty' recognize blanks.

...[workaround code same thing]...

> Yes, it too tricky!
...

Indeed it is. As noted in another subthread of the conversation, it
appears to me TMW needs to give some serious consideration to how the
scanning functions work in such common cases as this; it is just too
much of a pita for a 'rapid development' system to deal with.

Support for that conclusion, and for the need for something better, is
easily evidenced by the number of questions here on c.s-s.m on the
subject; even the regulars don't have ready solutions for many common
difficulties.

One alternative I'd suggest is something I've thought of doing as a mex
file submission, but I don't have the time (or, probably more
accurately, the inclination any longer, since I've retired from
consulting and gone back to farming) to write anything at all
approaching "bullet-proof" code worthy of submission.

That would be a routine similar in net function to textscan() and
friends with "vectorized" Fortran formatting specifications instead of
C-like ones as used now. I think it would be _much_ easier to use in
general and definitely win hands-down in such special cases as this.

--

Subject: Using textscan on difficult data

From: Oleg Komarov

Date: 17 Sep, 2010 19:11:09

Message: 36 of 44

dpb <none@non.net> wrote in message <i70dus$1v6$1@news.eternal-september.org>...
For a fixed width file:
TMW suggests reading each row in as a string, separating the fields by column indexing, and then converting with str2num where needed, or deblanking and rescanning.

Oleg

Subject: Using textscan on difficult data

From: dpb

Date: 17 Sep, 2010 20:19:29

Message: 37 of 44

Oleg Komarov wrote:
...

> For a fixed width file:
> TMW suggests to read in each row as a string and separate the fields by
> column indexing then convert with num2string where needed or deblank and
> rescan.
...

Which is both an unnecessary pita compared to a sensible format-string
interpreter that would understand what a field-width specification
really means, and a terrible performance hit even if one does go that
route...

$0.02, etc., ...

--

Subject: Using textscan on difficult data

From: per isakson

Date: 17 Sep, 2010 22:14:04

Message: 38 of 44

dpb <none@non.net> wrote in message <i70inf$vnp$1@news.eternal-september.org>...

I still miss the FORTRAN way of formatting, although it's nearly twenty years since I used it. However, The MathWorks have chosen the C route, and I don't think they will change that.

TEXTSCAN has so many keywords that one more, FixedWidth (true/false), wouldn't hurt (that much). In that case (FixedWidth, true), 'TreatAsEmpty' could recognize blanks.

/ per

Subject: Using textscan on difficult data

From: dpb

Date: 17 Sep, 2010 22:35:21

Message: 39 of 44

per isakson wrote:
> I still miss the FORTRAN-way of formatting, although it's nearly twenty
> years since I used it. However, The Mathworks have choosen the C-route
> and I cannot think they will change that.
>
> TEXTSCAN has so many keywords that one more, FixedWidth (true/false),
> wouldn't hurt (that much). In that case (FixedWidth, true)
> 'TreatAsEmpty' could recognized blanks.
> / per

I don't/didn't suggest they do that; can't w/o too much breaking of
existing code (after all, I'm the one that just recently chastised TMW
for not always honoring backward compatibility again where it seems to
me it isn't necessary to break it :) ).

I _DO_ strongly suggest they implement a feature to fix the current mess
in handling fixed-width columns on input, whether it's your alternative
or something more similar to my suggestion.

I think in terms of my approach because I can see how to implement it
easily in Fortran; I don't know C well enough to ensure one doesn't run
into the same problem inside the C i/o runtime as well. (Nobody ever
disputed my assumption up thread that the ML behavior emulates that of
C; I was going to write a test case but discovered I don't even have a
working C compiler installed here any longer and wasn't going to the
trouble to reinstall one for the purpose.)

I think the Fortran solution for format specification syntax is far
superior in this regard; the C syntax has some advantages in some
instances, such as embedding other characters and variable numbers of
outputs. Overall, given a choice, I'd choose the Fortran solution if I
had to pick one (and had a choice of which to pick, of course... :) ).

--

Subject: Using textscan on difficult data

From: Andres

Date: 17 Sep, 2010 23:23:09

Message: 40 of 44

"JMvanwessem " <jmvanwessem@gmail.com> wrote in message <i7021b$ec$1@fred.mathworks.com>...

Just for fun, another method involving string replacements, wrapped into txt2mat (from the File Exchange) for (my) convenience:

fwidth = 10;
options = {'NumColumns',13, ...
           'ReadMode','line',...
           'ReplaceRegExpr',{{repmat(' ',1,fwidth), ...
               [',' repmat(' ',1,fwidth-4),'NaN']}} };
data = txt2mat(filename,options{:});

Subject: Using textscan on difficult data

From: Andres

Date: 17 Sep, 2010 23:28:24

Message: 41 of 44

Sorry, forget about that confusing ','

fwidth = 10;
options = {'NumColumns',13, ...
           'ReadMode','line',...
           'ReplaceRegExpr',{{repmat(' ',1,fwidth), ...
             [repmat(' ',1,fwidth-3),'NaN']}} };

data = txt2mat(filename,options{:});
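For readers without txt2mat, here is a sketch of the same replacement idea using MATLAB's own `regexprep` and `textscan` (it does not use or mimic txt2mat's options). It assumes right-justified 9-character fields separated by one blank, so a missing field plus the blank before it forms a run of 10 blanks; replacing that run with 7 blanks and `NaN` keeps every later field in place. The record and the names `rjust`, `rec`, `rec2` are made up for illustration.

```matlab
fwidth = 10;                                   % 9-character field + 1 separator
rjust  = @(s) [blanks(fwidth - 1 - numel(s)), s];
% Hypothetical 4-field record; fields 2 and 4 are missing (all blank)
rec = [rjust('2008232'), ' ', rjust(''), ' ', rjust('6'), ' ', rjust('')];

% Each run of 10 blanks becomes 7 blanks + 'NaN' (same length, so alignment holds)
rec2 = regexprep(rec, repmat(' ', 1, fwidth), [repmat(' ', 1, fwidth-3), 'NaN']);

c = textscan(rec2, '%f');                      % textscan reads 'NaN' as a number
vals = c{1};
```

Lines that stop early (trailing fields missing and not padded with blanks) would need padding to full length before the replacement.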

Subject: Using textscan on difficult data

From: Walter Roberson

Date: 17 Sep, 2010 23:35:42

Message: 42 of 44

On 10-09-17 10:38 AM, JMvanwessem wrote:

> I now got something like this, and it works, so I am happy, but it was
> inconvient reading the empty spaces as NaN or zero...

Somewhere in the discussion, someone pointed out that if you request a
specific reading width such as %9d then if the number ends before the
designated width (with blanks there) then Matlab will stop reading the number
there and will start with the next format specifier, thus throwing the widths
of everything else off.

I tested in Matlab and confirmed that that is a difficulty.

A few minutes ago I checked my ISO C 1989 standard, and found that in C the
action would be the same. In C, a width specifier for fscanf() and like
purposes is considered a maximum width, and if the number is found to end
before that width is reached, the conversion ends. There is left-justification
in output but no corresponding signifier to support left-justification in
input, for example.

Going way back in my memory, I seem to recall that in Fortran 77 if there is a
fixed width format, then blanks inside the field are treated as zeros. I am
not positive of that, and it does lead to questions about treatment of blanks
before a leading sign, but at least it would make for reliable field reading.
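A small illustration of the maximum-width behavior described above, using a made-up string of three 4-character fields with the middle one blank. `sscanf` follows the C-style rules, so the blank field is lost; cutting the fields by position first keeps it.

```matlab
% Three 4-character fields, the middle one blank: '   5', '    ', '   7'
s = ['   5', '    ', '   7'];

% C-style scanning: 4 is only a maximum width and leading blanks are skipped,
% so the blank field does not produce a value of its own.
a = sscanf(s, '%4d%4d%4d');

% Fixed-width reading: cut the fields by position first, then convert.
fields = reshape(s, 4, []).';       % one field per row
b = str2double(cellstr(fields));
```

`a` comes back with fewer than three values, while `b` is `[5; NaN; 7]`.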

Subject: Using textscan on difficult data

From: dpb

Date: 18 Sep, 2010 00:01:28

Message: 43 of 44

Walter Roberson wrote:
...
> Somewhere in the discussion, someone pointed out that if you request a
> specific reading width such as %9d then if the number ends before the
> designated width (with blanks there) then Matlab will stop reading the
> number there and will start with the next format specifier, thus
> throwing the widths of everything else off.
>
> I tested in Matlab and confirmed that that is a difficulty.

That would have been I who made the compl^h^h^h^h^hobservation and the
presumption that that did follow C behavior given the pedigree of
xscanf() and friends...

> A few minutes ago I checked my ISO C 1989 standard, and found that in C
> the action would be the same. In C, a width specifier for fscanf() and
> like purposes is considered a maximum width, and if the number is found
> to end before that width is reached, the conversion ends. There is
> left-justification in output but no corresponding signifier to support
> left-justification in input, for example.

That confirms the supposition above...sorry way to do it, but they're
stuck now, of course, w/ backward compatibility.

> Going way back in my memory, I seem to recall that in Fortran 77 if
> there is a fixed width format, then blanks inside the field are treated
> as zeros. I am not positive of that, and it does lead to questions about
> treatment of blanks before a leading sign, but at least it would make
> for reliable field reading.

That is so and was so even before F77 (treatment of leading and trailing
blanks in a given field width). The rule for input processing is that
leading blanks in the external field are ignored. If BLANK='NULL' is in
effect (or the BN edit descriptor has been specified) embedded and
trailing blanks are ignored; otherwise, they are treated as zeros. An
all-blank field is treated as a value of zero.

Leading blanks, being either ignored or zero-substituted, don't affect
the numeric values.

The BN option is effective for the READ statement only; the BLANK=
option is part of the OPEN statement and thus is effective for the file.

The BN option has one particularly useful function; given an interactive
READ such as

READ(*,'(I6)') IntVar

if the input record were '123' the i/o subsystem would pad the record to
six w/ trailing blanks and the returned value would be 123000. On the
other hand, with

READ(*,'(BN,I6)') IntVar

and the same '123' input record the pad characters would be ignored and
the expected value from the user's standpoint would be 123.

The use of list-directed input has pretty much eliminated the need for
such although again w/ a fixed-width input file particularly w/ empty
fields as the subject of the thread, the Fortran treatment is much
easier to deal with than C/Matlab.

--
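To make the blank-handling rules above concrete, here is a hypothetical MATLAB helper (`fortran_int_field` is an illustrative name, not an existing function) that applies them to one integer field: leading blanks are ignored, an all-blank field reads as zero, and embedded or trailing blanks are either dropped (BLANK='NULL' / BN) or treated as zeros (BLANK='ZERO' / BZ). It is a sketch of the rules as described, not a full Fortran I edit descriptor (no sign or overflow checks).

```matlab
function v = fortran_int_field(field, blank_mode)
% FORTRAN_INT_FIELD  Hypothetical helper: read one fixed-width integer field
% using the Fortran blank rules described above.
%   blank_mode = 'null' -> embedded/trailing blanks ignored (BLANK='NULL', BN)
%   blank_mode = 'zero' -> embedded/trailing blanks read as zeros (BLANK='ZERO', BZ)
s = regexprep(field, '^ +', '');    % leading blanks are always ignored
if isempty(s)
    v = 0;                          % an all-blank field is a value of zero
    return
end
if strcmp(blank_mode, 'null')
    s = strrep(s, ' ', '');         % drop embedded and trailing blanks
else
    s = strrep(s, ' ', '0');        % treat embedded and trailing blanks as zeros
end
v = str2double(s);
end
```

With `'zero'`, the padded record `'123   '` from the READ example gives 123000; with `'null'` it gives 123.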


--



The blanks

Subject: Using textscan on difficult data

From: dpb

Date: 19 Sep, 2010 19:12:59

Message: 44 of 44

dpb wrote:
...

> The BN option has one particularly useful function; given an interactive
> READ such as
>
> READ(*,'(I6)') IntVar
>
> if the input record were '123' the i/o subsystem would pad the record to
> six w/ trailing blanks and the returned value would be 123000. ...

Note this is the same as the behavior of Fortran CHARACTER data, where

character(len=10) :: acharvariable
...

acharvariable = 'Hello'

results in the character string 'Hello     ' being stored, w/ the value
blank-padded on the right to the declared length of the variable during
the store.

So, the blank padding is not unique to FORMATted i/o; it's
characteristic of Fortran fixed-length CHARACTER data.

--
