Thread Subject:
fscanf / sscanf / textscan with fixed format

Subject: fscanf / sscanf / textscan with fixed format

From: Ross W

Date: 11 May, 2012 10:21:07

Message: 1 of 20

Hi

I have a string which I want to be interpreted as 3 integers, each occupying 2 character positions.
The string is ' 2 456', with blanks in positions 1 and 3. The answer I expect is [2 4 56].

>> i3=sscanf(' 2 456','%2d');disp(i3')
     2 45 6

This is not what I want, so I need to change my code. Can you help?
If I replace the blanks by zeros, I get the result I want:
>> i3=sscanf('020456','%2d');disp(i3')
     2 4 56

My goal is to write code which will produce my expected answer, so I can then process 400 MB of text which was provided by a third party.

Compare also
>> i3=textscan(' 2 456','%2d');disp(i3{:}')
           2 45 6

>> i3=textscan('020456','%2d');disp(i3{:}')
           2 4 56

So it seems like I have a misunderstanding of how '%2d' works.
I prefer not to replace all spaces by zeros in the 400 MB file. What are the efficient alternatives please?

Also, I am curious about why fscanf / sscanf / textscan interprets leading spaces differently to leading zeros.

Thanks,
Ross

Subject: fscanf / sscanf / textscan with fixed format

From: Bruno Luong

Date: 11 May, 2012 11:51:13

Message: 2 of 20

str2num(reshape(' 2 456',2,[])')

% Bruno
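
For readers following along, the one-liner above can be unpacked step by step. This is only a sketch: it assumes every field is exactly 2 characters wide and that the string length is a multiple of 2, and the variable names are illustrative.

```matlab
s = ' 2 456';              % sample string from the original post
w = 2;                     % assumed fixed field width

% reshape fills column by column, so each column holds one 2-character field
cols = reshape(s, w, []);  % 2-by-3 char array

% transpose so that each row is one field: [' 2'; ' 4'; '56']
rows = cols';

% str2num evaluates the rows of the char array as a column of numbers
vals = str2num(rows)';     % transpose again to get a row vector
disp(vals)
```

Note that str2num relies on eval, so it is probably best kept to input whose contents are trusted.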

Subject: fscanf / sscanf / textscan with fixed format

From: Ross W

Date: 11 May, 2012 12:38:45

Message: 3 of 20

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <joiufh$bra$1@newscl01ah.mathworks.com>...
> str2num(reshape(' 2 456',2,[])')
>
> % Bruno

Thanks Bruno - much appreciated. I've adapted this to my bigger task, and it is running well.

Ross

Subject: fscanf / sscanf / textscan with fixed format

From: Steven_Lord

Date: 11 May, 2012 13:29:42

Message: 4 of 20



"Ross W" <rosswoodskiwi@hotmail.com> wrote in message
news:joip6j$kj0$1@newscl01ah.mathworks.com...
> Hi
>
> I have a string which I want to be interpreted as 3 integers, each
> occupying 2 character positions. The string is ' 2 456', with blanks in
> positions 1 and 3. The answer I expect is [2 4 56].
>>> i3=sscanf(' 2 456','%2d');disp(i3')
> 2 45 6
>
> This is not what I want, so I need to change my code. Can you help?
> If I replace the blanks by zeros, I get the result I want:
>>> i3=sscanf('020456','%2d');disp(i3')
> 2 4 56
>
> My goal is to write code which will produce my expected answer, so I can
> then process 400 MB of text which was provided by a third party.
>
> Compare also
>>> i3=textscan(' 2 456','%2d');disp(i3{:}')
> 2 45 6
>
>>> i3=textscan('020456','%2d');disp(i3{:}')
> 2 4 56
>
> So it seems like I have a misunderstanding of how '%2d' works.
> I prefer not to replace all spaces by zeros in the 400 MB file. What are
> the efficient alternatives please?
>
> Also, I am curious about why fscanf / sscanf / textscan interprets leading
> spaces differently to leading zeros.

Space is not a digit. Zero is. In your first call, SSCANF first attempts to
find a block of up to 2 digits to form the first element of the output. It
finds the single digit 2 (spaces aren't digits and so they're not considered
part of the number.) When it comes time for it to find the second element of
the output, it looks for the next block of up to 2 digits and finds '45'
which becomes the number 45. The third element is the last block of up to 2
digits starting at the 6; since the string ends right after the 6, that's
the last element.

In the case where you use zeros, the first element of the output is formed
from the digits '02', the second from '04', and the third from '56'.

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com
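
The scanning behaviour described above can be reproduced with the two calls from the original post. This is a minimal sketch; the variable names a and b are illustrative, and the outputs in the comments are the ones reported in the thread.

```matlab
% Blanks are skipped before each conversion and are not counted as part
% of the 2-character field, so the fields drift out of alignment.
a = sscanf(' 2 456', '%2d')';
disp(a)    % the thread reports 2 45 6

% With zeros as padding, every field is a full two digits wide.
b = sscanf('020456', '%2d')';
disp(b)    % the thread reports 2 4 56
```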

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 13:45:20

Message: 5 of 20

On 5/11/2012 8:29 AM, Steven_Lord wrote:
...

> Space is not a digit. Zero is. In your first call, first SSCANF attempts
> to find a block of up to 2 digits to form the first element of the
> output. It finds the single digit 2 (spaces aren't digits and so they're
> not considered part of the number.)...

I've always considered this a fundamental flaw in C; it makes (as we see
in a "veritable plethora" of questions on trying to parse fixed-width
formats) what should be simple sometimes very hard.

Has TMW ever considered vectorizing the Fortran-like FORMAT statement
behavior as an alternative to C-like formatting? Either as a switch or
a second set of functions? It would surely be a boon imo...I've always
wondered why TMW chose the C model instead of Fortran to begin with given
the roots of Matlab.

--

Subject: fscanf / sscanf / textscan with fixed format

From: Steven_Lord

Date: 11 May, 2012 14:35:43

Message: 6 of 20



"dpb" <none@non.net> wrote in message news:joj55g$8s6$1@speranza.aioe.org...
> On 5/11/2012 8:29 AM, Steven_Lord wrote:
> ...
>
>> Space is not a digit. Zero is. In your first call, first SSCANF attempts
>> to find a block of up to 2 digits to form the first element of the
>> output. It finds the single digit 2 (spaces aren't digits and so they're
>> not considered part of the number.)...
>
> I've always considered this a fundamental flaw in C; it makes (as we see
> in a "veritable plethora" of questions on trying to parse fixed-width
> formats) what should be simple sometimes very hard.
>
> Has TMW ever considered vectorizing the Fortran-like FORMAT statement
> behavior as an alternative to C-like formatting? Either as a switch or a
> second set of functions? It would surely be a boon imo...I've always
> wondered why TMW chose the C model instead of Fortran to begin with given the
> roots of Matlab.

I can submit that as an enhancement request. As to why the string processing
functions chose the C model versus the Fortran model ... that was a wee bit
before my time at MathWorks :) but if I had to guess I'd say it's related to
who originally wrote those functions. If you look at the list of functions
in Cleve's original version:

http://www.mathworks.com/company/newsletters/articles/the-origins-of-matlab.html

there look to be maybe five file I/O related functions: DIAR, EDIT, FILE,
LOAD, and SAVE (four of which, assuming you allow the renaming of DIAR to
DIARY, exist to this day.) I'm guessing functions like FSCANF etc. came in
after the reprogramming mentioned in the next-to-last paragraph.

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: fscanf / sscanf / textscan with fixed format

From: Ross W

Date: 11 May, 2012 15:01:34

Message: 7 of 20

"Steven_Lord" <slord@mathworks.com> wrote in message <joj486$7tu$1@newscl01ah.mathworks.com>...
>
>
> "Ross W" <rosswoodskiwi@hotmail.com> wrote in message
> news:joip6j$kj0$1@newscl01ah.mathworks.com...
> > Hi
> >
> > I have a string which I want to be interpreted as 3 integers, each
> > occupying 2 character positions. The string is ' 2 456', with blanks in
> > positions 1 and 3. The answer I expect is [2 4 56].
> >>> i3=sscanf(' 2 456','%2d');disp(i3')
> > 2 45 6
> >
> > This is not what I want, so I need to change my code. Can you help?
> > If I replace the blanks by zeros, I get the result I want:
> >>> i3=sscanf('020456','%2d');disp(i3')
> > 2 4 56
> >
> > My goal is to write code which will produce my expected answer, so I can
> > then process 400 MB of text which was provided by a third party.
> >
> > Compare also
> >>> i3=textscan(' 2 456','%2d');disp(i3{:}')
> > 2 45 6
> >
> >>> i3=textscan('020456','%2d');disp(i3{:}')
> > 2 4 56
> >
> > So it seems like I have a misunderstanding of how '%2d' works.
> > I prefer not to replace all spaces by zeros in the 400 MB file. What are
> > the efficient alternatives please?
> >
> > Also, I am curious about why fscanf / sscanf / textscan interprets leading
> > spaces differently to leading zeros.
>
> Space is not a digit. Zero is. In your first call, first SSCANF attempts to
> find a block of up to 2 digits to form the first element of the output. It
> finds the single digit 2 (spaces aren't digits and so they're not considered
> part of the number.) When it comes time for it to find the second element of
> the output, it looks for the next block of up to 2 digits and finds '45'
> which becomes the number 45. The third element is the last block of up to 2
> digits starting at the 6; since the string ends right after the 6, that's
> the last element.
>
> In the case where you use zeros, the first element of the output is formed
> from the digits '02', the second from '04', and the third from '56'.
>
> --
> Steve Lord
> slord@mathworks.com
> To contact Technical Support use the Contact Us link on
> http://www.mathworks.com

Thanks Steve - that's clear and unambiguous.

Do you think a comment in the help for sscanf/fscanf/textscan about strategies for reading fixed-width fields in ASCII files might be of wide interest?

Or perhaps elsewhere in the help? Like here "Using Low-Level File I/O Functions: Reading Formatted ASCII Data"?

Ross

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 18:01:49

Message: 8 of 20

On 5/11/2012 9:35 AM, Steven_Lord wrote:
...

> ... I'm guessing functions like FSCANF
> etc. came in after the reprogramming mentioned in the next-to-last
> paragraph.

Clearly would seem that would be so simply because they do use C idioms
(and I suppose at the heart end up in "standard" C i/o rtl).

At one time I had written a set of utilities that used Fortran mex-files
to deal with it in previously working w/ Matlab but seem at some point
to have lost them; at least I can't find them on this machine.

I hadn't done everything but at least could do stuff like OP's request
here transparently...I had not worried about the newer data structures
of cells and all for the biggest shortcoming I remember.

I've still got the older box downstairs that was functional when I left
TN and basically retired, maybe they're on it--I know I did reinstall
Matlab on this machine rather than do an image copy so possibly there's
stuff there that didn't make it over. I'll see if I can boot it at some
point and look; if so I'd happily provide the outlines at least.

They essentially passed Fortran FORMAT statements thru to a Fortran
mex-file as said and it dealt w/ the request. In a quick think about
it, some of the F95 and later features would make some of what it did
laboriously much easier I believe.

--

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 18:05:22

Message: 9 of 20

On 5/11/2012 10:01 AM, Ross W wrote:
...

> Thanks Steve - that's clear and unambiguous.
...

Ugly and a pita but unambiguous... :)

I had a bright idea but it didn't work, either. Thought just on outside
chance that if changed definition of 'whitespace' in textread() and/or
friends it might manage to cope. But, no joy there, either--it barfs on
the space as nonnumeric, too. :(

(Not surprised particularly, just hoping beyond hope...)

As my note to Steve says, it just seems such a boneheaded way to run an
i/o formatting statement. (But, much in C has always seemed that way to
me, being a Fortran kinda' guy :) )

--

Subject: fscanf / sscanf / textscan with fixed format

From: Doug Schwarz

Date: 11 May, 2012 18:17:18

Message: 10 of 20

On 5/11/2012 2:05 PM, dpb wrote:
> On 5/11/2012 10:01 AM, Ross W wrote:
> ...
>
>> Thanks Steve - that's clear and unambiguous.
> ...
>
> Ugly and a pita but unambiguous... :)
>
> I had a bright idea but it didn't work, either. Thought just on outside
> chance that if changed definition of 'whitespace' in textread() and/or
> friends it might manage to cope. But, no joy there, either--it barfs on
> the space as nonnumeric, too. :(
>
> (Not surprised particularly, just hoping beyond hope...)
>
> As my note to Steve says, it just seems such a boneheaded way to run an
> i/o formatting statement. (But, much in C has always seemed that way to
> me, being a Fortran kinda' guy :) )


The scanf functions (fscanf & sscanf) were never intended to do
fixed-field formatted reading. Look at the name of the functions: they
*scan* the input looking for characters that satisfy the next format
specification. That has always been the C way.

I agree that MATLAB should have a better way to read fixed-field
formatted files (alliteration not intentional), but that functionality
should not be shoehorned into the scanf functions -- entirely new
function names should be chosen. It would not be difficult to write an
m-file to do it, but something like this needs to be done at a lower
level for speed.


--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.
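
As a rough illustration of the kind of m-file logic mentioned above, the following sketch slices a record into fields of given widths and converts each field separately. The widths vector and all variable names are assumptions made for this example, not an existing MATLAB facility, and a loop like this is unlikely to be fast enough for very large files.

```matlab
rec    = ' 2 456';               % sample record from the thread
widths = [2 2 2];                % assumed field widths, one per field

stops  = cumsum(widths);         % last column of each field
starts = stops - widths + 1;     % first column of each field

vals = zeros(1, numel(widths));
for k = 1:numel(widths)
    % str2double tolerates leading blanks and returns NaN for a field
    % that does not contain a valid number
    vals(k) = str2double(rec(starts(k):stops(k)));
end
disp(vals)
```

Because each field is cut out by column position before conversion, blanks inside a field cannot shift the following fields, which is the behaviour the scanf family does not provide.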

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 18:35:56

Message: 11 of 20

On 5/11/2012 1:17 PM, Doug Schwarz wrote:
...

> The scanf functions (fscanf & sscanf) were never intended to do
> fixed-field formatted reading. Look at the name of the functions: they
> *scan* the input looking for characters that satisfy the next format
> specification. That has always been the C way.

I don't disagree it's "the C way"; that doesn't mean it was/is a good
way. I think it was a poor design choice for most numerical work;
perhaps ok for the purposes for which C was written which wasn't
particularly for numerics.

> I agree that MATLAB should have a better way to read fixed-field
> formatted files (alliteration not intentional), but that functionality
> should not be shoehorned into the scanf functions -- entirely new
> function names should be chosen. It would not be difficult to write an
> m-file to do it, but something like this needs to be done at a lower
> level for speed.

Yes; it would not be a wise solution to overload existing functions and I'd
seriously doubt TMW would even consider it, even though I did mention
that as one choice; it's not the one I would advocate.

--

Subject: fscanf / sscanf / textscan with fixed format

From: Doug Schwarz

Date: 11 May, 2012 18:56:00

Message: 12 of 20

On 5/11/2012 2:35 PM, dpb wrote:
> On 5/11/2012 1:17 PM, Doug Schwarz wrote:
> ...
>
>> The scanf functions (fscanf & sscanf) were never intended to do
>> fixed-field formatted reading. Look at the name of the functions: they
>> *scan* the input looking for characters that satisfy the next format
>> specification. That has always been the C way.
>
> I don't disagree it's "the C way"; that doesn't mean it was/is a good
> way. I think it was a poor design choice for most numerical work;
> perhaps ok for the purposes for which C was written which wasn't
> particularly for numerics.

I agree that the file reading functions in C were not specifically
intended for numerical work. Nevertheless, having functions that do
what they do is essential and not at all a poor choice. In my
experience, many text files simply contain a few columns of numbers
*not* in fixed-field form. The only poor choice was not providing the
fixed-field capability *in addition to* the scanning capability.

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 19:28:09

Message: 13 of 20

On 5/11/2012 1:56 PM, Doug Schwarz wrote:
...

> ... The only poor choice was not providing the fixed-field
> capability *in addition to* the scanning capability.

I guess that's one way to put it in best light possible... :)

I think we've said same thing from opposite direction...I think having
implemented scanf as was done is a poor choice; you're willing to say
they should have added a feature that didn't.

The "scanning" is nearly readily available w/ the FORMAT implementation
w/ free-format ('*') FORMAT descriptor although it's not as tolerant to
other nonnumeric characters in the scanned string as is/are scanf and
friends.

In general, I've found it easier to deal with those cases in an ad hoc
fashion than the alternative (and far more common to need fixed-width).

--

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 20:08:54

Message: 14 of 20

On 5/11/2012 1:56 PM, Doug Schwarz wrote:
...

> I agree that the file reading functions in C were not specifically
> intended for numerical work. Nevertheless, having functions that do what
> they do is essential and not at all a poor choice. In my experience,
> many text files simply contain a few columns of numbers *not* in
> fixed-field form. The only poor choice was not providing the fixed-field
> capability *in addition to* the scanning capability.

I guess my real beef is w/ TMW choosing the easy way out and simply
emulating the C functions for an intensive numerical application in
Matlab; particularly when there was the Fortran model already available
as a sample.

And compound that to not have fixed it/added the facility in lo! these
many years since despite the veritable plethora of problems users have
obviously had over the years dealing w/ fixed-field-width i/o.

--

Subject: fscanf / sscanf / textscan with fixed format

From: Star Strider

Date: 11 May, 2012 20:39:13

Message: 15 of 20

"Ross W" wrote in message <joip6j$kj0$1@newscl01ah.mathworks.com>...
> Hi
>
> I have a string which I want to be interpreted as 3 integers, each occupying 2 character positions.
> The string is ' 2 456', with blanks in positions 1 and 3. The answer I expect is [2 4 56].
>
> >> i3=sscanf(' 2 456','%2d');disp(i3')
> 2 45 6
>
> This is not what I want, so I need to change my code. Can you help?
> If I replace the blanks by zeros, I get the result I want:
> >> i3=sscanf('020456','%2d');disp(i3')
> 2 4 56

-----------------------------------
Here's another way that may also generally address the fixed field width problem:

i3 = sscanf(' 2 456', '%c');
i3n = str2num([i3(1:2); i3(3:4); i3(5:6)])';
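
The same substring idea can be applied to many records at once, which may matter for a file as large as the one in the original post. The sketch below assumes the records have already been read into a char matrix with one record per row and that every field is 2 characters wide; the second record and all variable names are made up for illustration.

```matlab
recs = [' 2 456'; '10 3 7'];      % hypothetical records, one per row
w    = 2;                         % assumed fixed field width
[nRec, nCol] = size(recs);
nFld = nCol / w;                  % fields per record

% Transpose so each record is a column, then cut into 2-character fields;
% after the second transpose every row of fields is one field.
fields = reshape(recs', w, [])';

% cellstr keeps leading blanks; str2double converts the whole cell array
nums = str2double(cellstr(fields));

% Put the numbers back into one row per record
vals = reshape(nums, nFld, nRec)';
disp(vals)
```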

Subject: fscanf / sscanf / textscan with fixed format

From: Doug Schwarz

Date: 11 May, 2012 20:56:49

Message: 16 of 20

On 5/11/2012 4:08 PM, dpb wrote:
> On 5/11/2012 1:56 PM, Doug Schwarz wrote:
> ...
>
>> I agree that the file reading functions in C were not specifically
>> intended for numerical work. Nevertheless, having functions that do what
>> they do is essential and not at all a poor choice. In my experience,
>> many text files simply contain a few columns of numbers *not* in
>> fixed-field form. The only poor choice was not providing the fixed-field
>> capability *in addition to* the scanning capability.
>
> I guess my real beef is w/ TMW choosing the easy way out and simply
> emulating the C functions for an intensive numerical application in
> Matlab; particularly when there was the Fortran model already available
> as a sample.
>
> And compound that to not have fixed it/added the facility in lo! these
> many years since despite the veritable plethora of problems users have
> obviously had over the years dealing w/ fixed-field-width i/o.

I think it very much depends on your particular experience. In my case,
I have been using MATLAB for about 25 years and have not needed
fixed-field file reading even once. Plus, in all the years I have been
reading this newsgroup, the topic has come up once in a while, but far
less frequently than, say, the limitations of floating point arithmetic
(which is perhaps the #1 recurring topic). Clearly it doesn't get
enough complaints for TMW to actually implement something.


--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 11 May, 2012 22:17:59

Message: 17 of 20

On 5/11/2012 3:56 PM, Doug Schwarz wrote:
...

> I think it very much depends on your particular experience. In my case,
> I have been using MATLAB for about 25 years and have not needed
> fixed-field file reading even once. Plus, in all the years I have been
> reading this newsgroup, the topic has come up once in a while, but far
> less frequently than, say, the limitations of floating point arithmetic
> (which is perhaps the #1 recurring topic). Clearly it doesn't get enough
> complaints for TMW to actually implement something.

I don't have an actual count of years but 2012-25 = 1987 which is almost
20 years after I started working in '68. I do remember the first
version of Matlab I had was the Windows 3.1 Release 4(?).

I've certainly seen quite a number of fixed-field cases in that time.

I don't have a count on the number of postings but I know I've answered
quite a number that boil down to the same thing. They're often not
couched in those terms, however, so a search of subject won't find
them. There were two or three parsing questions within the last couple
of weeks at least one of which could have been done w/ sscanf instead of
w/ substring operations or the other ways that were used.

That there are other questions that are more prevalent doesn't mean it
isn't a problem; only that folks just accept what is because, as you
say, "it's always been that way in C" and everybody (most, anyway)
recognizes the heritage for what it is.

I never posted an enhancement request because I did work around it by
writing a mex-file when I was doing a lot of Matlab work (Matlab came
and went; in the consulting arena I was in, work subjects changed
drastically so wasn't a continual daily usage, just a toolset that came
in more or less handy at any given time.)

--

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 12 May, 2012 12:41:07

Message: 18 of 20

On 5/11/2012 1:56 PM, Doug Schwarz wrote:
...

> I agree that the file reading functions in C were not specifically
> intended for numerical work. Nevertheless, having functions that do what
> they do is essential and not at all a poor choice. In my experience,
> many text files simply contain a few columns of numbers *not* in
> fixed-field form. The only poor choice was not providing the fixed-field
> capability *in addition to* the scanning capability.
>

One last comment/note...

The other really bad result of choosing the format string syntax as was
done in C is that it eliminated the use of a repeat specifier--thus
leading to the abominations of things like '%d%d%d%d%d%d%f%f%' and worse
seen so frequently.

Yes, there is repmat() and other tricks to build strings and _sometimes_
one can get clever and even sometimes reversion is all that's needed but
how much simpler it could have been to have used the model already in
existence instead of trying to be too clever by far.
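
For reference, the repmat() trick mentioned above might look like the sketch below. The sample input string and the variable names are assumptions for illustration only.

```matlab
% Build the equivalent of six integer fields followed by two
% floating-point fields without typing every specifier by hand.
fmt = [repmat('%d', 1, 6), repmat('%f', 1, 2)];
disp(fmt)

% Hypothetical whitespace-separated input matching that format
vals = sscanf('1 2 3 4 5 6 7.5 8.25', fmt)';
disp(vals)
```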

--

Subject: fscanf / sscanf / textscan with fixed format

From: dpb

Date: 13 May, 2012 14:59:46

Message: 19 of 20

On 5/11/2012 3:39 PM, Star Strider wrote:
...

> Here's another way that may also generally address the fixed field width
> problem:
> i3 = sscanf(' 2 456', '%c');
> i3n = str2num([i3(1:2); i3(3:4); i3(5:6)])';

Bingo! :)

See my note elsewhere to Doug pointing out precisely the same workaround
and that the problem fixed-width fields cause w/ sscanf() and friends is
often disguised as a substring issue...

--

Subject: fscanf / sscanf / textscan with fixed format

From: Star Strider

Date: 13 May, 2012 21:54:08

Message: 20 of 20

dpb <none@non.net> wrote in message <jooi91$5jg$1@speranza.aioe.org>...
> On 5/11/2012 3:39 PM, Star Strider wrote:
> ...
>
> > Here's another way that may also generally address the fixed field width
> > problem:
> > i3 = sscanf(' 2 456', '%c');
> > i3n = str2num([i3(1:2); i3(3:4); i3(5:6)])';
>
> Bingo! :)
>
> See my note elsewhere to Doug pointing out precisely the same workaround
> and that the problem fixed-width fields cause w/ sscanf() and friends is
> often disguised as a substring issue...
>
> --

-----------------------------------------

I discovered yet another workaround:

i3 = sscanf(' 2 456', '%2c', [2,2]);
i3n = str2num(i3')'

I missed some of that discussion while I was experimenting with the previous solution I posted, similar to one I've used to solve fixed-format problems in the past. I too started in FORTRAN (although I haven't programmed in FORTRAN in a while), and I miss fixed-format, repeat specifiers, and some of the other FORTRAN I/O conventions. They definitely make some I/O easier. It would be nice if TMW implemented them as options somehow.

One item I find frustrating is that

[i3,k] = sscanf(' 2 456', '%2s')

produces i3 as a 1x4 character array [2456], yet k=3, indicating that it read in 3 fields (I assume they are 2-character-length strings) and concatenated them in its output.

Similarly,

[i3,k] = sscanf(' 2 456', '%2c')

produces i3 as a 1x6 character array [ 2 4 56], still giving k=3. I can easily understand that '%2d' could get confused, but not string and character format specifiers '%2c' and '%2s' that ideally would treat even leading whitespaces as legitimate characters if desired.
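
The two calls can be compared side by side. This is a small sketch with illustrative variable names; the values in the comments are the ones reported in this post, and the outputs are forced into row form so the comparison does not depend on the orientation sscanf returns.

```matlab
% %2s skips leading blanks before each field, so the blanks are lost
[s1, k1] = sscanf(' 2 456', '%2s');
s1 = s1(:)';               % reported above as '2456' with k = 3

% %2c reads characters as they come, blanks included
[s2, k2] = sscanf(' 2 456', '%2c');
s2 = s2(:)';               % reported above as ' 2 456' with k = 3

fprintf('[%s] %d fields, [%s] %d fields\n', s1, k1, s2, k2);
```

Since the %2c output keeps every blank in place, it can be cut into 2-character fields afterwards, which is what the workarounds in this thread rely on.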
