Thread Subject:
Extract numbers from table containing text and numbers

From: Stan

Date: 1 Feb, 2013 02:40:08

Message: 1 of 30

Hello,

I am trying to extract 2 columns from a file with 425 header lines and then the following:

 Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round
 Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round
 Q (upper-edge --0.3750E-03 AmP )= 0.18384E-01+- 0.38318E-03 units/alpha/round
 Q (upper-edge --0.5000E-03 AmP )= 0.17536E-01+- 0.37040E-03 units/alpha/round
 Q (upper-edge --0.6250E-03 AmP )= 0.16064E-01+- 0.36000E-03 units/alpha/round
 Q (upper-edge --0.7500E-03 AmP )= 0.15888E-01+- 0.35954E-03 units/alpha/round

Here is my attempt, using textscan, and the output I am getting:

>> clear
>> clc
>> fid1=fopen('rqm7yxk1.out','rt');
>> fmt1=[' %s' '%s' '%s%s%f' '%s' '%s' ' %f%s%s' ' %f' '%s'];
>> set_m=textscan(fid1, fmt1,'delimiter',' ','headerlines',425,'CollectOutput',true); %skip 425 header lines and then read in the data
>> fclose(fid1);
>> set_m

set_m =

  Columns 1 through 4

    {1x4 cell} [0x1 double] {0x1 cell} [0x1 double]

  Columns 5 through 7

    {0x2 cell} [0x1 double] {0x1 cell}

>> set_m{1}

ans =

    '' 'E' '(upper-edge' '--0.1250E-03'

Question:
I need the 3 columns that contain the numbers. Only the first 4 columns seem to be working. Is there something that I am missing, in the formatting, for the remaining columns?
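A quick way to see what the space-delimited format above has to match is to split one sample line on single spaces, which roughly mirrors what 'delimiter',' ' does. This is a minimal diagnostic sketch that assumes the posted line, including its leading space, reflects the file exactly. The first four tokens are the four strings that set_m{1} shows. The fifth token is 'AmP', but the fifth conversion in fmt1 is %f, which is probably where textscan stopped.

```matlab
% Diagnostic sketch: list the space-separated tokens of one sample line.
% Assumes the line (including its leading space) is exactly as posted.
str = ' Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round';
tokens = regexp(str, ' ', 'split');   % split at every single space
for k = 1:numel(tokens)
    disp([num2str(k) ': "' tokens{k} '"']);
end
% tokens{1} is '' because of the leading space, and tokens{5} is 'AmP',
% while the fifth conversion in fmt1 is %f.
```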

From: Stan

Date: 1 Feb, 2013 03:35:08

Message: 2 of 30

"Stan" wrote in message <kef9u8$seu$1@newscl01ah.mathworks.com>...
> ...
> >> set_m{1}
>
> ans =
>
> '' 'E' '(upper-edge' '--0.1250E-03'
> ...

There is a typo. It should be:
>> set_m{1}

ans =

    '' 'Q' '(upper-edge' '--0.1250E-03'

From: Stan

Date: 1 Feb, 2013 16:20:08

Message: 3 of 30

Could it be that the string )= is not being accepted as a string?

I've tried working through this file, but it's just not reading correctly.

Please, your help would be really appreciated.

From: dpb

Date: 1 Feb, 2013 18:44:15

Message: 4 of 30

On 1/31/2013 8:40 PM, Stan wrote:
> Hello,
>
> I am trying to extract 2 columns from a file with 425 header lines and
> then the following:
>
> Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round
> Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03
...

> Here is my attempt, using textscan, and the output I am getting:
>
...

>>> fmt1=[' %s' '%s' '%s%s%f' '%s' '%s' ' %f%s%s' ' %f' '%s'];
>>> set_m=textscan(fid1, fmt1,'delimiter',' ','headerlines',425,'CollectOutput',true); %skip 425 header lines and
...
> I need the 3 columns that contain the numbers. Only the first 4 columns
> seem to be working. Is there something that I am missing, in the
> formatting, for the remaining columns?

Yeah, the numbers are ill-formed w/ the doubled signs methinks...since
you have a fixed format I'd use string-matching...

I cut'n pasted a sample line into a cell string and at the command line
get...

 >> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
ans =
     [-1.2500e-04] [0.0099] [2.8300e-04]
 >>

Salt to suit...
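The idea here is that literal text in the format string must match the input exactly. The literal '-' before the first %f therefore swallows the first minus of "--", so %f reads "-0.1250E-03". Likewise, the literal "+-" after the second %f skips the separator. The sketch below applies the same literal-matching format with sscanf instead of textscan, which is a swap and not what was run above, on one sample line. Whether sscanf shares the R2012a textscan problem discussed later in the thread was not tested there. The line is trimmed first because the format starts with a literal Q.

```matlab
% Sketch: literal text in the format must match the input, so
%  - the '-' before the first %f consumes the first minus of "--"
%  - the literal '+-' after the second %f skips the "+-" separator
str = 'Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round';
fmt = 'Q (upper-edge -%f AmP )= %f+- %f';
vals = sscanf(strtrim(str), fmt);   % strtrim: the format starts with a literal Q
% vals is a 3x1 column: edge value, Q value, uncertainty
```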

--

From: Stan

Date: 2 Feb, 2013 02:19:08

Message: 5 of 30

dpb <none@non.net> wrote in message <keh2cd$6al$1@speranza.aioe.org>...
> On 1/31/2013 8:40 PM, Stan wrote:
> I cut'n pasted a sample line into a cell string and at the command line
> get...
>
> >> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
> ans =
> [-1.2500e-04] [0.0099] [2.8300e-04]
> >>
>
> Salt to suit...
>
> --

I did this:
>> d = {' Q (upper-edge ........'};
>> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')

ans =
  
     [-1.2500e-04] [0 X 1 double] [0 X 1 double]

------------------------------------X------------------------------------

How did you define the cell string? Did you use something different?

From: dpb

Date: 2 Feb, 2013 14:29:42

Message: 6 of 30

On 2/1/2013 8:19 PM, Stan wrote:
> I did this:
>>> d = {' Q (upper-edge ........'};
>>> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
>
> ans =
>
> [-1.2500e-04] [0 X 1 double] [0 X 1 double]
>
> ------------------------------------X------------------------------------
>
> How did you define the cell string? Did you use something different?

Nope...just cut'n paste from your prior post...

 >> d={'Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round'; ...
       'Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round'};
 >> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
ans =
     [-1.2500e-04] [0.0099] [2.8300e-04]
 >>

What release are you using?

--

From: Stan

Date: 2 Feb, 2013 16:26:08

Message: 7 of 30

^^^^ I'm using release R2012a.

I don't think that this would have made such a big difference though.

Do you think textscan changed recently?

From: dpb

Date: 2 Feb, 2013 17:50:09

Message: 8 of 30

On 2/2/2013 10:26 AM, Stan wrote:
> ^^^^ I'm using release R2012a.
>
> I don't think that this would have made such a big difference though.
>
> Do you think textscan changed recently?

Wouldn't think so, no...but this is 2012b here so anything's possible.
Guess that raises question of which platform? This is Win32 here...

I don't see why it shouldn't parse just fine unless there's a hidden
character or something causing a mismatch in the comparison...can you
retry using direct cut'n paste at command line and see if symptom
stays/goes away?

--

From: dpb

Date: 2 Feb, 2013 18:40:41

Message: 9 of 30

On 2/2/2013 8:29 AM, dpb wrote:
> On 2/1/2013 8:19 PM, Stan wrote:
> ...
>> I did this:
>>>> d = {' Q (upper-edge ........'};
>>>> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
...

And, just to check, I copied and pasted your textscan() call above into
the command window here and it works just as expected...

 >> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
ans =
     [-1.2500e-04] [0.0099] [2.8300e-04]
 >> d={' Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round'; ...
       'Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round'};
 >> textscan(d{1},'Q (upper-edge -%f AmP )= %f+- %f %*s')
ans =
     [-1.2500e-04] [0.0099] [2.8300e-04]
 >>

NB: I also noticed you had a leading space in your source string that I
hadn't, so I added it--still no problem.

One does now have to begin to suspect that there may have been a bug
fix in textscan() between the 12a and 12b releases.

If you have official support, might send this one in as real support
question to TMW...unless you can uncover something internal in the
string there that doesn't show up from the posting text...

--

From: Stan

Date: 2 Feb, 2013 20:02:06

Message: 10 of 30

Okay so I re-tried with a direct copy-paste. It did not work. The leading space, as you mentioned, does not affect it.

I'm on Win 32. Unfortunately, I do not have access to R2012b.

Is there a way to read in that string using some other function (fscanf, etc.)?

From: dpb

Date: 2 Feb, 2013 23:37:59

Message: 11 of 30

On 2/2/2013 2:02 PM, Stan wrote:
> Okay so I re-tried with a direct copy-paste. It did not work. The
> leading space, as you mentioned, does not affect it.
>
> I'm on Win 32. Unfortunately, I do not have access to R2012b.
>
> Is there a way to read in that string using some other function (fscanf,
> etc.)?

I don't have time at the moment but there's always textread() which has
much of the same functionality as textscan()...it requires a file,
though, although I think I recall your original is from a file so that
shouldn't be a problem.

There's always a way w/ fscanf() other than having to deal w/ the
headerlines explicitly, etc., etc., ... fgetl() and a loop is one way
to do that, of course.
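Here is a minimal sketch of the fgetl() loop mentioned above. It assumes one data record per line: it discards the header lines, then converts each remaining line with sscanf and the literal-matching format from earlier in the thread. The temporary file, its two-line stand-in header and nHeader = 2 are placeholders for this example only (the real file has 425 header lines). Lines that do not yield three numbers are skipped.

```matlab
% Write a small stand-in file (placeholder for the real output file).
fname = [tempname '.out'];
fid = fopen(fname, 'wt');
fprintf(fid, 'header line 1\nheader line 2\n');
fprintf(fid, ' Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round\n');
fprintf(fid, ' Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round\n');
fclose(fid);

nHeader = 2;                           % 425 for the real file
fmt = 'Q (upper-edge -%f AmP )= %f+- %f';
data = zeros(0, 3);
fid = fopen(fname, 'rt');
for k = 1:nHeader
    fgetl(fid);                        % discard one header line
end
while true
    tline = fgetl(fid);
    if ~ischar(tline), break; end      % fgetl returns -1 at end of file
    vals = sscanf(strtrim(tline), fmt);
    if numel(vals) == 3
        data(end+1, :) = vals.';       % one row per data line
    end
end
fclose(fid);
delete(fname);
```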

--

From: Bruno Luong

Date: 3 Feb, 2013 09:59:08

Message: 12 of 30

I'd rather use a fixed-field extraction method. I saved a piece of your data in test.txt, and here is how I read it back:

>> fid=fopen('test.txt','rt');
>> c=textscan(fid,'%s','Delimiter','\n')

c =

    {6x1 cell}

>> fclose(fid);
>> s=char(c{1})

s =

Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round
Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round
Q (upper-edge --0.3750E-03 AmP )= 0.18384E-01+- 0.38318E-03 units/alpha/round
Q (upper-edge --0.5000E-03 AmP )= 0.17536E-01+- 0.37040E-03 units/alpha/round
Q (upper-edge --0.6250E-03 AmP )= 0.16064E-01+- 0.36000E-03 units/alpha/round
Q (upper-edge --0.7500E-03 AmP )= 0.15888E-01+- 0.35954E-03 units/alpha/round

>> firstcol = s(:,17:26)

firstcol =

0.1250E-03
0.2500E-03
0.3750E-03
0.5000E-03
0.6250E-03
0.7500E-03

>> data1 = str2num(firstcol)

data1 =

   1.0e-03 *

    0.1250
    0.2500
    0.3750
    0.5000
    0.6250
    0.7500

% Do similar for other columns
% NOTE: textscan is NOT buggy when you understand how it works

Bruno
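The fixed-field approach above can be extended to all three numbers, as in the sketch below. The column ranges are assumptions counted on the sample lines exactly as posted (no leading blank, single spaces) and must be recounted against the real file. In that layout, columns 17:26 start after the second minus of "--", so the first column comes back positive. Starting at column 16 keeps the value's own sign, which agrees with the negative values textread returns later in the thread.

```matlab
% Fixed-field sketch: cut each number out of the char matrix by column.
% Column ranges were counted on the posted sample lines; recount them for
% the real file (a leading blank, for example, shifts everything by one).
c = {'Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round'
     'Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round'};
s = char(c);                         % one line per row, padded with blanks
cols = {16:26, 35:45, 49:59};        % column 16 is the value's own minus sign
data = zeros(size(s, 1), numel(cols));
for j = 1:numel(cols)
    data(:, j) = str2double(cellstr(s(:, cols{j})));
end
```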

From: dpb

Date: 3 Feb, 2013 15:00:50

Message: 13 of 30

On 2/3/2013 3:59 AM, Bruno Luong wrote:
...

> % NOTE: textscan is NOT buggy when you understand how it works
>

Then why is it behaving differently between 12a and 12b as far as Stan
and I can tell???? Which release/platform do you have installed?

And, parsing by reading the whole thing as a glob then sub-selecting
fixed columns sorta' defeats the whole point, anyways...altho it's
another example of where Fortran-like fixed field FORMAT is superior for
such cases to the C sscanf() and friends.

--

From: dpb

Date: 3 Feb, 2013 15:07:27

Message: 14 of 30

On 2/2/2013 2:02 PM, Stan wrote:
> Okay so I re-tried with a direct copy-paste. It did not work. The
> leading space, as you mentioned, does not affect it.
>
> I'm on Win 32. Unfortunately, I do not have access to R2012b.
>
> Is there a way to read in that string using some other function (fscanf,
> etc.)?
 >> type stan.txt

Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round
Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round
Q (upper-edge --0.3750E-03 AmP )= 0.18384E-01+- 0.38318E-03 units/alpha/round
Q (upper-edge --0.5000E-03 AmP )= 0.17536E-01+- 0.37040E-03 units/alpha/round
Q (upper-edge --0.6250E-03 AmP )= 0.16064E-01+- 0.36000E-03 units/alpha/round
Q (upper-edge --0.7500E-03 AmP )= 0.15888E-01+- 0.35954E-03 units/alpha/round

 >> [a,b,c]=textread('stan.txt','Q (upper-edge -%f AmP )= %f+- %f %*s')
a =
    1.0e-03 *
    -0.1250
    -0.2500
    -0.3750
    -0.5000
    -0.6250
    -0.7500
b =
     0.0099
     0.0196
     0.0184
     0.0175
     0.0161
     0.0159
c =
    1.0e-03 *
     0.2830
     0.3941
     0.3832
     0.3704
     0.3600
     0.3595
 >>

TEXTREAD(), while having been relegated to "red-haired stepchild" status,
is still highly valuable...

--

From: Bruno Luong

Date: 3 Feb, 2013 15:23:07

Message: 15 of 30

dpb <none@non.net> wrote in message <kelu2o$7qj$1@speranza.aioe.org>...
> On 2/3/2013 3:59 AM, Bruno Luong wrote:
> ...
>
> > % NOTE: textscan is NOT buggy when you understand how it works
> >
>
> Then why is it behaving differently between 12a and 12b as far as Stan
> and I can tell????

Since when is behaving differently buggy? The SUM() function gives different results when it goes multi-threaded, yet no one considers it buggy as far as I can tell.

>Which release/platform do you have installed?

I have 2012a at work and 2012b at home. At work I tell my team not to upgrade to 2012b.

>
> And, parsing by reading the whole thing as a glob then sub-selecting
> fixed columns sorta' defeats the whole point, anyways...altho it's
> another example of where Fortran-like fixed field FORMAT is superior for
> such cases to the C sscanf() and friends.

It's perfectly fine if you don't like the function because it doesn't work the way you expect, dpb. But (we are not going over it again) I'm still waiting for evidence of the so-called bug (to me, i.e., behavior different from what the doc describes).

Bruno

From: dpb

Date: 3 Feb, 2013 16:06:28

Message: 16 of 30

On 2/3/2013 9:23 AM, Bruno Luong wrote:
...

...

> ... I'm still
> waiting for evidence of the so-called bug (to me, i.e., behavior
> different from what the doc describes).

It's in the official bug report (link provided previously)...

--

From: dpb

Date: 3 Feb, 2013 16:24:04

Message: 17 of 30

On 2/3/2013 9:07 AM, dpb wrote:
...

> TEXTREAD(), while having been relegated to "red-haired stepchild" status,
> is still highly valuable...
>

OBTW, I also checked on old release R12 (prior to textscan) and
textread() works as expected there as well so it should also work under
your 12a release...

--

From: dpb

Date: 3 Feb, 2013 16:32:50

Message: 18 of 30

On 2/3/2013 9:23 AM, Bruno Luong wrote:
> dpb <none@non.net> wrote in message <kelu2o$7qj$1@speranza.aioe.org>...
>> On 2/3/2013 3:59 AM, Bruno Luong wrote:
>> ...
>>
>> > % NOTE: textscan is NOT buggy when you understand how it works
>> >
>>
>> Then why is it behaving differently between 12a and 12b as far as Stan
>> and I can tell????
>
> Since when is behaving differently buggy? The SUM() function gives
> different results when it goes multi-threaded, yet no one considers it
> buggy as far as I can tell.

Well, it's possible it's a bug; w/ multi-thread more probable it's
simply order-dependent and then a figment of "processor-dependent"
behavior as language Standards put it...

Either way, what is/isn't a bug in one function isn't germane to
whether another particular function has one.

>> Which release/platform do you have installed?
>
> I have 2012a at work and 2012b at home. At work I tell my team not to
> upgrade to 2012b.
...

Can you parse the three columns in 12a in "one swell foop" or at least
confirm Stan's behavior w/ the specific format string?

> It's perfectly fine if you don't like the function because it doesn't
> work the way you expect, dpb. But (we are not going over it again) I'm
> still waiting for evidence of the so-called bug (to me, i.e., behavior
> different from what the doc describes).

I've never said I "don't like" textscan() -- on the contrary the
facility is good/needed. What I have said repeatedly I don't like is
that TMW has since relegated textread() to second-class status, which
removes another very useful facility that isn't available in
textscan()--namely returning base arrays instead of cells.

As far as this thing about documentation; I defy you to find a reading
of the documentation that is supplied that indicates that the format
string Stan used should behave as it does in R2012a. It appears that
particular bug underlying that behavior has been fixed in R2012b since
we get different results between the two. The previous bug we discussed
is, as noted, in the TMW official bug database awaiting repair in its
own time.

--

From: Stan

Date: 3 Feb, 2013 18:02:07

Message: 19 of 30

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message
> I have 2012a at work and 2012b at home. At work I tell my team not to upgrade to 2012b.
>

Why? What is the problem with it?

From: Bruno Luong

Date: 3 Feb, 2013 19:51:07

Message: 20 of 30

"Stan" wrote in message <kem8mv$mnu$1@newscl01ah.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message
> > I have 2012a at work and 2012b at home. At work I tell my team not to upgrade to 2012b.
> >
>
> Why? What is the problem with it?

The help/documentation is a step back, hopefully only temporarily. Try, for example, finding a list of the mx??? functions to be used in MEX files.

The user interface, inspired by MS Office, is not suitable for programmers.

These two factors would reduce the productivity of our group.

Bruno

From: Bruno Luong

Date: 4 Feb, 2013 19:45:08

Message: 21 of 30

dpb <none@non.net> wrote in message <kem3f8$psl$1@speranza.aioe.org>...
>
> As far as this thing about documentation; I defy you to find a reading
> of the documentation that is supplied that indicates that the format
> string Stan used should behave as it does in R2012a.

I just had a little time to try textscan with Stan's example in 2012a, and indeed the difficulty is clearly due to parsing the "+-" just after a number ("...number+-...").

Don't forget that MATLAB is supposed to be able to parse complex numbers such as "1+1i" as well, so the "+-" does not make the parser's job easier. The presence of "--" is also not very nice.

So OK, it doesn't handle that case as well as it should; I admit it's a bug.

But Stan shouldn't be proud of creating such a nasty string in the first place.

Bruno
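Given the diagnosis that the scanner trips over "+-" right after a number (and over "--"), one way to sidestep numeric field splitting altogether is to pull out the number tokens with a regular expression and convert them afterwards. This technique was not used in the thread. The pattern assumes every value is written as a mantissa with a decimal point plus an E exponent, as in the sample, and that each line carries exactly three such values.

```matlab
% Sketch: extract number tokens with regexp, then convert with str2double.
c = {'Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round'
     'Q (upper-edge --0.2500E-03 AmP )= 0.19648E-01+- 0.39412E-03 units/alpha/round'};
% Optional leading minus, digits.digits, then E and a signed exponent.
% In "--0.1250E-03" the first minus cannot start a match, so the token
% found is "-0.1250E-03"; the "+-" separator is never part of a match.
pat = '-?\d+\.\d+E[-+]\d+';
tok = regexp(c, pat, 'match');       % one cell of matched strings per line
data = str2double(vertcat(tok{:}));  % rows = lines, columns = the 3 values
```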

From: dpb

Date: 4 Feb, 2013 21:57:47

Message: 22 of 30

On 2/4/2013 1:45 PM, Bruno Luong wrote:
> dpb <none@non.net> wrote in message <kem3f8$psl$1@speranza.aioe.org>...
>>
>> As far as this thing about documentation; I defy you to find a reading
>> of the documentation that is supplied that indicates that the format
>> string Stan used should behave as it does in R2012a.
>
> I just had a little time to try textscan with Stan's
> example in 2012a, and indeed the difficulty is clearly due to parsing
> the "+-" just after a number ("...number+-...").
>
> Don't forget that MATLAB is supposed to be able to parse complex numbers
> such as "1+1i" as well, so the "+-" does not make the parser's job
> easier. The presence of "--" is also not very nice.
> So OK, it doesn't handle that case as well as it should; I admit it's a bug.

Aha! Houston, we have liftoff!!! <VBG>

I'll note that parsing a complex input really is immaterial to the
bug--the bug is that matching the explicit literal text in the format
string, which includes the first minus of the repeated substring "--",
doesn't function correctly in 2012a (but does in 2012b). That is, the
fmt string that fails includes the characters up to and including the
first '-', so the %f should start processing w/ the second one, which
is a valid part of the value. One might speculate that the location
where the literal match ended in the target string was used as the
start of the next scan instead of incrementing to the next character
to begin parsing the next field.
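The expected behavior described here can be walked through by hand. Once the literal prefix up to and including the first '-' has matched, the next conversion should begin at the very next character, which is the value's own minus sign. This sketch only reproduces that expected behavior step by step with strncmp and sscanf. It does not reproduce the R2012a textscan fault.

```matlab
% Sketch of where the %f conversion should begin after the literal prefix.
str = 'Q (upper-edge --0.1250E-03 AmP )= 0.98880E-02+- 0.28300E-03 units/alpha/round';
prefix = 'Q (upper-edge -';                 % literal text, through the first '-'
matched = strncmp(str, prefix, numel(prefix));
pos = numel(prefix) + 1;                    % first character after the match
nextChar = str(pos);                        % the second '-' of "--"
value = sscanf(str(pos:end), '%f', 1);      % parse one number starting there
```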

textscan() is very complex; it's not surprising given how recently it
has been introduced it still has warts. Also not terribly surprising is
that textscan() handled the same format string correctly since it's had
a lot longer time to get such nits taken care of...

I wish for two things from TMW that would aid formatted text inputting
greatly--

a) A set of functions (or an alternate format flag for the existing
ones, maybe) that use Fortran-like FORMAT expressions vectorized in the
same manner as are the C Xscanf() formatting strings. This would solve
many problems, the most common of which is fixed-width input formats,
and would have resolved our previous disagreement over what textscan()
does differently than the scanf() family for a fixed-width decimal field
that wasn't parsed correctly.

A second major benefit of FORMAT string form over C form is that it
allows for repeat fields and field reversion that would obviate the need
for the butt-ugly and pita repmat() foolishness to get multiple fields.

b) Raise textread() back to fully-supported status again including
keeping its options up to par with those of textscan() and friends. The
loss of a way to read data into native arrays instead of cells is a
major step backwards in functionality even given that the ability to mix
string and numeric data into cell arrays via textscan() is
_a_good_thing_ (tm).

I'd like to see an enhancement to textread to also allow it to combine
like consecutive fields into a single array similar to the
'collectoutput' for textscan. Another way would possibly be for
textscan() to allow for requesting that data be returned as native
arrays as another optional flag/parameter value. That may not be wise
given the complexity that must reside internally already, but it's a
thought if TMW thinks keeping the two up simultaneously is too much
effort. Altho one would think there should be a great deal of
duplication of function in the two given how similar they are in
abilities excepting for the mixed string/numeric enhancement to textscan().

> But Stan shouldn't be proud of creating such a nasty string in the first place.
...

I give Stan a complete pass on this. Clearly it's output from another
program he's reading on which to do further post-processing--not
something made up as an input string w/ the idea of reading it.

--

From: dpb

Date: 5 Feb, 2013 01:04:46

Message: 23 of 30

On 2/4/2013 3:57 PM, dpb wrote:
...

> textscan() is very complex; it's not surprising given how recently it
> has been introduced it still has warts. Also not terribly surprising is
> that textscan() handled the same format string correctly since it's had
> a lot longer time to get such nits taken care of...

The second textscan() above was intended to be textread(), of course...

...

>> But Stan shouldn't be proud of creating such a nasty string in the first place.
> ...
>
> I give Stan a complete pass on this. Clearly it's output from another
> program he's reading on which to do further post-processing--not
> something made up as an input string w/ the idea of reading it.

But, whatever the format, that it shouldn't be hard to parse in a
general-purpose programming language is just a given imo.

The killer w/ C's i/o functions is their complete inability to
"understand" fixed-width fields--it's simply absurd that one can't read
the equivalent of a Fortran FORMAT(5I1) w/ a record of '101 1' and
reliably get 1,0,1,0,1 returned in appropriate variables. In Matlab
(and C) that's nearly impossible w/o special handling.

--
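Here is a sketch of what a FORMAT(5I1) read of '101 1' amounts to in MATLAB: slice the record by field width and convert each field, treating an all-blank field as zero (Fortran's default blank handling). Under C scanf rules, by contrast, a %1d conversion skips leading white space before reading, so the blank is never seen as a field of its own. The widths vector is an illustrative assumption, and this sketch handles integer fields only.

```matlab
% Fixed-width read sketch, equivalent in spirit to Fortran READ with FORMAT(5I1).
rec = '101 1';
widths = [1 1 1 1 1];                % five fields, one character each (5I1)
stops = cumsum(widths);              % last column of each field
starts = stops - widths + 1;         % first column of each field
vals = zeros(1, numel(widths));
for k = 1:numel(widths)
    field = strtrim(rec(starts(k):stops(k)));
    if isempty(field)
        vals(k) = 0;                 % all-blank field reads as zero
    else
        vals(k) = str2double(field);
    end
end
% vals is [1 0 1 0 1]
```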

From: Bruno Luong

Date: 5 Feb, 2013 06:55:08

Message: 24 of 30

dpb <none@non.net> wrote in message <keplqs$he2$1@speranza.aioe.org>...

>
> The killer w/ C's i/o functions is their complete inability to
> "understand" fixed-width fields--it's simply absurd that one can't read
> the equivalent of a Fortran FORMAT(5I1) w/ a record of '101 1' and
> reliably get 1,0,1,0,1 returned in appropriate variables. In Matlab
> (and C) that's nearly impossible w/o special handling.
>

What's the problem? Fixed-format reading is just taking the right columns of the char matrix. OK, you have to count the columns, but is that really a big deal?

Writing fixed-format output with MATLAB is another matter (there is simply no easy way of doing it).

Bruno

From: dpb

Date: 5 Feb, 2013 14:17:48

Message: 25 of 30

On 2/5/2013 12:55 AM, Bruno Luong wrote:
> dpb <none@non.net> wrote in message <keplqs$he2$1@speranza.aioe.org>...
>
>>
>> The killer w/ C's i/o functions is their complete inability to
>> "understand" fixed-width fields--it's simply absurd that one can't read
>> the equivalent of a Fortran FORMAT(5I1) w/ a record of '101 1' and
>> reliably get 1,0,1,0,1 returned in appropriate variables. In Matlab
>> (and C) that's nearly impossible w/o special handling.
>>
>
> What's the problem? Fixed-format reading is just taking the right columns
> of the char matrix. OK, you have to count the columns, but is that really
> a big deal?

The problem is there's no way to write a format string that will parse
the above example input string correctly w/o actually doing the
character manipulation directly.

Yes, that's a big deal for large files in terms of overhead plus the
necessity of having to write special code to handle it when it should be
a trivial formatting operation.

It's a remnant of the design of C that didn't really consider i/o to be
terribly important it appears...

> Writing fixed-format output with MATLAB is another matter (there is simply
> no easy way of doing it).

I keep hearing this, but on output the fixed-width fields do work at
least reasonably well--there have been several threads on the subject I
can recall over the last several months or so but I can't remember a one
that wasn't solvable pretty easily w/ just the proper formatting strings.

Output works reasonably well; fixed-width non-delimited input is just
broken (by C Library definition which underlies the Matlab formatted i/o
implementation).

--
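The point that fixed-width output is manageable with plain format strings can be illustrated with sprintf. Field widths pad each value to a fixed number of columns, and the format is reused for as many values as are supplied, which gives roughly the effect of a Fortran repeat count. The values below are made up for the example.

```matlab
% Fixed-width output sketch with explicit field widths.
M = [7 1.5; 12 -0.25];              % made-up example values
txt = sprintf('%3d%8.3f\n', M.');   % transpose so values are consumed row by row
% txt holds '  7   1.500' and ' 12  -0.250', each followed by a newline
bits = sprintf('%1d', [1 0 1 0 1]); % output counterpart of FORMAT(5I1): '10101'
```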

From: Bruno Luong

Date: 5 Feb, 2013 14:50:12

Message: 26 of 30

dpb <none@non.net> wrote in message <ker49o$r10$1@speranza.aioe.org>...
> The problem is there's no way to write a format string that will parse
> the above example input string correctly w/o actually doing the
> character manipulation directly.
>
> Yes, that's a big deal for large files in terms of overhead plus the
> necessity of having to write special code to handle it when it should be
> a trivial formatting operation.

I consider it a relatively easy task to write wrapper code that does fixed-format reading, maybe Fortran-like. Maybe one of the Fortran users could volunteer to do it and post it on the FEX rather than waiting for TMW to do it for you guys (I bet it will never happen).

Bruno

From: dpb

Date: 5 Feb, 2013 15:43:37

Message: 27 of 30

On 2/5/2013 8:50 AM, Bruno Luong wrote:
...

> I consider it a relatively easy task to write wrapper code that
> does fixed-format reading, maybe Fortran-like. Maybe one of the Fortran
> users could volunteer to do it and post it on the FEX rather than
> waiting for TMW to do it for you guys (I bet it will never happen).

Well, it is for the experienced (a relatively simple task, that is), but
the question is "why do you have to do so?" instead of there being the
facility in the base formatted i/o functions? Plus, the time taken for
doing that could be used for other more useful things.

I have had in the past some mex files that did precisely that--pass
FORMAT strings to Fortran to handle some i/o. Unfortunately, the source
seems to have been lost on the machine at a former place of employment
as I haven't been able to find it here. The question came up not very
long ago w/ a poster who had the identical question regarding an input
string similar to the example. It is conceptually quite simple; in
practice as I recall it took some effort to make it general-enough to be
of much generic use.

At this point in my life I'm not sure I'll ever have the ambition to
actually do that much "real" coding again--I've simply lost the
ambition/drive to write code at 2AM like a young'un and am too involved
in the farming operation and other activities at the local level to have
the time for more than just poking around at cssm for a little while in
the mornings to sorta' halfway keep my hand in...

But, that TMW doesn't do it doesn't mean it's not a good idea to include
in Matlab. :)

Since TMW has been kind enough to me recently to provide a license for
R2012b for evaluation/comments, perhaps I will try to build a coherent
set of enhancement requests, though...

It's just a shame that K&R didn't follow the clearly better path already
laid out instead of having to reinvent a (slightly out-of-round) wheel
and so we have to suffer for it...

--

From: Bruno Luong

Date: 5 Feb, 2013 16:08:12

Message: 28 of 30

dpb <none@non.net> wrote in message <ker9aj$df0$1@speranza.aioe.org>...

>
> Since TMW has been kind enough to me recently to provide a license for
> R2012b for evaluation/comments,

Excellent, very good offer from TMW.

Bruno

From: dpb

Date: 5 Feb, 2013 18:35:32

Message: 29 of 30

On 2/5/2013 10:08 AM, Bruno Luong wrote:
> dpb <none@non.net> wrote in message <ker9aj$df0$1@speranza.aioe.org>...
>
>>
>> Since TMW has been kind enough to me recently to provide a license for
>> R2012b for evaluation/comments,
>
> Excellent, very good offer from TMW.
>

Altho it's turning out not to be the boon I had hoped for...it brings my
old machine almost completely to its knees so that response time is so
poor as to make it unusable for any real work... :(

I do the occasional test like Stan's case where I can't in my old R12
release but it's just not feasible to use it for anything else. I do
have some additional memory on order to see if that will help...

--

From: Stan

Date: 7 Feb, 2013 22:59:09

Message: 30 of 30

Update:

Thanks a lot for all the help.

Both dpb and Bruno Luong offered methods that worked. I had to make minor adjustments, but their suggestions solved the problem.
