
Thread Subject:
dynamic regexp

Subject: dynamic regexp

From: Greg Thom

Date: 25 Mar, 2010 17:21:07

Message: 1 of 10

This one has beaten the shit out of me for hours so I post here:

I have strings of the form:

str1 = 'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,';
str2 = 'Test-Results: ,Pat>98656,,,3,1,3.5400,2,-0.2542,3,9.1200,3,';
str3 = 'Test-Results: ,Pat>6300025,,,2,1,3.5400,2,-0.2542,2,';

As you can see, these strings share a format, but its length changes from line to line.

The format is always

'Test-Results: ,' then 'Pat>PatID' then ',,,' then NumPts, then Pt1,Val1,Pt2,Val2 and so on up to PtN,ValN, then NumGoodPts and a trailing comma at the end of the string.

I am making a dynamic regexp that will search these strings and give me as tokens

PatID
Numpts
Pt1
Val1
...
PtN
ValN
NumGoodPts

The problem is that Pt1,Val1,...,PtN,ValN varies in length (N = NumPts for each line), so I need to modify the regexp dynamically as I go, using the value I capture for NumPts. I can capture that value with no problem.

The variable part of the string generalizes to
(\d,\[-+]?[0-9]*\.?[0-9]+) (to clarify, [-+]?[0-9]*\.?[0-9]+ captures a float with optional sign and decimal point)


Given that format, I need to repmat it NumPts times during regexp execution to capture all of Pt1,Val1,...,PtN,ValN. E.g., for str1 I need
repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,Numpts)
or, using a dynamic expression,
(??@repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,$2)) % NumPts has been captured in token #2.

Does anyone know how to write this properly? I have tried but failed to make repmat execute during regexp execution.

Best Wishes


GT

Subject: dynamic regexp

From: dpb

Date: 25 Mar, 2010 17:28:34

Message: 2 of 10

Greg Thom wrote:
...

> The problem I have is that Pt1,Val1,...,PtN,ValN is variable (N = Numpts
> for a each line) so I need to dynamically modify the regexp as I go
> using the value I capture with Numpts. Now I can capture the value no
> problem.
...

> Does anyone know how to write this properly, I have tried but failed in
> make repmat execute during regexp execution.
...

I "know nuthink" (in any depth, at least) of regexp and absolutely
nothing of the dialect in later versions of ML so I'll venture forth with--

Why not do a two pass-solution? Get the number, build/rebuild the parse
expression and give a second go w/ that...

--
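
A minimal sketch of the two-pass idea, for anyone following along. It assumes the line layout described in the first post; the variable names (head, pairPat, expr and so on) and the exact pair pattern are illustrative choices, not something posted in this thread.

```matlab
str1 = 'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,';

% Pass 1: capture PatID and NumPts from the fixed lead-in.
head   = regexp(str1, '^Test-Results: ,Pat>(\d+),,,(\d+),', 'tokens', 'once');
patID  = head{1};                 % kept as text so leading zeros survive
numPts = str2double(head{2});

% Pass 2: repeat the pair pattern NumPts times in ordinary MATLAB code,
% then run a plain (non-dynamic) regexp with the expanded expression.
pairPat = '(\d+),([-+]?\d*\.?\d+),';   % assumed shape of one "Pt,Val," pair
expr    = ['^Test-Results: ,Pat>\d+,,,\d+,', ...
           repmat(pairPat, 1, numPts), '(\d+),$'];
toks    = regexp(str1, expr, 'tokens', 'once');

% Tokens alternate Pt, Val, ..., with NumGoodPts as the last one.
pts     = str2double(toks(1:2:end-1));
vals    = str2double(toks(2:2:end-1));
numGood = str2double(toks{end});
```

Because the repetition happens before regexp is called, every Pt and Val can be captured as an ordinary token.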

Subject: dynamic regexp

From: us

Date: 25 Mar, 2010 17:37:05

Message: 3 of 10

"Greg Thom" <gregthom99@yahoo.com> wrote in message <hog623$g6v$1@fred.mathworks.com>...
> This one has beaten the shit out of me for hours so I post here:
>
> I have strings of the form:
>
> str1 = 'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,';
> str2 = 'Test-Results: ,Pat>98656,,,3,1,3.5400,2,-0.2542,3,9.1200,3,';
> str3 = 'Test-Results: ,Pat>6300025,,,2,1,3.5400,2,-0.2542,2,';
>
> Now as you can see these strings have some format that changes:
>
> The format is always
>
> 'Test-Results: ' then 'Pat>PatID' then ,,, then NumPts then P then Pt1,Val1,Pt2,Val2 etc till PtN,ValN then NumGoodPts, at the end of the string.

a hint:
- don't use REGEXP for this particular problem...
- rather, look at

     help sscanf;
     help strread;

us

Subject: dynamic regexp

From: us

Date: 25 Mar, 2010 17:57:04

Message: 4 of 10

"us " <us@neurol.unizh.ch> wrote in message <hog701$2kq$1@fred.mathworks.com>...
> a hint:
> - don't use REGEXP for this particular problem...
> - rather, look at
>
> help sscanf;
> help strread;
>
> us

one of the many solutions

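     % sample lines, one per cell
     s={
          'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,'
          'Test-Results: ,Pat>98656,,,3,1,3.5400,2,-0.2542,3,9.1200,3,'
          'Test-Results: ,Pat>6300025,,,2,1,3.5400,2,-0.2542,2,'
     };
     % replace the fixed prefix and every comma with a blank, so that
     % each line becomes a whitespace-separated list of numbers
     s=strrep(s,'Test-Results: ,Pat>',' ');
     s=strrep(s,',',' ');
     % read all numbers of each line into a column vector
     % (PatID is read as a number, so its leading zeros are lost)
     r=cellfun(@(x) sscanf(x,'%f'),s,'UniformOutput',false);
     r
%{
% r =
     [11x1 double]
     [ 9x1 double]
     [ 7x1 double]
%}
% eg,
     r{1}.'
%{
          6025 4 1 3.54 2 -0.2542 3 9.12 4 -5.2 3
%}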

us
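
A possible next step after the sscanf approach is to unpack each numeric vector into the named fields from the first post. This is only a sketch: it assumes the layout PatID, NumPts, the Pt/Val pairs, then NumGoodPts, and the variable names are made up for illustration.

```matlab
% Same values as r{1} in the post above, as a column vector.
v = [6025 4 1 3.54 2 -0.2542 3 9.12 4 -5.2 3].';

patID   = v(1);
numPts  = v(2);

% Sanity check: PatID + NumPts + 2*NumPts pair entries + NumGoodPts.
assert(numel(v) == 3 + 2*numPts, 'unexpected number of fields');

% Elements 3 .. 2+2*NumPts are the pairs; reshape so that
% row 1 holds the Pt values and row 2 holds the Val values.
pairs   = reshape(v(3:2+2*numPts), 2, numPts);
pts     = pairs(1,:);
vals    = pairs(2,:);
numGood = v(end);
```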

Subject: dynamic regexp

From: Jason Breslau

Date: 25 Mar, 2010 18:03:12

Message: 5 of 10

A limitation of the current implementation of regexp is that dynamic
patterns may not capture tokens.

-=>J

Subject: dynamic regexp

From: Walter Roberson

Date: 25 Mar, 2010 18:36:10

Message: 6 of 10

Greg Thom wrote:

> Because I have that format I need to repmat it Numpts times during
> regexp execution to capture all Pt1,Val1,...,PtN,ValN. eg for str1 I
> need to repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,Numpts)
> or using regexp
> (??@repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,$2)) % NumPts has been captured
> in token #2.


Because you want the Pt to be separated from the Val, you want () around the
distinct parts. And you've let a \ creep in before the [-+], which would cause
it to look for a literal [ character. And if there is no decimal point, then
the [0-9]+ at the end, being forced to match at least one character, is going
to force the [0-9]* (which would have matched the entire number) to "back up"
by one position, which is inefficient. You also haven't allowed for the comma
after the Val when you did the repmat. [0-9] can be written more briefly as \d.
And since this is presumably within a quoted string, you have to double the '
in order not to end the overall string when you start the string you want to
repmat.

Also, below I won't assume that NumPts is at most 9:

(??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1, $2))

There is a _possibility_ that you might have to double the \ but I suspect
not... well, maybe in some circumstances in front of digits or token names.

Subject: dynamic regexp

From: Greg Thom

Date: 25 Mar, 2010 20:34:04

Message: 7 of 10

Walter Roberson <roberson@hushmail.com> wrote in message <hogaes$sp$1@canopus.cc.umanitoba.ca>...
> Greg Thom wrote:
>
> > Because I have that format I need to repmat it Numpts times during
> > regexp execution to capture all Pt1,Val1,...,PtN,ValN. eg for str1 I
> > need to repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,Numpts)
> > or using regexp
> > (??@repmat('\d,\[-+]?[0-9]*\.?[0-9]+',1,$2)) % NumPts has been captured
> > in token #2.
>
>
> Because you want the Pt to be separated from the Val, you want () around the
> distinct parts. And you've let a \ creep in before the [-+] which would cause
> it to look for literal [ characters . And if there is no decimal point, then
> the [0-9]+ at the end, being forced to match one character at least, is going
> to force the [0-9]* (which would have matched the entire number) to "back up"
> by one position, which is inefficient. You also haven't allowed for the comma
> after the Val when you did the repmat. [0-9] is shorter as \d . And since this
> is presumably within a quoted string, you have to double the ' in order not to
> end the overall string when you start the string you want to repmat.
>
> Also, below I won't assume that NumPts is at most 9:
>
> (??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1, $2))
>
> There is a _possibility_ that you might have to double the \ but I suspect
> not... well, maybe in some circumstances in front of digits or token names.


Hello all, thanks for the suggestions. I thought regexp could solve this problem, especially with dynamic expressions, so I tried Walter's suggestion:

>> [toks mat] = regexp(str1,'Test-Results: ,\w*>\w*,,,(\d+),(??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1, $1))3,','tokens')
??? Error using ==> regexp
Evaluation of 'repmat('(\d+),([-+]?\d*\.?\d*),', 1, $1)' did not produce a string.

The trick is to get repmat to evaluate properly and produce the right string for ??@ to accept.

Anyone ?

Cheers

Subject: dynamic regexp

From: Greg Thom

Date: 25 Mar, 2010 20:58:04

Message: 8 of 10

"Greg Thom" <gregthom99@yahoo.com> wrote in message <hog623$g6v$1@fred.mathworks.com>...

update:

corrected the last post with this

[toks mat] = regexp(str1,'Test-Results: ,\w*>\w*,,,(\d+),(??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1, str2num($1)))3,','tokens')

But can anyone explain why it is returning empty? I know for sure that the repmat is executing correctly. What I don't know is what the final regexp expression looks like. Is there any way to debug and view the actual regexp string that is executing?


Cheers

GT

Subject: dynamic regexp

From: us

Date: 25 Mar, 2010 21:05:07

Message: 9 of 10

"Greg Thom"
> but can anyone explain why is it returning empty , I know for sure that the repmat is executing correctly , what I don't know is what the final regexp expression looks like, is there anyway to debug and view the actuall regexp string that is executing ?

why insist on REGEXP(?)...
you were shown other solutions, which get you started right away without wasting any more time...

us

Subject: dynamic regexp

From: Walter Roberson

Date: 25 Mar, 2010 21:55:39

Message: 10 of 10

Greg Thom wrote:

> corrected the last post with this
>
> [toks mat] = regexp(str1,'Test-Results:
> ,\w*>\w*,,,(\d+),(??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1,
> str2num($1)))3,','tokens')

Oh, good point about using str2num there! (though str2double would be more
efficient, as str2num uses eval.)

> but can anyone explain why is it returning empty , I know for sure that
> the repmat is executing correctly , what I don't know is what the final
> regexp expression looks like, is there anyway to debug and view the
> actuall regexp string that is executing ?

For future reference of anyone who might be following this thread: the repmat
part could be debugged by calling instead a user-written function that
reported on its inputs and then did the repmat.
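
A rough sketch of such a wrapper follows. The name dbgrepmat is hypothetical; it is assumed to be saved as dbgrepmat.m somewhere on the MATLAB path.

```matlab
function out = dbgrepmat(pat, m, n)
% DBGREPMAT  Hypothetical debugging wrapper around repmat.
%   Prints what it was given, converts a text count to a number,
%   performs the repmat, and prints the generated pattern.
    fprintf('pat = "%s"\n', pat);
    fprintf('n   = %s (class %s)\n', mat2str(n), class(n));
    if ischar(n)
        n = str2double(n);     % a captured token arrives as text
    end
    out = repmat(pat, m, n);
    fprintf('out = "%s"\n', out);
end
```

It could then stand in for repmat inside the dynamic expression, for example (??@dbgrepmat(''\d+,[-+]?\d*\.?\d*,'', 1, $1)), so that the generated pattern is printed each time regexp evaluates it.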


As to why the regexp is not working: the posting from the fellow from
Mathworks saying that tokens cannot be captured within dynamic regexp patterns
is the key. But if you toss a () around the (??@...) expression, then the
overall pattern matched by the dynamic expression should be returned, all as
one piece. You'd then have to break it up, but that could be done by cellfun
of regexp() with a pattern of ',' and the parameter 'split' .


Okay, let's simplify this whole lot. Since you are going to have to split the
result afterwards anyhow, you don't care how many pairs there are on the line
or what they look like. So...

[toks, mat] = regexp(str1, '^.+?,,\d+,(.+),\d+,$', 'tokens');

That is, skip everything until you find two commas followed by a number (the ?
after the .+ makes it a "lazy quantifier"), skip over the number (which is the
count of the number of pairs), skip the comma after that, then capture
everything heading towards the end of line, but back up and skip over the last
comma number comma at the end of line.

Then take the tokens that result and split them at the commas. Unless you have
a corrupted entry, you will automatically get the correct number of pairs.
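
As a sketch of that splitting step (the 'once' option and the variable names fields, nums, pts and vals are additions for illustration, not part of the expression above):

```matlab
str1 = 'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,';

% Capture everything between the NumPts field and the trailing NumGoodPts.
% With 'once', toks is a cell array holding the single captured token.
toks = regexp(str1, '^.+?,,\d+,(.+),\d+,$', 'tokens', 'once');

% Split the captured text at the commas and convert to numbers.
fields = regexp(toks{1}, ',', 'split');
nums   = str2double(fields);

% Odd positions are the Pt values, even positions the Val values.
pts  = nums(1:2:end);
vals = nums(2:2:end);
```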


Okay, now I'm going to make it even more simple, *provided* that the value is
certain to have a decimal point somewhere in it:

pairs = regexp(str1, '(?<pt>\d+),(?<val>[-+]?\d*\.\d*)', 'names');

This will return a structure array with fields pt and val, so
pairs(1).pt = '1'
pairs(1).val = '3.5400'
pairs(2).pt = '2'

and so on.
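
If numeric values are wanted rather than text, the fields of the structure array can be collected and converted in one step each. A small sketch, assuming every value contains a decimal point as stated above (pt and val are illustrative variable names):

```matlab
str1 = 'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,';

pairs = regexp(str1, '(?<pt>\d+),(?<val>[-+]?\d*\.\d*)', 'names');

% {pairs.pt} gathers the field from every element into a cell array,
% which str2double converts element by element.
pt  = str2double({pairs.pt});
val = str2double({pairs.val});
```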

However, if the value parameter might look just like an integer, then we
cannot use this automatic splitting on str1 as such. But in that case, you could:

leadinlen = regexp(str1, ',,,\d+', 'end');
pairs = regexp(str1(leadinlen+1:end), '(?<pt>\d+),(?<val>[^,]+)', 'names');
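
Putting that together for all three sample lines might look like the sketch below. The lead-in expression is widened here to capture PatID and NumPts as well, and the result structure and its field names are assumptions made for the example. The check against NumPts flags corrupted entries.

```matlab
lines = {
    'Test-Results: ,Pat>006025,,,4,1,3.5400,2,-0.2542,3,9.1200,4,-5.2000,3,'
    'Test-Results: ,Pat>98656,,,3,1,3.5400,2,-0.2542,3,9.1200,3,'
    'Test-Results: ,Pat>6300025,,,2,1,3.5400,2,-0.2542,2,'
};

result = struct('patID', {}, 'numPts', {}, 'pt', {}, 'val', {});
for k = 1:numel(lines)
    s = lines{k};
    % Capture PatID and NumPts, and note where the lead-in ends.
    [head, leadEnd] = regexp(s, 'Pat>(\d+),,,(\d+)', 'tokens', 'end', 'once');
    % Everything after the lead-in is parsed as Pt,Val pairs.
    pairs = regexp(s(leadEnd+1:end), '(?<pt>\d+),(?<val>[^,]+)', 'names');

    result(k).patID  = head{1};                  % text, keeps leading zeros
    result(k).numPts = str2double(head{2});
    result(k).pt     = str2double({pairs.pt});
    result(k).val    = str2double({pairs.val});

    % A mismatch here suggests a corrupted or truncated line.
    if numel(pairs) ~= result(k).numPts
        warning('Line %d: expected %d pairs, found %d.', ...
                k, result(k).numPts, numel(pairs));
    end
end
```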


I think you'll find this approach much easier than continuing with dynamic
patterns.
