Thread Subject:
Question Using Regexpi

Subject: Question Using Regexpi

From: Kevin Ellis

Date: 3 Jul, 2013 14:39:16

Message: 1 of 12

Hello,

I have a fairly easy question. I want to match numbers that occur ONLY at the end of a string. For example, I have the following string:

String = 'NR1 MGT - 61 HOURS'

I have written the following code to pull out the number '61' using the '$' to try to match the number at the end of the string:

Match = regexpi(String,'((\d*\x2C\d*\x2E\d*)|(\d*\x2E\d*)|(\d*\x2C\d*)|(\d*))$','match');

This returns an empty cell meaning that nothing could be matched. However, when I remove the '$' it returns the numbers '1' and '61' but I only want it to return the '61' because it occurs at the end of the string. I'm clearly not using the '$' operator correctly and hopefully someone can show me how to do this properly. Thanks for the help.

Kevin

Subject: Question Using Regexpi

From: dpb

Date: 3 Jul, 2013 15:32:39

Message: 2 of 12

On 7/3/2013 9:39 AM, Kevin Ellis wrote:
...

> I have a fairly easy question. I want to match numbers that occur ONLY
> at the end of a string. For example, I have the following string:
>
> String = 'NR1 MGT - 61 HOURS'
>
> I have written the following code to pull out the number '61' using the
> '$' to try to match the number at the end of the string:
...

But the number _isn't_ at the end of the string--the character string
'HOURS' is at the end.

I'm no regexp user, but '$' doesn't return the last occurrence of a match
in a string; it only matches text that ends _immediately_ at the end of the
string (or at the end of a line, with the 'lineanchors' option).
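To see what '$' does in practice, compare a string that ends in digits with one that doesn't. This is a minimal sketch (the variable names are illustrative only). It also suggests why Kevin's pattern came back empty: on this string the only way to satisfy '$' is for the final (\d*) alternative to match zero characters at the very end, and regexp ignores zero-length matches by default ('noemptymatch').

```matlab
% Sketch: '$' anchors a match to the end of the text; it does not mean "last match".
String = 'NR1 MGT - 61 HOURS';

% The text ends in 'HOURS', not in digits, so this returns an empty cell.
endDigits = regexp(String, '\d+$', 'match');

% Drop the trailing word and the same pattern finds the number: {'61'}
trimmed = regexp('NR1 MGT - 61', '\d+$', 'match');

% \d* can match zero digits right before the end of the text. Such
% zero-length matches are dropped by default ('noemptymatch'), so this is
% also an empty cell, like the result Kevin saw.
zeroLen = regexp(String, '\d*$', 'match');

% By default '$' means the end of the whole text; with 'lineanchors'
% it also matches at the end of every line.
txt      = sprintf('A 12\nB 34');
lastOnly = regexp(txt, '\d+$', 'match');                 % {'34'}
perLine  = regexp(txt, '\d+$', 'match', 'lineanchors');  % {'12','34'}
```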

It would appear you need to search from the end and find the first full
word that is numeric. I don't know enough regexp off the top of my head to
do it on the fly, sorry...but as you've discovered, '$' isn't that.
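The "search from the end for the first numeric word" idea can also be done without regexp. A sketch, assuming words are separated by whitespace and that str2double is an acceptable test for "numeric" (note that it also accepts forms such as 'Inf' or '1e3', and a lone 'i' or 'j' parses as an imaginary number):

```matlab
% Sketch: split into words, convert each, keep the last one that parsed.
String = 'NR1 MGT - 61 HOURS';

words  = strsplit(strtrim(String));   % {'NR1','MGT','-','61','HOURS'}
values = str2double(words);           % NaN for every non-numeric word
k      = find(~isnan(values), 1, 'last');

if isempty(k)
    lastNumber = [];                  % no numeric word in the string
else
    lastNumber = values(k);           % 61
end
```

strsplit collapses runs of whitespace by default, so extra blanks between words do not create empty tokens.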

--

Subject: Question Using Regexpi

From: dpb

Date: 3 Jul, 2013 16:15:12

Message: 3 of 12

On 7/3/2013 10:32 AM, dpb wrote:
...

> I'm no regexp user, but '$' doesn't return the last occurrence of a match
> in a string; it only matches text that ends _immediately_ at the end of the
> string (or at the end of a line, with the 'lineanchors' option).
>
> It would appear you need to search from the end and find the first full
> word that is numeric. I don't know enough regexp off the top of my head to
> do it on the fly, sorry...but as you've discovered, '$' isn't that.

One would think that

regexp(String,'<\\d\>','match')

would find a word containing only digits but it doesn't succeed,
either... :(

 >> regexp(String,'<\\d\>','match')
ans =
      {}
 >>

OK, is it something about using '\d' inside the <\ ????

 >> regexp(String,'<\[0-9]\>','match')
ans =
      {}
 >>

No, no joy...

OK, are there some other non-printing characters masquerading as
whitespace or somesuch????

 >> findstr(String,' ')
ans =
      4 8 9 11 12 13 14 17

Nope, blanks on either side so _should_ be a word...

 >> regexp(String,'\d','match')
ans =
     '1' '6' '1'
 >> regexp(String,'[ \d ]','match')
ans =
   Columns 1 through 7
     '1' ' ' ' ' ' ' ' ' ' ' ' '
   Columns 8 through 11
     ' ' '6' '1' ' '

And it finds the individual digits surrounded by blanks, just to prove it...

OK, let's just parse the form...

 >> String
String =
NR1 MGT - 61 HOURS
 >> v=sscanf(String(findstr(String,'-')+1:end),'%d')
v =
     61
 >>

This is why I've always become so frustrated with regexp: before I ever
actually got anything to work, I pretty much gave up trying, for lack of
interest in figuring it out well enough to understand why what seems like
it should work doesn't. There are more interesting things to do than pore
over regexp documentation that seems nearly impenetrable at first blush... :(

--

Subject: Question Using Regexpi

From: Doug Schwarz

Date: 3 Jul, 2013 17:20:07

Message: 6 of 12

In article <kr1ime$b3c$1@speranza.aioe.org>, dpb <none@non.net> wrote:

> On 7/3/2013 10:32 AM, dpb wrote:
> ...
>
> > I'm no regexp user, but '$' doesn't return the last occurrence of a match
> > in a string; it only matches text that ends _immediately_ at the end of the
> > string (or at the end of a line, with the 'lineanchors' option).
> >
> > It would appear you need to search from the end and find the first full
> > word that is numeric. I don't know enough regexp off the top of my head to
> > do it on the fly, sorry...but as you've discovered, '$' isn't that.
>
> One would think that
>
> regexp(String,'<\\d\>','match')
>
> would find a word containing only digits but it doesn't succeed,
> either... :(

Your idea is sound, but you have the RE wrong. The beginning of a word
anchor is '\<', not '<\'. Also, to match 1 or more contiguous digits
use '\d+'. So the final expression is '\<\d+\>' which does what you
want.
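Doug's correction, side by side with the original attempt, as a short sketch (variable names are illustrative). Taking the last element of the match list is one assumed way to honor the "number at the end" requirement if a string ever contains several all-digit words:

```matlab
String = 'NR1 MGT - 61 HOURS';

% Original attempt: '<' and '\\' are literals here, so the pattern looks for
% the characters '<\d' followed by an end of word. Nothing matches.
wrong = regexp(String, '<\\d\>', 'match');

% Corrected: \< and \> anchor the start and end of a word; \d+ is one or more
% digits. The '1' in 'NR1' is excluded because 'R' precedes it in the same word.
nums = regexp(String, '\<\d+\>', 'match');      % {'61'}

% If several all-digit words could occur, keep only the last one.
if isempty(nums)
    lastNum = NaN;
else
    lastNum = str2double(nums{end});            % 61
end
```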

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

Subject: Question Using Regexpi

From: dpb

Date: 3 Jul, 2013 17:52:25

Message: 7 of 12

On 7/3/2013 12:20 PM, Doug Schwarz wrote:
...

> Your idea is sound, but you have the RE wrong. The beginning of a word
> anchor is '\<', not '<\'. Also, to match 1 or more contiguous digits
> use '\d+'. So the final expression is '\<\d+\>' which does what you
> want.
>

Thanks, Doug...I stared at it, even thought I had pasted the form from the
regexp doc, and still couldn't see that I had the \< reversed. I wasn't
aware of the '+', though (and didn't see it looking through the voluminous
doc, although I'm sure it's there if you know where to look)--I would have
thought the word expression would be sufficient.

 >> regexp(String,'\<\d+\>','match')
ans =
     '61'
 >>

And, by golly! it does... :) Now just how easy was that! :)
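Why the '+' is needed: \< and \> only mark word boundaries and consume no characters, so whatever sits between them must match the whole word by itself, and \d alone is exactly one digit. The second half of this sketch is a hypothetical extension toward the comma and decimal forms in Kevin's original pattern; it has not been checked against real data and does not handle signs or leading-dot numbers such as '.5'.

```matlab
String = 'NR1 MGT - 61 HOURS';

% \d is exactly one digit, so '\<\d\>' only finds one-digit words.
oneDigit = regexp(String, '\<\d\>', 'match');   % empty: '61' has two digits
anyLen   = regexp(String, '\<\d+\>', 'match');  % {'61'}

% Hypothetical pattern: digits, optional ',ddd' groups, optional '.ddd'
% fraction, all inside word anchors.
numPat = '\<\d+(,\d+)*(\.\d+)?\>';
s2     = 'NR1 MGT - 1,234.5 HOURS';
m2     = regexp(s2, numPat, 'match');           % {'1,234.5'}

% Strip the thousands separators before converting to a number.
value  = str2double(strrep(m2{end}, ',', ''));  % 1234.5
```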

I've never had the patience, nor enough occasions where I was forced, to
learn the syntax well enough that it isn't always starting over from near
absolute zero every time it might be useful... :) (or :( more
appropriately, maybe...)

--

Subject: Question Using Regexpi

From: Kevin Ellis

Date: 3 Jul, 2013 18:16:17

Message: 8 of 12

dpb <none@non.net> wrote in message <kr1ocm$rln$1@speranza.aioe.org>...

Thanks for the help. Both solutions work great. I would never have figured it out without everyone's help. I think regexpi works great for easy applications, but the more complicated things get, the more I struggle with it.

Kevin

Subject: Question Using Regexpi

From: dpb

Date: 3 Jul, 2013 18:40:08

Message: 9 of 12

On 7/3/2013 1:16 PM, Kevin Ellis wrote:
...

>
> Thanks for the help. Both solutions work great. I would never have
> figured it out without everyone's help. I think regexpi works great for
> easy applications, but the more complicated things get, the more I
> struggle with it.

RE is a whole other language, and one has to learn the syntax to make
effective use of it. As noted, I've not attempted to do so, which is
simply a result of my not having been willing to spend the time
(compounded by the fact that what little I did learn was the dialect
built into the old Brief programmers' editor, which is close but not
quite the same, so the syntactic tidbits I do recall often aren't quite
right, either).

The only real complaint against regexp for complicated expressions and
large problem sets is that it can become a bottleneck...

--

Subject: Question Using Regexpi

From: Eric Sampson

Date: 3 Jul, 2013 21:08:08

Message: 10 of 12

dpb <none@non.net> wrote in message <kr1r64$3l9$1@speranza.aioe.org>...
> The only real complaint against regexp for complicated expressions and
> large problem sets is that it can become a bottleneck...
>
> --

Just a note dpb, I seem to recall that the folks who work on MATLAB's regexp engine have made performance improvements over time, and continue to do so. Not something that would make it into the release notes, but more 'behind the scenes' improvements.

Cheers

Subject: Question Using Regexpi

From: dpb

Date: 3 Jul, 2013 21:56:00

Message: 11 of 12

On 7/3/2013 4:08 PM, Eric Sampson wrote:
> dpb <none@non.net> wrote in message <kr1r64$3l9$1@speranza.aioe.org>...
...

>> The only real complaint against regexp for complicated expressions and
>> large problem sets is that it can become a bottleneck...
>>
...

> Just a note dpb, I seem to recall that the folks who work on MATLAB's
> regexp engine have made performance improvements over time, and continue
> to do so. Not something that would make it into the release notes, but
> more 'behind the scenes' improvements.

Undoubtedly but I suspect the previous generalization is still
true...it's just too complicated a piece o' fluff to not be... :)

--

Subject: Question Using Regexpi

From: Michael Ryan

Date: 8 Jul, 2013 14:35:07

Message: 12 of 12

"Kevin Ellis" wrote in message <kr1pph$hln$1@newscl01ah.mathworks.com>...
> Thanks for the help. Both solutions work great. I would never have figured it out without everyone's help. I think regexpi works great for easy applications, but the more complicated things get, the more I struggle with it.
>
> Kevin

It's great that your regexp problem was solved. For future questions, there are several items on the File Exchange that may help:
20589 - rex: a pedestrian regular expression operator synopsis generator
15215 - RegexpHelper
40781 - regexpHelper (app version)
41899 - regexpBuilder (Disclaimer: I wrote this one)
All of these should be pretty helpful in determining what your regexp would be.
