Having regular expression stop at the first instance of a string, and not subsequent ones

Asked by Cheng-Ming on 18 Jan 2013

Hello!

I'm trying to find a solution to my problem, and I am not sure if there is one.

I have some Elmer code that I am trying to parse with regular expressions in MATLAB. It looks something like this:

...

Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End

Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End ...

My expression is: Body\s+N.*?(Body\sForce\s*=\s*)(\d+?)\s.*?End

where N is 2 or 3.

I am trying to pull the Body Force from the Body if there is one, and have the token return an empty string if there isn't one.

When I apply this with N = 3, it works fine. However, with N = 2, regexp returns all of the text, covering both Body 2 and Body 3. Is there a way to tell it to stop at the first End it sees? Thank you very much!
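A minimal reproduction of the behavior, assuming both bodies sit in one char vector separated by single spaces (the same layout Cedric uses in his answer). With N = 2 the lazy .*? still has to reach a "Body Force" to satisfy the rest of the pattern, and the only one is in Body 3, so the match runs through the first End:

```matlab
% Both bodies in one string, single-space separated (assumed layout)
str = ['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End ' ...
       'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End'];

% The question's pattern with N = 2
pat = 'Body\s+2.*?(Body\sForce\s*=\s*)(\d+?)\s.*?End';
[m, tok] = regexp(str, pat, 'match', 'tokens', 'once');

% m spans both bodies, and tok{2} is Body 3's force, not Body 2's
disp(m)
disp(tok{2})   % prints 1
```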

Cheng

2 Answers

Answer by Walter Roberson on 18 Jan 2013

Are both lines in the same string, or is it one line at a time? If it is one line at a time, then search each line with regexp for '(?<=Body\sForce\s+=\s+)(\d+)'.
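If each body sits on its own line, a per-line search might look like the sketch below. It assumes lines separated by char(10) and uses a capture token instead of the lookbehind, which yields the same digits; a line without a Body Force gives an empty result:

```matlab
% Two bodies, one per line (assumed layout)
txt = sprintf(['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End\n' ...
               'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End']);
bodyLines = regexp(txt, '\n', 'split');   % one cell per line

forces = cell(size(bodyLines));
for k = 1:numel(bodyLines)
    % Capture the digits after "Body Force =" on this line only
    tok = regexp(bodyLines{k}, 'Body\s+Force\s*=\s*(\d+)', 'tokens', 'once');
    if isempty(tok)
        forces{k} = '';          % no Body Force on this line
    else
        forces{k} = tok{1};
    end
end
disp(forces)
```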

3 Comments

Cheng-Ming on 19 Jan 2013

They are in the same string. I could split them and then parse again, which would not be difficult, but I was wondering if there was a more elegant solution.
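The split-then-parse route could look like this sketch: split the single string at each End using regexp's 'split' output, then read the body number and the optional Body Force from each piece. The single-space layout and the variable names are assumptions for illustration:

```matlab
str = ['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End ' ...
       'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End'];

% Split at each "End"; drop the empty piece left after the final End
blocks = regexp(str, 'End', 'split');
blocks = blocks(~cellfun(@(b) isempty(strtrim(b)), blocks));

bodyIds = zeros(1, numel(blocks));
forces  = cell(1, numel(blocks));
for k = 1:numel(blocks)
    % The body number comes first in each block
    id = regexp(blocks{k}, '^\s*Body\s+(\d+)', 'tokens', 'once');
    bodyIds(k) = str2double(id{1});
    % Body Force is optional; store '' when absent
    f = regexp(blocks{k}, 'Body\s+Force\s*=\s*(\d+)', 'tokens', 'once');
    if isempty(f), forces{k} = ''; else, forces{k} = f{1}; end
end
```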

Walter Roberson on 19 Jan 2013

Use the lazy quantifier .*? instead of .*, which is the greedy quantifier.
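The difference shows up on the two-body string from the question (same assumed single-space layout). Laziness only stops at the first End when nothing later in the pattern forces the engine further; in the question's pattern, the required Body Force group does exactly that for N = 2:

```matlab
str = ['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End ' ...
       'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End'];

greedy = regexp(str, 'Body 2.*End', 'match', 'once');   % runs to the last End
lazy   = regexp(str, 'Body 2.*?End', 'match', 'once');  % stops at the first End

disp(lazy)   % the Body 2 block only
```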

Walter Roberson on 20 Jan 2013

Also, you can set the 'dotexceptnewline' regexp() option so that .* will not cross linefeeds. In general, when you start matching within individual lines, you often also want the 'lineanchors' regexp() option, so that ^ and $ match the beginning and end of individual lines.
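A sketch of the two options together, assuming each body is on its own line: with 'dotexceptnewline' the .*? cannot leave Body 2's line to reach Body 3's force, and 'lineanchors' lets ^ and $ bind to each line:

```matlab
txt = sprintf(['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End\n' ...
               'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End']);
% %d is filled in by sprintf; \\s and \\d become \s and \d
pat = '^Body %d\\s.*?Body\\s+Force\\s*=\\s*(\\d+).*?End$';

% Body 2 has no Body Force on its own line, so there is no match
t2 = regexp(txt, sprintf(pat, 2), 'tokens', 'once', 'dotexceptnewline', 'lineanchors');
% Body 3 does, so its token is returned
t3 = regexp(txt, sprintf(pat, 3), 'tokens', 'once', 'dotexceptnewline', 'lineanchors');
```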

Answer by Cedric Wannaz on 19 Jan 2013 (edited on 20 Jan 2013)

 >> doc regexp

Under Command options

 'once'  : Return only the first match found.
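In practice 'once' also changes the output type: the match comes back as a char vector rather than a cell array. A small illustration on a made-up string:

```matlab
s = 'Force = 1 Force = 987';
allMatches = regexp(s, '\d+', 'match');          % cell array of every match
firstMatch = regexp(s, '\d+', 'match', 'once');  % char vector of the first match
```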

EDIT (after Walter's comment):

 >> str = 'Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End Body 44 Target Bodies(1) = 2 Name = "Body 44" Equation = 1 Material = 3 End Body 345 Target Bodies(12) = 36 Name = "Body 345" Equation = 22 Material = 123456 Body Force = 987 End' ; 
 >> fmt = '(?<=Body %d(((?!End).)+)e \\= )\\d*' ;
 >> regexp(str, sprintf( fmt, 28), 'match', 'once' )
 ans = ''
 >> regexp(str, sprintf( fmt, 2), 'match', 'once' )
 ans = ''
 >> regexp(str, sprintf( fmt, 3), 'match', 'once' )
 ans = '1'
 >> regexp(str, sprintf( fmt, 44), 'match', 'once' )
 ans = ''
 >> regexp(str, sprintf( fmt, 345), 'match', 'once' )
 ans = '987'
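Two observations on the pattern above: it relies on a variable-length lookbehind, which MATLAB accepts (as the session shows) but some engines, such as Python's re module, reject; and "Body %d" also matches a prefix of a longer body number, so N = 34 would return Body 345's force. A two-step sketch that avoids both, offered as an alternative rather than a fix of the original: first grab the block for Body N with the tempered token (?:(?!End).)*, which cannot run past End, plus (?!\d) after the number to rule out prefixes; then look for Body Force inside that block only. The loop over N is just for illustration:

```matlab
str = ['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End ' ...
       'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End ' ...
       'Body 44 Target Bodies(1) = 2 Name = "Body 44" Equation = 1 Material = 3 End ' ...
       'Body 345 Target Bodies(12) = 36 Name = "Body 345" Equation = 22 Material = 123456 Body Force = 987 End'];

ids    = [28 2 3 44 345 34];
forces = cell(size(ids));
for k = 1:numel(ids)
    % Step 1: the block for Body N; (?!\d) rejects "Body 34" inside "Body 345",
    % and (?:(?!End).)* stops right before the first End
    block = regexp(str, sprintf('Body %d(?!\\d)(?:(?!End).)*End', ids(k)), 'match', 'once');
    forces{k} = '';
    if ~isempty(block)
        % Step 2: optional Body Force, searched inside that block only
        tok = regexp(block, 'Body\s+Force\s*=\s*(\d+)', 'tokens', 'once');
        if ~isempty(tok)
            forces{k} = tok{1};
        end
    end
end
disp([num2cell(ids); forces])
```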

I'll get some aspirin now ;)

Cedric

2 Comments

Walter Roberson on 19 Jan 2013

That will not solve the problem here: 'once' only limits how many matches are returned, and regular expressions are "greedy" by default, so .* will go as far as possible.

Cedric Wannaz on 20 Jan 2013

Ah yes, I'll update my answer.
