Having regular expression stop at the first instance of a string, and not subsequent ones
Hello!
I'm trying to find a solution to my problem, and I am not sure if there is one.
I have some Elmer code that I am trying to parse using regular expressions in MATLAB. It looks something like this:
...
Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End
Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End ...
My expression is: Body\s+N.*?(Body\sForce\s*=\s*)(\d+?)\s.*?End
where N is 2 or 3.
I am trying to pull the Body Force from the Body if there is one, and have the token return an empty string if there isn't one.
When I apply this with N = 3, it works fine. However, when N = 2, the match spans all of the text, covering both Body 2 and Body 3. Is there a way to specifically tell it to stop at the first End it sees? Thank you very much!
Cheng
Answers (2)
Cedric
on 19 Jan 2013
Edited: Cedric
on 20 Jan 2013
>> doc regexp
Under Command options
'once' : Return only the first match found.
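For reference, a small sketch of what 'once' changes, using a made-up string rather than the poster's data. It limits how many matches are returned (and returns a char array instead of a cell array), but it does not shorten the match itself:

```matlab
% Made-up test string with two "Material ... End" sections.
str = 'Material = 3 End Material = 1 End';

% Without 'once': every match is returned in a cell array.
all_matches = regexp(str, '\d+', 'match');

% With 'once': only the first match, returned as a char array.
first_match = regexp(str, '\d+', 'match', 'once');

% 'once' does not change how far a single match extends:
% the greedy .* still runs to the last "End" in the string.
span = regexp(str, 'Material.*End', 'match', 'once');
```

Here span should come back as the whole string, which is why 'once' alone does not answer the question.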
EDIT (after Walter's comment):
>> str = 'Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End Body 44 Target Bodies(1) = 2 Name = "Body 44" Equation = 1 Material = 3 End Body 345 Target Bodies(12) = 36 Name = "Body 345" Equation = 22 Material = 123456 Body Force = 987 End' ;
>> fmt = '(?<=Body %d(((?!End).)+)e \\= )\\d*' ;
>> regexp(str, sprintf( fmt, 28), 'match', 'once' )
ans = ''
>> regexp(str, sprintf( fmt, 2), 'match', 'once' )
ans = ''
>> regexp(str, sprintf( fmt, 3), 'match', 'once' )
ans = '1'
>> regexp(str, sprintf( fmt, 44), 'match', 'once' )
ans = ''
>> regexp(str, sprintf( fmt, 345), 'match', 'once' )
ans = '987'
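The key piece above is ((?!End).)+, which consumes characters one at a time but refuses to step onto the start of an "End". A possible variant of the same idea, written with a token instead of a lookbehind, is sketched below. The shortened str and the names fmt, tok2 and tok3 are only for illustration, and the \s after the body number is my own addition so that "Body 3" does not also match the start of "Body 345":

```matlab
% Shortened, made-up version of the test string (two bodies only).
str = ['Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End ' ...
       'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End'];

% "Body N", one whitespace character, then any characters that do not
% begin an "End", then "Body Force = " with the number captured as a token.
% Backslashes are doubled because the format goes through sprintf.
fmt = 'Body %d\\s(?:(?!End).)*?Body\\sForce\\s*=\\s*(\\d+)';

tok2 = regexp(str, sprintf(fmt, 2), 'tokens', 'once');  % empty: Body 2 has no Body Force
tok3 = regexp(str, sprintf(fmt, 3), 'tokens', 'once');  % tok3{1} holds the Body Force value
```

Checking isempty(tok2) then tells you whether the body has a Body Force at all.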
I'll get some aspirin now ;)
Cedric
2 Comments
Walter Roberson
on 19 Jan 2013
That will not solve the problem here: regexp quantifiers are "greedy" by default, so the .* will match as far as possible.
Walter Roberson
on 18 Jan 2013
Are both lines in the same string, or is it one line at a time? If it is one line at a time, then regexp for '(?<=Body\sForce\s+=\s+)(\d+)'
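A minimal sketch of the one-line-at-a-time case, assuming each body has already been split into its own string (line2 and line3 are hypothetical variable names):

```matlab
% Each body held in its own string (assumed to be split beforehand).
line2 = 'Body 2 Target Bodies(1) = 2 Name = "Body 2" Equation = 1 Material = 3 End';
line3 = 'Body 3 Target Bodies(1) = 3 Name = "Body 3" Equation = 2 Material = 1 Body Force = 1 End';

% Lookbehind: match digits only when they follow "Body Force = ".
expr = '(?<=Body\sForce\s+=\s+)(\d+)';

f2 = regexp(line2, expr, 'match', 'once');  % empty, no Body Force on this line
f3 = regexp(line3, expr, 'match', 'once');  % the digits after "Body Force = "
```

Because each call only ever sees one body, there is no way for the match to run into the next one.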
3 Comments
Walter Roberson
on 19 Jan 2013
Use the lazy quantifier .*? instead of .*, which is the greedy quantifier.
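A short sketch of the difference, on a made-up two-body string. Note the third case: a lazy quantifier still extends as far as it has to for the rest of the pattern to match, so if the pattern requires "Body Force" and the first body has none, the match can still run into the next body:

```matlab
% Made-up string: only Body 3 has a Body Force.
str = 'Body 2 Material = 3 End Body 3 Material = 1 Body Force = 1 End';

% Greedy: .* runs to the last "End" in the string.
greedy = regexp(str, 'Body 2.*End', 'match', 'once');

% Lazy: .*? stops at the first "End" after "Body 2".
lazy = regexp(str, 'Body 2.*?End', 'match', 'once');

% Lazy, but "Body Force" is required: the first one found is in Body 3,
% so the match crosses Body 2's "End" to reach it.
spanning = regexp(str, 'Body 2.*?Body\sForce.*?End', 'match', 'once');
```

In this sketch lazy is 'Body 2 Material = 3 End', while greedy and spanning both cover the whole string.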
Walter Roberson
on 20 Jan 2013
Also, you can set the 'dotexceptnewline' regexp() option so that the .* will not cross linefeeds. In general when you start matching within individual lines you often end up also wanting the 'lineanchors' regexp() option, so that you can use ^ and $ to match the beginning and end of individual lines.
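A rough illustration of both options, assuming each body sits on its own line of a single string (the string below is made up for the example):

```matlab
% Two bodies separated by a newline, built with sprintf.
str = sprintf('Body 2 Material = 3 End\nBody 3 Material = 1 Body Force = 1 End');

% Default behaviour: the dot also matches the newline, so the match crosses lines.
across = regexp(str, 'Body 2.*Force', 'match', 'once');

% With 'dotexceptnewline' the dot stops at the line break, so nothing is found.
within = regexp(str, 'Body 2.*Force', 'match', 'once', 'dotexceptnewline');

% With 'lineanchors', ^ and $ match at the start and end of each line.
body_lines = regexp(str, '^Body \d+.*End$', 'match', 'lineanchors', 'dotexceptnewline');
```

Here body_lines should contain one entry per body, which can then be searched individually.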