regexp - match regular expression question

Hi all,
In the Matlab 'help' documentation for the function regexp, I'm trying to understand what the vertical line (i.e. | ) means in the pattern below. The example comes directly from Matlab's help, shown after typing 'help regexp'.
The help documentation indicates:
"|" means Match subexpression before or after the "|"
What I would like to ask is: what does the above mean exactly? At the moment, I'm thinking 'which is it?' I was expecting that a match would be either 'before' or 'after', but not 'before OR after'. And even if it really does mean 'match before OR after', what does that mean exactly? In other words, what does "|" actually represent?
Thanks in advance.
str = 'John Davis; Rogers, James';
pat = '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)';
n = regexp(str, pat, 'names')
  2 Comments
Stephen23 on 30 Sep 2016
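The | is alternation: each match uses exactly one of the alternatives on either side of it, so within a single match it behaves like an exclusive or. Here is an example of how it works, tested on a string with four slightly different "words":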
>> regexp('a123z a%%%z a1%3z a__z','a(\d+|%+)z','match')
ans =
'a123z' 'a%%%z'
The pattern matches all sequences starting with a, ending with z, and containing XOR(digits,%-symbols) in between. The third "word" in the string does not match because it contains both digits and %-symbols; the fourth contains only underscores, so it does not match the regex either. Now let's alter the regex and use two |, so that the middle must be exactly one of digits, %-symbols, or underscores:
>> regexp('a123z,a%%%z,a1%3z,a__z','a(\d+|%+|_+)z','match')
ans =
'a123z' 'a%%%z' 'a__z'
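To see which alternative was used for each match, one option is to also request the 'tokens' output, which returns the text captured by the parentheses. This is only a small sketch reusing the first test string and pattern from above; the variable names str, pat, m and t are just illustrative.

```matlab
% Sketch: reuse the first example and ask regexp for the captured text too.
str = 'a123z a%%%z a1%3z a__z';
pat = 'a(\d+|%+)z';

% Outputs are returned in the order the options are listed:
% m holds the full matches, t holds the tokens captured by ( ).
[m, t] = regexp(str, pat, 'match', 'tokens');

for k = 1:numel(m)
    % t{k}{1} is the part matched by whichever alternative succeeded
    fprintf('%s matched via "%s"\n', m{k}, t{k}{1});
end
```

Each match reports a token made up only of digits or only of %-symbols, never a mixture, which is the "one alternative per match" behaviour described above.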
Bonus: if you want a convenient way to test and experiment with regular expressions, you can try my FEX submission:
Kenny on 30 Sep 2016
Edited: Kenny on 1 Oct 2016
Hi Stephen !! Thanks for going out of your way to help me as well. The example that you gave is truly excellent. Thanks very much for showing this. The regexp function is so powerful, but it helps a great deal when you and S.S. add great understandable examples. When I first looked at those 'code' patterns from inbuilt examples, it didn't have the nice explanations that allowed followers to follow through, and understand. Thanks for mentioning XOR, and the bonus link too! Best regards! Thanks a lot again. Kenny

Accepted Answer

Star Strider on 30 Sep 2016
Edited: Star Strider on 30 Sep 2016
When I’ve used the ‘|’ (‘or’) operator, I’ve used it to match either of the two (or more) sub-expressions in the expression string. In this instance, if it detects a comma between the two words, it labels the first word as the last name and the second word as the first name. If it does not detect a comma, it does the reverse. The presence or absence of a comma in the target string determines which sub-expression returns the result: a name written with a comma cannot be matched by the sub-expression without a comma, and a name written without a comma cannot be matched by the sub-expression that requires one.
If you want to see how this works in practice, try it with only one sub-expression (and without the ‘|’ operator). That’s the easiest (and most instructive) way to see how a particular syntax works.
EDIT Clarified an ambiguity in the original.
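As a sketch of the experiment suggested above, the two sub-expressions from the question can be run separately and then joined with ‘|’. The variable names patSpace, patComma, nSpace, nComma and nBoth are only illustrative and are not part of the help example.

```matlab
% Sketch: run each half of the help-example pattern alone, then combined.
str = 'John Davis; Rogers, James';

patSpace = '(?<first>\w+)\s+(?<last>\w+)';   % "First Last"
patComma = '(?<last>\w+),\s+(?<first>\w+)';  % "Last, First"

nSpace = regexp(str, patSpace, 'names');  % finds only 'John Davis'
nComma = regexp(str, patComma, 'names');  % finds only 'Rogers, James'

% Joining the halves with | lets either form match at each position
nBoth = regexp(str, [patSpace '|' patComma], 'names');

for k = 1:numel(nBoth)
    fprintf('first = %s, last = %s\n', nBoth(k).first, nBoth(k).last);
end
```

Run alone, each sub-expression finds only the name written in its own format; joined with ‘|’, the result is a 1-by-2 struct array holding both names, with first and last assigned correctly in each.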
  2 Comments
Kenny on 30 Sep 2016
Thanks so much for your help and time, S.S.! That helped me tremendously. Genuinely appreciated.