Regexp dynamically naming tokens issue

I'm trying to parse some text using named tokens, but I've run into an issue.
Running the following code does exactly what I expect.
test = '<Field name="test1">data1<Field name="test2">data2</Field>';
format = '<Field name="([^"]+)">(?<test1>[^<>]*)</Field>';
output = regexp(test,format,'names')
output =
test1: 'data2'
Now I'd like to collect all fields with a single regular expression, naming each token after its field. I tried many different formats (one example is replacing (?<test1>[^<>]*) with ((?<$1>[^<>]*))), but I'm working with the following dynamic expression since I thought it would be easy to diagnose: (??@cat(2,''(?<'',$1,''>[^<>]*)''))
test = '<Field name="test1">data1<Field name="test2">data2</Field>';
format = '<Field name="([^"]+)">(??@cat(2,''(?<'',$1,''>[^<>]*)''))</Field>';
output = regexp(test,format,'names')
output =
0x0 struct array with no fields.
I had assumed that I was just generating the string incorrectly, but printing it to the console (using (?@cmd) in place of (??@cmd), so the command runs and displays its result) gives:
test = '<Field name="test1">data1<Field name="test2">data2</Field>';
format = '<Field name="([^"]+)">(?@cat(2,''(?<'',$1,''>[^<>]*)''))</Field>';
output = regexp(test,format,'names')
ans =
(?<test1>[^<>]*)
ans =
(?<test2>[^<>]*)
output =
0x0 struct array with no fields.
I'm generating what I intended with the dynamic expression, but something obviously isn't working. It's odd that replacing the dynamic portion of my regular expression with what it evaluates to works, but the dynamic expression doesn't work.
Any help would be appreciated. I'm running MATLAB R2015b if it helps.
Thanks, Garrett

Accepted Answer

Stephen23 on 24 Jan 2016
Edited: Stephen23 on 24 Jan 2016
It might be possible to solve this using dynamic regular expressions, but in the interest of code clarity I think it would be best to solve this task outside of regexp. Doing so will make the intent of the code clearer, make debugging easier, and probably run a bit faster (dynamic expressions have to be evaluated during matching).
Something like this:
test = '<Field name="test1">data1<Field name="test2">data2</Field>';
fmt = '<Field name="(\w+)">([^<>]*)';
C = regexp(test,fmt,'tokens');
C = vertcat(C{:})';
S = struct(C{:})
gives
>> S
S =
test1: 'data1'
test2: 'data2'
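To make the reshaping steps in the code above easier to follow, here is the same approach written out one step at a time. This is only a sketch: the intermediate variable name pairs is introduced here for illustration and is not part of the original answer.

```matlab
test = '<Field name="test1">data1<Field name="test2">data2</Field>';
fmt  = '<Field name="(\w+)">([^<>]*)';

% One cell per match, each holding a 1x2 cell {name, value}.
C = regexp(test, fmt, 'tokens');

% Stack the matches into an N-by-2 cell array: one row per field.
% (pairs is a hypothetical name used only in this sketch.)
pairs = vertcat(C{:});

% Transpose to 2-by-N so that column-major expansion with {:}
% yields name1, value1, name2, value2, ...
pairs = pairs.';

% struct() accepts alternating field names and values.
S = struct(pairs{:});
```

Note that this relies on every captured name being a valid MATLAB field name (starting with a letter) and on the names being unique. Since \w+ would also match a name that starts with a digit, struct would throw an error for such input.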
Tip
Developing regular expressions can be a challenge. You might like to try my Regular Expression Helper tool, an interactive figure that parses a regular expression as you type and gives immediate feedback on regexp's outputs.
  1 Comment
Garrett on 24 Jan 2016
Interesting, I wouldn't have thought to do it this way, but it works! Thanks :)
