Regular expression help for capturing tokens from a C++ if-else function block

I am trying to convert some C++ code to MATLAB code and need help capturing the conditions of the if-else statements. The sample code is posted below. Normally it is much longer and contains many sets of the same repeating piecewise constraints with different functions to evaluate.
T[4][0] = if (x <= 2.0E2) {
t4 = x*-7.43939368315E2;
} else {
if (2.0E4 < x) {
t4 = x*-3.15202357052E2;
} else {
if ((2.0E2 < x) && (x <= 1.0E3)) {
t4 = -x*(x*-3.3581574807E-2+log(x)*4.7023733515E1+(log(x)*2.400503067842E3)/x+1.09445880410878E5/x+1.0/(x*x)*6.1158591370349E4+(x*x)*1.9946376054E-5-(x*x*x)*7.445595608E-9+(x*x*x*x)*1.243670758E-12-1.11582896139E2);
} else {
if ((1.0E3 < x) && (x <= 6.0E3)) {
t4 = -x*(x*-2.3265021934E-3+log(x)*4.8602554649E1+(log(x)*1.5974720059903E4)/x+3.623374787802E4/x+1.0/(x*x)*1.897212861710375E6+(x*x)*1.9151215358E-7-(x*x*x)*1.2237095959E-11+(x*x*x*x)*3.95116007E-16-1.62570932876E2);
} else {
if ((6.0E3 < x) && (x <= 2.0E4)) {
t4 = -x*(x*-1.6249864554E-1+log(x)*2.049900452325E3+(log(x)*6.16116287553448E6)/x-4.067298995421662E7/x+1.0/(x*x)*(5.126459124735035E23/1.40737488355328E14)+(x*x)*4.514672712E-6-(x*x*x)*9.025238189E-11+(x*x*x*x)*8.209541318E-16-1.8977498210781E4);
} else {
t4 = NAN;
}
}
}
}
};
T[5][0] = if (x <= 2.0E2) {
t5 = x*-6.99993596709E2;
} else {
if (6.0E3 < x) {
t5 = x*-2.95957496736E2;
} else {
if ((2.0E2 < x) && (x <= 1.0E3)) {
t5 = x*(x*-5.9732340052E-2+log(x)*1.344708739E1+(log(x)*6.623440520686E3)/x-1.30277758259044E5/x+1.0/(x*x)*1.93975305124744E5+(x*x)*2.359195784E-5-(x*x*x)*7.135910089E-9+(x*x*x*x)*1.0512057259E-12-2.8910827165E2);
} else {
if ((1.0E3 < x) && (x <= 6.0E3)) {
t5 = -x*(x*1.8859601673E-5+log(x)*3.6779467978E1+(log(x)*2.62795681346E2)/x+1.11227336090838E5/x-1.0/(x*x)*7.24982080986207E5+(x*x)*4.872002157E-8-(x*x*x)*9.084382373E-12+(x*x*x*x)*6.625791502E-16-4.3668943967E1);
} else {
t5 = NAN;
}
}
}
}
I have tried the following (and other variations):
tokens = regexp(funcode,'if\s\((.+)\)\s\{','tokens')
but it captures the whole segment, starting after the first 'if (' and ending at the last ') {'.
I would also like to eventually capture tokens for the expressions 't4 = ...' etc. together with each condition.
Any help would be greatly appreciated. P.S. MATLAB needs to make matlabFunction() work for piecewise symbolic functions.

Accepted Answer

Walter Roberson on 12 Jul 2011
Your most immediate problem is that .+ captures as many characters as possible and then backtracks only as much as necessary to match the rest of the expression. If you use .+? instead, it captures only as many characters as are necessary to match the rest of the expression.
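Here is a minimal sketch of the greedy versus lazy difference. It is written in Python only so that it runs standalone; MATLAB's regexp accepts the same '.+?' syntax. The input is a shortened, hypothetical excerpt of the question's code. MATLAB's regexp lets '.' match newlines by default, so re.DOTALL is passed here to mimic that behaviour.

```python
import re

# Shortened excerpt of the generated code: two 'if' lines on separate lines.
code = "if (x <= 2.0E2) {\n} else {\nif (2.0E4 < x) {\n"

# Greedy (.+): runs to the end, then backtracks only to the LAST ") {".
greedy = re.findall(r"if\s\((.+)\)\s\{", code, re.DOTALL)

# Lazy (.+?): stops at the FIRST ") {" that lets the rest of the pattern match.
lazy = re.findall(r"if\s\((.+?)\)\s\{", code, re.DOTALL)

print(greedy)
print(lazy)
```

The greedy version returns a single token that spans both 'if' lines. The lazy version returns one condition per 'if'.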
However, you have a deeper problem: you really only want to stop when you encounter the balancing ')'. Determining whether a delimiter is balanced is known to be theoretically impossible with pure regular expressions. MATLAB's "regular expressions", though, are extensions of standard regular expressions. They have much in common with Perl's "regular expressions", and in Perl it is possible to find the balancing delimiter. It has been a number of years since I looked at the relevant (tricky) Perl code. I think it is possible with the regular expressions that MATLAB provides, but I would not want to try to reinvent the technique: it is too ugly and hard to debug.
The easiest thing to do might be to use MATLAB's perl() command to call a Perl routine that does the parsing for you, after looking in the Perl FAQ to find the mechanism.
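An alternative to Perl for the balancing problem is to skip the regular expression for the condition body and count parenthesis depth directly. The sketch below does this in Python. The helper name if_conditions is hypothetical, and the same loop could be written in MATLAB. It assumes, as in the generated code above, that no parentheses appear inside string literals or comments.

```python
import re

def if_conditions(code):
    """Return the text inside the balanced parentheses that follow each 'if'."""
    conditions = []
    for match in re.finditer(r"\bif\s*\(", code):
        start = match.end()  # first character after the opening '('
        depth = 1            # already inside that one '('
        i = start
        while i < len(code) and depth > 0:
            if code[i] == "(":
                depth += 1
            elif code[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError(f"unbalanced parentheses after 'if' at index {match.start()}")
        # i is one past the balancing ')'; leave that ')' out of the result.
        conditions.append(code[start:i - 1])
    return conditions


# Hypothetical, shortened excerpt with nested parentheses in the condition.
sample = (
    "if ((2.0E2 < x) && (x <= 1.0E3)) {\n"
    "t4 = -x*(log(x)*4.7023733515E1);\n"
    "} else {\n"
    "if (2.0E4 < x) {\n"
)
found = if_conditions(sample)
print(found)
```

The regular expression only locates each 'if ('. The depth counter then decides where the condition ends, so nested parentheses such as '(2.0E2 < x) && (x <= 1.0E3)' are handled correctly.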
Joseph on 12 Jul 2011
Haha, that might work, though I have no prior experience with Perl. It is funny, because I have been trying to develop this code to turn arrays of piecewise symbolic functions into MATLAB code, but to do it I have to create strings of function blocks from ccode(). ccode() can generate code from piecewise symbolic functions. I use these, for example, to define specific heats or free energies over various temperature ranges when I need them for numeric problems such as constrained minimization. I have some code that makes the MATLAB functions fine, but the result is redundant: it repeats the check for the correct interval for every expression, when those checks could all be grouped into one.
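The question also asks for each condition to be captured together with its 't4 = ...' expression, which is what grouping the evaluations needs. The sketch below does this with a single lazy pattern in Python (MATLAB's regexp with the 'tokens' option uses the same syntax). It relies on two assumptions taken from the sample: each 'if' line is directly followed by one assignment 'tN = ...;', and no expression contains ';'. The helper name pair_conditions and the shortened second expression are hypothetical. The final 'else' branch (t4 = NAN;) has no condition and is deliberately not captured.

```python
import re

# Condition (lazy, single line), then '{', then 'tN = <expression>;'.
# Without re.DOTALL, '.' cannot cross a newline, so each condition stays on its own line.
CONDITION_ASSIGNMENT = re.compile(r"if\s*\((.+?)\)\s*\{\s*(t\d+)\s*=\s*([^;]+);")

def pair_conditions(code):
    """Return (condition, variable, expression) tuples in source order."""
    return CONDITION_ASSIGNMENT.findall(code)


code = (
    "T[4][0] = if (x <= 2.0E2) {\n"
    "t4 = x*-7.43939368315E2;\n"
    "} else {\n"
    "if ((2.0E2 < x) && (x <= 1.0E3)) {\n"
    "t4 = -x*(log(x)*4.7023733515E1-1.11582896139E2);\n"
    "} else {\n"
    "t4 = NAN;\n"
    "}\n"
    "};\n"
)
pairs = pair_conditions(code)
for condition, var, expr in pairs:
    print(condition, "->", var, "=", expr)
```

Because the condition must be followed by '{' and an assignment, the lazy group cannot stop early at an inner ')'.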


More Answers (1)

Oleg Komarov on 12 Jul 2011
tokens = regexp(s,'if\ ([\(\)\w\ \.><=&]+)\s+{','tokens')
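This answer works by whitelisting the characters a condition may contain. '{', ';', '*' and newlines are not in the class, so a match can never run past the '{' that ends the line. Below is a sketch of the same pattern in Python, with the final '{' escaped as '\{'. The input is a shortened excerpt of the question's code.

```python
import re

# Same character class as the MATLAB pattern: parentheses, word characters,
# space, '.', '>', '<', '=' and '&'.
pattern = r"if\ ([\(\)\w\ \.><=&]+)\s+\{"

code = (
    "if (x <= 2.0E2) {\n"
    "t4 = x*-7.43939368315E2;\n"
    "} else {\n"
    "if ((6.0E3 < x) && (x <= 2.0E4)) {\n"
)
conds = re.findall(pattern, code)
print(conds)
```

The captured conditions keep their outer parentheses. The trade-off is that a condition containing any character outside the class is not matched at all. For example, a '-' in a negative bound or exponent such as 'x <= 1.0E-3' would cause that line to be skipped.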
