How can I use regexp to return a list of variable names?

I want to extract variable names from a string. For my purposes, variables start with a letter or an underscore, and they end before anything except for open parenthesis ("(").
So
'sin(var1) + var2'
should become something like
["var1" "var2"]
I tried this:
testStr = 'sin(var1) + var2';
vars = regexp(testStr,'([a-zA-Z_]\w*)(?:[^(\w]|$)','tokens')
and got this:
ans =
0x0 empty cell array
What am I doing wrong?

3 Comments

For my purposes, variables start with a letter or an underscore, and they end before anything except for open parenthesis ("(")
'si' starts with a letter or underscore and ends before something that is not an open parenthesis, so it is not clear why you do not include 's' or 'si' or 'v' or 'va' or 'var' on your list?
'var1) + var2' starts with a letter or an underscore and is not followed by an open parenthesis, so it is not clear why that is not considered a variable.
I think your definition of a variable forgot to discuss the treatment of whitespace and of characters that are neither underscores nor "letters" (the meaning of which is not specified: is è a letter?). Why do you include the 2, since that is not considered a "letter"?
You're right. I was not very clear about my definition of a variable. A variable:
  • begins with either an English letter or an underscore
  • contains any number of English letters, underscores, and digits
  • ends before something that is not an English letter, underscore, or digit
In the string:
'var1 * _var2 * 2var3 + func(var5))'
the variables are:
var1
_var2
var3
var5
Notice that the 2 in front of var3 is not included, nor is "func".
I've tested the expression on regex101 and it matches all of the correct expressions, but when I call regexp with the arguments 'tokens', I don't get an array of the token text like I was expecting.
func meets these three conditions:
  • "begins with either an English letter or an underscore" yes!
  • "contains any number of English letters, underscores, and digits" yes!
  • "ends before something that is not an English letter, underscore, or digit" yes!
So why is func not on your list of variables when it meets all of your conditions?
Note that the regular expression you defined on regex101 actually uses "ends before something that is not an English letter, underscore, digit, or open bracket".
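The gap between the three stated rules and the pattern tested on regex101 can be seen by running both against the example string. This is only an illustrative sketch: the names threeRules and fourRules are made up here, and the fourth rule ("not followed by an open parenthesis") is inferred from the comments above rather than stated in the asker's list.

```matlab
str = 'var1 * _var2 * 2var3 + func(var5))';

% Three rules only: a letter or underscore, then any run of word characters.
% Greedy \w* already stops at the first non-word character (or end of string).
threeRules = regexp(str, '[A-Za-z_]\w*', 'match');
% threeRules is {'var1','_var2','var3','func','var5'}, so func is included

% Assumed fourth rule: the name must not be followed by "(".
% \w is also excluded in the lookahead so backtracking cannot return "fun".
fourRules = regexp(str, '[A-Za-z_]\w*(?![\(\w])', 'match');
% fourRules is {'var1','_var2','var3','var5'}

disp(threeRules)
disp(fourRules)
```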


Answers (2)

I would do it in two steps. First, remove the function names, i.e. the word characters immediately before an open parenthesis:
testStr = 'sin(var1) + var2';
% \w already includes the underscore, and {0,} is the same as *
var_step_1 = regexprep(testStr,'\w*\(', '(')
This gives '(var1) + var2'. Then match the variables:
var_step_2 = regexp(var_step_1,'\w+', 'match')
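One caveat with a second step built only from \w: a token such as 2var3 comes back whole, and plain numbers are returned as well. If that is not wanted, the second step can be restricted to names that start with a letter or underscore. A minimal sketch, assuming the variable definition given in the comments; noFuncs and names are placeholder names chosen for this example.

```matlab
str = 'var1 * _var2 * 2var3 + func(var5))';

% Step 1: drop any run of word characters that directly precedes "(".
noFuncs = regexprep(str, '\w*\(', '(');
% noFuncs is 'var1 * _var2 * 2var3 + (var5))'

% Step 2: keep only names that begin with a letter or underscore,
% so the leading 2 of "2var3" is skipped and "var3" is returned.
names = regexp(noFuncs, '[A-Za-z_]\w*', 'match');
% names is {'var1','_var2','var3','var5'}

disp(names)
```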

I used regexpi for simplicity:
>> str = 'var1 * _var2 * 2var3 + func(var5))';
>> C = regexpi(str,'([A-Z_]\w*)(?![\(\w])','match');
>> C{:}
ans = var1
ans = _var2
ans = var3
ans = var5
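On the asker's point about the 'tokens' output: regexp returns a cell array in which each match is itself a cell holding that match's tokens, so the result usually needs flattening before it looks like a plain list of names. The sketch below uses the same pattern with regexp and an explicit A-Za-z range in place of regexpi; tok, names and asStrings are placeholder names, and the string() conversion assumes R2016b or later.

```matlab
str = 'var1 * _var2 * 2var3 + func(var5))';
pat = '([A-Za-z_]\w*)(?![\(\w])';

% 'tokens' gives one cell per match; each of those holds the captured tokens.
tok = regexp(str, pat, 'tokens');   % 1x4 cell, each element a 1x1 cell

% Concatenate the inner cells into a flat 1x4 cell array of char vectors.
names = [tok{:}];

% Optional: convert to a string array, e.g. ["var1" "_var2" "var3" "var5"].
asStrings = string(names);

disp(names)
disp(asStrings)
```

With a single capturing group that spans the whole match, 'match' and flattened 'tokens' give the same list, so 'match' is the simpler choice here.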
If you want to develop regular expressions then you might be interested in downloading my simple Interactive Regular Expression tool:

Asked: 22 Feb 2018
Edited: 23 Feb 2018