I need some explanations on "regexpi" and "str2double" functions.

Hi,
I am working on this script to draw a plot synced with a video for my research.
I am struggling with certain parts of the script, and I have uploaded a screenshot of the MATLAB screen. A representative file name in 'img_name' is 09m12.468585s.jpg
Can you please explain these points?
  1. '(\d)(\w*)m','split'
  2. str2double(s{2}(1:end-5))
  3. '\d*[m]','match'
  4. str2double(m{1}(1:end-1))
Although I read up on the two functions on the official MATLAB website, the examples given there are not enough for me to understand the code.
Thank you very much.

Accepted Answer

Stephen23 on 25 Aug 2017
Edited: Stephen23 on 25 Aug 2017
1) This regexp matches any single digit followed by zero or more word characters (letters, digits, or underscores), followed by the letter m. The 'split' option tells regexp to return the parts of the input that are not matched. For example:
>> regexp('XXX123mXXX','(\d)(\w*)m','split') % match '123m'
ans =
'XXX' 'XXX'
>> regexp('XXX1ABCmXXX','(\d)(\w*)m','split') % match '1ABCm'
ans =
'XXX' 'XXX'
>> spl = regexp('09m12.468585s.jpg','(\d)(\w*)m','split') % match '09m'
spl =
'' '12.468585s.jpg'
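The short sketch below uses made-up input strings (apart from the file name from the question) to show two details of '(\d)(\w*)m' that are easy to miss: \w also matches underscores, and \w* is greedy, so the match extends to the last 'm' it can reach within a run of word characters. It also puts 'match' and 'split' side by side for the same pattern.

```matlab
% Same pattern as in the question
pat = '(\d)(\w*)m';

% \w also matches underscores, not only letters and digits
a = regexp('XXX1A_bmXXX', pat, 'match');   % {'1A_bm'}

% \w* is greedy: it gives back characters only until the LAST reachable 'm' fits
b = regexp('X1mmY', pat, 'match');         % {'1mm'}

% 'match' returns the matched text, 'split' returns the text around it
name  = '09m12.468585s.jpg';
mtc   = regexp(name, pat, 'match');        % {'09m'}
parts = regexp(name, pat, 'split');        % {'', '12.468585s.jpg'}
```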
2) str2double converts strings representing numbers into numeric values of type double. For example:
>> str2double('123') % this is what str2double does
ans =
123
>> spl{2}(1:end-5) % the output here is still a string!
ans =
12.468585
>> str2double(spl{2}(1:end-5)) % convert string to numeric
ans =
12.469
The author of your code used indexing to drop the last five characters before converting, a rather unreliable way of removing the file extension and the 's' at the end of the file name: it only works if that tail is always exactly five characters long, such as 's.jpg'.
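To see why 1:end-5 is fragile, the sketch below runs the same conversion on a second, hypothetical file name that uses a four-letter '.jpeg' extension. It compares the result with one possible alternative (not what the original script does): letting fileparts remove whatever extension is present, then stripping trailing letters with regexprep.

```matlab
% The second name is hypothetical: same timestamp, '.jpeg' instead of '.jpg'
names = {'09m12.468585s.jpg', '09m12.468585s.jpeg'};
viaIndexing  = zeros(1, numel(names));
viaFileparts = zeros(1, numel(names));
for k = 1:numel(names)
    spl = regexp(names{k}, '(\d)(\w*)m', 'split');
    % Original approach: assumes the tail is exactly five characters ('s.jpg')
    viaIndexing(k) = str2double(spl{2}(1:end-5));
    % Alternative: let fileparts remove whatever extension is present ...
    [~, base] = fileparts(spl{2});
    % ... then drop any trailing letters, such as the 's'
    viaFileparts(k) = str2double(regexprep(base, '[a-zA-Z]+$', ''));
end
disp(viaIndexing)    % second value is NaN, because '12.468585s' is not a number
disp(viaFileparts)   % both values are 12.468585
```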
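3) This matches zero or more digits followed by an m, so a lone 'm' with no digits in front of it also matches. The square brackets are totally unnecessary in this situation: [m] is a character class containing only m, which matches exactly the same text as a plain m, and it is not clear why the author used them. The 'match' option tells regexp to return the parts of the string that match this regular expression.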
>> regexp('XXX123mXXX','\d*[m]','match')
ans =
'123m'
>> mtc = regexp('09m12.468585s.jpg','\d*[m]','match')
mtc =
'09m'
4) They then used indexing again to get all characters except the last one, and converted the result to double:
>> str2double(mtc{1}(1:end-1))
ans =
9
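The sketch below uses a hypothetical file name with no digits before the 'm' to illustrate points 3) and 4): [m] behaves exactly like m, \d* is satisfied by a bare 'm' (so the end-1 indexing leaves an empty string and str2double returns NaN), while \d+ requires at least one digit and therefore finds nothing.

```matlab
% A one-character class [m] matches the same text as a plain m
withBrackets = regexp('09m12.468585s.jpg', '\d*[m]', 'match');  % {'09m'}
plain        = regexp('09m12.468585s.jpg', '\d*m',   'match');  % {'09m'}

% Hypothetical name with no digits before the 'm'
name = 'm12.5s.jpg';

% \d* allows zero digits, so a bare 'm' is a valid match ...
mtc = regexp(name, '\d*m', 'match');        % {'m'}
minutes = str2double(mtc{1}(1:end-1));      % empty text, so str2double gives NaN

% ... whereas \d+ demands at least one digit, so nothing matches
strict = regexp(name, '\d+m', 'match');     % empty cell array
```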
The author did not write very robust regular expressions:
  • the first regexp uses \w, which matches letters and underscores as well as digits. Presumably the author only really wanted to match digits.
  • using different regular expressions for the split and match means that they could match different string content.
  • superfluous square brackets.
  • superfluous parentheses to group and form tokens, which are then never used.
  • using indexing to obtain parts of the regexp outputs defeats the whole point of using regular expressions. It would be better to get the regular expression to obtain exactly the parts of the string that are needed.
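To make the second point concrete, here is a sketch with a hypothetical file name that has a camera prefix in front of the timestamp. The split pattern and the match pattern from the question find different first matches, so the seconds come out fine while the minutes become NaN.

```matlab
% Hypothetical name with a camera prefix in front of the timestamp
name = 'cam2_09m12.5s.jpg';

% Split pattern from the question: its first match is '2_09m'
spl = regexp(name, '(\d)(\w*)m', 'split');   % {'cam', '12.5s.jpg'}
seconds = str2double(spl{2}(1:end-5));       % 12.5

% Match pattern from the question: its first match is the 'm' in 'cam'
mtc = regexp(name, '\d*[m]', 'match');       % {'m', '09m'}
minutes = str2double(mtc{1}(1:end-1));       % NaN
```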
Do not learn from these rather fragile and badly thought-out regular expressions.
Better Regular Expression
>> tok = regexp('09m12.468585s.jpg','^(\d+)m(\d+(\.\d+)?)\w*\.\w+$','tokens','once')
tok =
'09' '12.468585'
>> str2double(tok)
ans =
9 12.469
Explanation of this regexp:
^ % the start of the string
(\d+) % one or more digits (in a token)
m % followed by an 'm'
(\d+(\.\d+)?) % any integer or decimal number (in a token)
\w* % any trailing word characters (here the 's')
\.\w+ % the file extension
$ % end of the string
Instead of using ugly and unnecessary indexing to remove parts of the returned strings as your examples show, I simply put the parts of the matched string that we actually want (the numbers) into tokens, and got regexp to return these tokens. As you can see this makes the code much simpler.
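A variant of the same idea, sketched here as an alternative rather than as part of the answer above: named tokens with the 'names' option return a struct, so each number is addressed by a field name instead of a cell index. The inner group is written as non-capturing (?:...) so that only the two numbers are captured. The field names mins and secs, and the conversion to total seconds, are illustrative assumptions: they presume the file name encodes minutes and seconds of video time, which the question implies but does not state.

```matlab
name = '09m12.468585s.jpg';
% Named tokens (?<mins>...) and (?<secs>...); (?:...) groups without capturing
pat = '^(?<mins>\d+)m(?<secs>\d+(?:\.\d+)?)\w*\.\w+$';
t = regexp(name, pat, 'names');   % struct with fields mins and secs
mins = str2double(t.mins);        % 9
secs = str2double(t.secs);        % 12.468585
% Assumption: the name encodes minutes and seconds
totalSecs = 60*mins + secs;       % 552.468585
```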
"the examples suggested there are not enough to understand the codes."
You should be reading the explanations in the documentation, not just looking at the examples. The MATLAB documentation explains everything I explained above (I know that, because I have read the regexp docs many times).
Bonus Interactive Regular Expression Tool
If you want to experiment with regular expressions then you might like to download my interactive tool:
It lets you play with any regular expression, and shows all of regexp's outputs as you type. For example:
>> iregexp('09m12.468585s.jpg','^(\d+)m(\d+(\.\d+)?)\w*\.\w+$','tokens','once')
Cheeesepondue on 25 Aug 2017
Thank you very much for the answer and the advice, Stephen. I really appreciate it. There is still one thing I am not sure about: s{2} and m{1}. What does the number inside the curly braces do?
Stephen23 on 25 Aug 2017
Edited: Stephen23 on 25 Aug 2017
@RyanHwang: those integers are indices into the cell arrays that are output from regexp, they are not part of the regular expression. To learn about indexing, which is a very basic MATLAB concept, you should do the introductory tutorials:
and read about cell arrays:
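As a short illustration of the two kinds of indexing, here is a sketch using the file name from the question, where s plays the role of the split output in s{2}. Parentheses return a smaller cell array, curly braces return the contents of the cell, and ordinary parentheses then index into that char vector.

```matlab
% s is the split output, as in the question's s{2}
s = regexp('09m12.468585s.jpg', '(\d)(\w*)m', 'split');   % {'', '12.468585s.jpg'}
cellPart = s(2);              % () indexing: a 1x1 cell array that still wraps the text
textPart = s{2};              % {} indexing: the char vector stored in the second cell
disp(class(cellPart))         % cell
disp(class(textPart))         % char
secText = textPart(1:end-5);  % () indexing into the char vector: '12.468585'
```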


