Implementing ellipsis (also known as "dot dot dot" or "...") for line continuation in a regular expression statement

So this is driving me nuts. Matlab documentation says "dot dot dot" (the ellipsis) is treated like a space, but obviously it isn't, and it's driving me crazy. I'm sure it's something easy to figure out for an experienced Matlab programmer, which I clearly am not. I appreciate your help on this matter.
parts = regexp(filtered, '(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ...
(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)', 'names')
I've tried ending the single quotes on the first part and then wrapping the second part of the expression in its own quotes. I've tried placing the comma on the second part, and various combinations of the comma inside the quotes. Matlab says an ellipsis is treated like a space, so the above should technically work. Well, it doesn't. I need help. Thank you for your time on this piece of Matlab code.

1 Comment

"Matlab documentation says "dot dot dot" or ellipsis is treated like a space..."
For character vectors the MATLAB documentation actually states "Build a long character vector by concatenating shorter vectors together... The start and end quotation marks for a character vector must appear on the same line" and proceeds to give examples.
Your code does not follow what the MATLAB documentation specifies.
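To make "treated like a space" concrete, here is a minimal sketch (the variable names are illustrative only). Inside square brackets a space separates elements that get concatenated, and an ellipsis behaves exactly like that space. Inside the parentheses of a function call, a space does not join two quoted pieces, which is why an unbracketed version is a syntax error.

```matlab
% Inside [ ], a space separates elements that are concatenated side by side.
oneLine = ['abc' 'def'];

% An ellipsis continues the statement and acts like that space.
% Each quoted piece still opens and closes on its own line.
twoLines = ['abc' ...
            'def'];

% The same rule applies to numeric row vectors.
nums = [1 ...
        2 3];

% Without the ellipsis, a line break inside [ ] starts a new row instead,
% so this is a 2-by-3 character array, not 'abcdef'.
twoRows = ['abc'
           'def'];

isSame = isequal(oneLine, twoLines, 'abcdef');   % true

% By contrast, writing
%   regexp(txt, 'abc' ...
%               'def', 'names')
% leaves two quoted pieces side by side inside the argument list,
% which MATLAB rejects as invalid syntax.
```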


 Accepted Answer

Instead of trying to split a long char vector across multiple lines, why not write your regular expression as a series of string arrays that you concatenate across multiple lines with +? That way each section is self-contained, and you can't forget the ] at the end of a potentially long series of lines, because you don't need one.
filtered = "The quick brown fox jumped over the lazy dog"; % Random text
regexpPattern = "(?<TNT>d+\.(\d)+), " + ...
"(?<T>\w*), " + ...
"(?<refTm>\d+), " + ... % Looking for trademark symbols?
"(?<P>\w+), " + ...
"(?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), " + ...
"(?<rbncntrl>[^w]\w+), " + ...
"(?<cntrlStatus>\d+, " + ...
"(?<satsTrk>\d+), " + ...
"(?<lastRbUpdt>\d+)" % Leaving off the semicolon so you can check the assembly
regexpPattern = 
    "(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), (?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)"
parts = regexp(filtered, regexpPattern, 'names')
parts = 
  0×0 empty struct array with fields:
    TNT
    T
    refTm
    P
    tmSpyRefmns
    rbncntrl
    satsTrk
    lastRbUpdt
This has the added benefit that you can add a comment after the ellipsis to explain what each part of your regular expression means (like I did on the line with refTm). This will help someone else reading your code (or you reading your code six months from now) to understand its purpose.
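A small sketch of the same idea with a programmatic check of the assembled text, using two of the groups from the question. Note that double-quoted string literals require R2017a or later; the bracketed character-vector form alongside it also works in older releases. The comment labels are placeholders, since the thread does not say what each field means.

```matlab
% Requires R2017a or later: double-quoted string literals.
strPattern = "(?<refTm>\d+), " + ...   % refTm group
             "(?<P>\w+)";              % P group

% Same text as a character vector joined with [ ] (works in older releases).
charPattern = ['(?<refTm>\d+), ' ...   % refTm group
               '(?<P>\w+)'];           % P group

% Compare the assembled text instead of eyeballing the display.
sameText = strcmp(char(strPattern), charPattern);   % true
```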

3 Comments

Here I was thinking of putting it all together in one statement. This makes total sense and makes the code more readable and compartmentalized. I like it. Square brackets and single quotes for the expression inside the regexp call can work too, but this method is so much better. Thanks Steven Lord!
I accepted prematurely. I'm using Matlab R2010a. Double-quoted strings were introduced much later (R2017a), so the solution didn't work for me as written. I tried using single quotes and removing the "+" between the elements of the expression, and I finally ended up using single quotes with no "+" on each line. See the following code. For some reason it still doesn't work.
The returned output is an empty structure array. regexpPattern gets created correctly, but once it goes into the regexp call on the parts line, it doesn't work. Any ideas?
regexpPattern = ['(?<TNT>d+\.(\d)+), ' ...
'(?<T>\w*), ' ...
'(?<refTm>\d+), ' ...
'(?<P>\w+), ' ...
'(?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ' ...
'(?<rbncntrl>[^w]\w+), ' ...
'(?<cntrlStatus>\d+, ' ...
'(?<satsTrk>\d+), ' ...
'(?<lastRbUpdt>\d+)']
parts = regexp(filtered, regexpPattern, 'names')
[m,n] = size(parts)
format long
for j = 1:m
    for k = 1:n
        % parts is a struct array, so index it with ( ), not { }.
        % MJD must be a named group in the pattern; the pattern above defines none.
        MJD = str2double(parts(j,k).MJD);
        disp(MJD)
    end
end
Disregard the above. It was cross-contamination from my original "regexpPattern". The problem has been fixed: enclosing the expression parts in square brackets ([ ]) was the glue that eventually made it work. I like this structure of code; it's how I feel it should be written. Credit given. Thanks Steven Lord!
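For R2010a, one way to keep a comment next to every piece without double-quoted strings is to collect the pieces in a cell array and join them once. This is a minimal sketch using two of the groups from the question on a made-up input line ('42, abc' is hypothetical); `[pieces{:}]` is used because it concatenates the cell contents in any release.

```matlab
% One quoted piece per line; a comment can follow each piece.
% Inside { }, the ellipsis continues the row just as it does inside [ ].
pieces = { ...
    '(?<refTm>\d+), ' ...  % refTm group
    '(?<P>\w+)'       ...  % P group
    };
regexpPattern = [pieces{:}];   % join all pieces into one character vector

% Hypothetical input line, used only to exercise the pattern.
parts = regexp('42, abc', regexpPattern, 'names');
disp(parts.refTm)
disp(parts.P)
```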


More Answers (2)

When you use an ellipsis to continue a character vector, you have to end the vector on that line, start a new one on the next line, and concatenate the parts with square brackets. In this case, that might look like this (check that the pattern in regexp is accurate):
parts = regexp(filtered, ['(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+), ' ... not sure if the space belongs inside the pattern or not
'(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)'], 'names')
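On "check that the pattern is accurate": as posted, the pattern appears to use d+ (a literal letter d) where \d+ (digits) was probably meant, [^w] where [^\w] may have been meant, and the tmSpyRefns and cntrlStatus groups are never closed with ). The empty struct in the accepted answer lacks exactly those two field names, which is consistent with that. Below is a sketch of the bracketed call with only the first four groups, assuming \d+ was intended (the inner (\d)+ group is dropped for simplicity). The input line is hypothetical because the real data format is not shown.

```matlab
% Hypothetical input line; the real data format is not shown in the thread.
filtered = '51544.5, ABC, 120, XYZ';

% Assumed intent: \d+ (digits) instead of d+, every named group closed.
parts = regexp(filtered, ['(?<TNT>\d+\.\d+), (?<T>\w*), ' ... % TNT and T groups
                          '(?<refTm>\d+), (?<P>\w+)'], ...    % refTm and P groups
               'names');

disp(parts.TNT)
disp(parts.refTm)
```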
I've tried the following too. No dice; Matlab doesn't like it. The reason I want to use dot dot dot is to wrap the line so the code reads better.
parts = regexp(filtered, '(?<TNT>d+\.(\d)+), (?<T>\w*), (?<refTm>\d+), (?<P>\w+), (?<tmSpyRefns>[^\w]\w+,(?<tmSpyRefmns>[^\w]\w+),'...
'(?<rbncntrl>[^w]\w+), (?<cntrlStatus>\d+, (?<satsTrk>\d+), (?<lastRbUpdt>\d+)', 'names')

3 Comments

Note the square brackets [ ] in my answer are missing in yours.
"I've tried the following too. No dice. Matlab doesn't like it."
Because you built two separate character vectors, without joining them together like the MATLAB documentation shows:
As Voss stated, you are missing the square brackets.
Thanks Stephen. But I still don't know how to resolve it.

