MATLAB Answers

How to convert a numeric string into a numeric range?

Asked by Samuel Clary on 19 Sep 2017
Latest activity Edited by Stephen Cobeldick on 22 Jan 2019
I am working with a GUI which allows users to select custom groups of numbers. The inputs are always stored as strings; however, I need to convert the string to a range of numbers.
For example, if the user inputs...
[1:3,5,7:9]
Then I would like to have a stored value of...
[1, 2, 3, 5, 7, 8, 9]
Is there a way to do this without using eval()?
eval('[1:3,5,7:9]')
I know the use of eval() is frowned upon, but I cannot think of a more efficient method. My only other idea has been to use regexp() which takes much more time because of all the conditional aspects of the search.
Note: I know this can be done in the Command Window, but I am attempting to only use GUI functions or other similar functions that create a pop-up, such as:
inputdlg()

  1 Comment

"I cannot think of a more efficient method."
"I am attempting to only use GUI functions or other similar functions that create a pop-up, such as inputdlg()"
??? If you are using a GUI, then timing is surely not a critical factor? When the user takes 3 seconds to enter some values, why does 3 milliseconds make any difference?
A comment that applies to the question and all of the answers:
The comparisons being made are really comparing apples with oranges. In particular the conversions shown in the answers all handle a very specific string format, and will not parse anything else. This is perfectly reasonable, and is no doubt required to provide a secure GUI that can detect invalid inputs (which is also important for user feedback). In contrast eval simply evaluates anything, with no checking of syntax validity, security, etc. This means that the conversions shown actually offer more functionality than eval for this application, and so it is reasonable to expect them to take more time than eval.
To wrap eval with some input checking that can provide the same functionality as the conversions below would likely end up using significantly more time: it would require basically writing an entire string parser, to ensure that the input string is valid MATLAB code and will not throw an error or cause some problems, and on top of that check that the string fits the requirements of this application. Note that this could never be made secure, as any arbitrary function name could be defined and used in the string, as Jan Simon has pointed out.
Clearly the most efficient way of evaluating any input string, regardless of whether it is valid code (or might cause an error), insecure, malicious, or correct, is to use eval. But this is not the same functionality as a string parser that identifies valid strings and converts them into a vector, without risk of arbitrary code or bugs.
End-users will always find ways to break your system, whether intentional or accidental. I would suggest that in this case you should not be offering them an open door into the workings of your GUI!

4 Answers

Answer by Stephen Cobeldick on 20 Sep 2017
Edited by Stephen Cobeldick on 22 Jan 2019
 Accepted Answer

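function out = str2vec(str)
% Skip the leading '[' and read alternating numbers and separator characters.
vec = sscanf(str(2:end),'%f%c');
out = [];
idb = 1; % index of the first number of the current element
ide = 1; % index of the last number of the current element
while idb<=numel(vec)
    ide = idb+2*(vec(idb+1)==58); % 58==':', so a colon means a range
    out = [out,vec(idb):vec(ide)];
    idb = ide+2; % skip the separator that follows
end
end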
It allows any decimal or integer numbers (including optional +/- sign and E-notation), separated by either one colon or one comma. For each number leading space characters are ignored, whereas trailing spaces cause an incorrect output. Tested for 1e4 iterations with the input string '[28:33,5,7:9,2e1]':
Elapsed time is 1.661 seconds. % this function
Elapsed time is 0.742 seconds. % eval
Outputs:
28 29 30 31 32 33 5 7 8 9 20 % this function
28 29 30 31 32 33 5 7 8 9 20 % eval
It could easily be adapted to allow for the optional step of the colon command.
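As a rough illustration of that adaptation, here is a minimal sketch that also accepts start:step:stop. The name str2vecStep is made up for this example, and it assumes the same bracketed format with no trailing spaces, as described above. It is not validated against malformed input.

```matlab
function out = str2vecStep(str)
% Sketch only: like str2vec, but also accepts start:step:stop.
% Assumes a well-formed string such as '[1:2:9,20,3:5]'.
vec = sscanf(str(2:end), '%f%c'); % numbers interleaved with separator codes
out = [];
idb = 1;
while idb <= numel(vec)
    if vec(idb+1) == 58 && idb+3 <= numel(vec) && vec(idb+3) == 58
        % two colons in a row: start:step:stop
        out = [out, vec(idb):vec(idb+2):vec(idb+4)]; %#ok<AGROW>
        idb = idb + 6;
    elseif vec(idb+1) == 58
        % one colon: start:stop
        out = [out, vec(idb):vec(idb+2)]; %#ok<AGROW>
        idb = idb + 4;
    else
        % comma or closing bracket: single value
        out = [out, vec(idb)]; %#ok<AGROW>
        idb = idb + 2;
    end
end
end
```

For example, str2vecStep('[1:2:9,20,3:5]') should give the same vector as typing [1:2:9,20,3:5] at the prompt.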

  3 Comments

Hi Stephen, in my computer, your function is actually faster than eval (or just as fast)!
Str = '[33,37,-1:-4,1:30]'
tic
for j = 1:10000
    Range = str2vec(Str);
end
toc % Elapsed time is 0.138589 seconds.
tic
for j = 1:10000
    Range = eval(Str);
end
toc % Elapsed time is 0.144505 seconds.
It is better to time with timeit() than with tic/toc
This function runs faster on my computer than eval() as well. I have been testing it with several different inputs and it has been working very well. On larger ranges it can slow down, but I will throw in some warnings for the user if they try.
I am very interested in your function though. I have not had an opportunity to really look through it. (I have never used the vec() function.) Although, I am very interested in figuring out why it works so quickly.
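For anyone who wants to try the timeit() suggestion from the comments, a minimal sketch might look like the following. It assumes R2016b or later, so that the script can end with a local function, and it copies str2vec from the accepted answer only to keep the script self-contained. Absolute numbers will differ between machines and releases.

```matlab
% Sketch: compare str2vec and eval with timeit instead of tic/toc.
Str = '[33,37,-1:-4,1:30]';
tParse = timeit(@() str2vec(Str)); % typical time of one str2vec call
tEval  = timeit(@() eval(Str));    % typical time of one eval call
fprintf('str2vec: %.3g s per call\n', tParse);
fprintf('eval:    %.3g s per call\n', tEval);

function out = str2vec(str)
% Copy of the accepted answer's function (local function in a script).
vec = sscanf(str(2:end), '%f%c');
out = [];
idb = 1;
while idb <= numel(vec)
    ide = idb + 2*(vec(idb+1) == 58); % 58 == ':'
    out = [out, vec(idb):vec(ide)]; %#ok<AGROW>
    idb = ide + 2;
end
end
```

timeit calls each handle several times and handles warm-up itself, so no explicit loop is needed.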


Answer by OCDER on 19 Sep 2017
Edited by OCDER on 19 Sep 2017

In case the user inputs out-of-order range, duplicate numbers, or negative numbers, this solution works too and is ~3x faster. But you may need more error handling features - can't predict all the types of inputs.
Str = '1:3,-9:-4,7:9'; %User inputs a weird range. No brackets needed
StrParts = cellfun(@(x) regexp(x, ':', 'split'), regexp(Str, '\-*\d+:\-*\d+|\-*\d+', 'match'), 'UniformOutput', false);
NumParts = cellfun(@(x) str2double(x(1)):str2double(x(end)), StrParts, 'UniformOutput', false);
Range = unique(cat(2, NumParts{:}));
Range =
-9 -8 -7 -6 -5 -4 1 2 3 7 8 9

  3 Comments

My problem with splitting the string using regexp() is that it actually takes longer to get the result this way. To be fair, we are talking about ~1.5 msec for regexp(), but that is still roughly 26x slower than eval() at ~0.06 msec. My goal is to truly optimize for speed. I really do appreciate your suggestion, though! This is the first time I have understood cellfun(). The examples I had seen up until this point never really clicked.
It is unlikely that you would be able to improve on eval() speeds, as eval() runs at compiled speeds whereas anything you do at the MATLAB level is at interpreted speeds.
To get something more robust but at compiled speeds you would need to move into a mex routine.
Yeah, I can't find something faster than eval. I do have a faster solution that is only 2.6 times slower than eval based on 10000 iterations. See below. Otherwise, Walter's solution to use MEX or Jan's solution to use an eval with safety check would be faster.
Str = '[1:3,-9:-4,7:9]';
% Newer answer
tic
for k = 1:10000
    StrParts = regexp(Str, '\-*\d+\:*', 'match');
    j = 1;
    while j <= length(StrParts)
        if StrParts{j}(end) == ':'
            StrParts{j} = str2double(StrParts{j}(1:end-1)):str2double(StrParts{j+1});
            StrParts{j+1} = [];
            j = j + 2;
        else
            StrParts{j} = str2double(StrParts{j});
            j = j + 1;
        end
    end
    Range = unique(cat(2, StrParts{:}));
end
toc % Elapsed time is 0.926773 seconds.
% Previous answer
tic
for k = 1:10000
    StrParts = cellfun(@(x) regexp(x, ':', 'split'), regexp(Str, '\-*\d+:\-*\d+|\-*\d+', 'match'), 'UniformOutput', false);
    NumParts = cellfun(@(x) str2double(x(1)):str2double(x(end)), StrParts, 'UniformOutput', false);
    Range = unique(cat(2, NumParts{:}));
end
toc % Elapsed time is 3.294332 seconds.
% Eval answer
tic
for k = 1:10000
    Range = unique(eval(Str));
end
toc % Elapsed time is 0.352939 seconds.


Answer by Walter Roberson on 19 Sep 2017

rng = @(a,b) strjoin(cellstr(num2str((str2double(a):str2double(b)).')),',');
S = '[1:3,5,7:13]'
result = str2double( regexp( regexprep(S, {'\[', ']', '(\d+):(\d+)'}, {'', '', '${rng($1,$2)}'}), '\s*,\s*', 'split') );
Note: this code assumes that entries are separated by comma (which might have spaces around them) not by spaces alone.

  1 Comment

I ran this code when I tested Donald Lee's (commenter above) code. Unfortunately, this code is slower than eval() as well, and my goal is to optimize for speed. I attempted to do something similar on my own, but I did not use regexprep(). I will keep that in mind for my future code. Thank you very much for the input!


Answer by Jan on 19 Sep 2017

As long as eval processes numbers and the colon only, and does not create a variable dynamically, it is not evil. You could think of a security check:
Str = '[1:3,5,7:9]';
if ~all(ismember(Str, '[]0123456789+-.,:')) % brackets must be allowed too
    error('Cannot process string securely');
end
v = eval(Str);
But as soon as expressions like "1e6" are considered, the problems begin: A user could type "eeee" and define a corresponding function. Then you need some regular expressions to examine the string to recognize valid numbers in scientific notation. But if this is implemented, using the output of regexp will be easier than eval-ing.
See the other two answers for constructive suggestions.
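To make the regular-expression idea a bit more concrete, here is a minimal sketch of a format check that accepts scientific notation but rejects anything else. The function name isValidRangeString and the exact grammar (bracketed, comma-separated, no spaces, optional start:stop) are assumptions for this example, not something taken from the answers.

```matlab
function tf = isValidRangeString(str)
% Hypothetical helper: true only for strings like '[1:3,5,7:9,2e1]'.
% Assumed grammar: '[' element (',' element)* ']', no whitespace allowed.
num  = '[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?'; % one decimal number
elem = [num '(:' num ')?'];                      % number or start:stop
pat  = ['^\[' elem '(,' elem ')*\]$'];           % whole bracketed list
tf   = ~isempty(regexp(str, pat, 'once'));
end
```

A string such as '[1:3,eeee]' fails this check, so it never reaches eval or the parser. Once a string has passed, it can be handed to a parser like str2vec from the accepted answer.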

  5 Comments

In the end, it comes down to: can you trust the user? If the user can be trusted not to enter anything nefarious (such as rmdir('c:\', 's')) then using eval would be acceptable. If not, then you need to validate the string which likely involves regexp at which point there's no point in the eval anymore.
Note that if the program is useful, the potential user may change from trusted to untrusted as the program gets more widely used. By which time, it will have been forgotten that the parsing method did not validate its input and all hell may break loose. Therefore, coding defensively to start with would be safer.
@Guillaume: You can avoid the need to trust the user by setting Matlab to a defined state before calling eval:
Str = input('Input what ever you want:', 's')
system('format C:')
exit;
eval(Str); % ;-)
You can never trust a user, see Cody: Cheating is challenging for many users. Shadowing the functions that determine whether the result is correct was only the beginning. What about activating sendmail of the underlying Linux sandbox? Or perhaps you can copy a dump of the complete virtual machine including the license to your dropbox?
I could never understand why MathWorks offers such a powerful "eval it for me" service without certified user identification. It is an invitation for illegal activities.
I've worked in a lab in which the computers were completely locked down by the IT staff: Windows without Task Manager and without user access to any control panel, command window, or PowerShell. After an IT admin fixed a computer, he left the volume of the internal speakers at 100%, such that each beep blew my brain away. It would have taken 2 days until he had time for us again. Therefore I started a compiled Matlab application and used an eval'ed edit field to start:
system('rundll32.exe shell32.dll,Control_RunDLL mmsys.cpl,,0 &');
to access the sound control panel. Disabling a warning sound is not an illegal activity and of course I've informed the admins about what I was doing.
The sscanf library of old Matlab versions contained a bug that allowed an attacker to gain admin privileges. The Java engine shipped with Matlab is also susceptible and (except on Macs) it is not updated. See also https://www.mathworks.com/matlabcentral/answers/58642-security-implications-by-java.
This means:
  1. The Matlab prompt can be misused to access proprietary data, to send spam or start DDoS attacks.
  2. Compiled GUIs which eval user input are equivalent to a Matlab prompt.
"No, my software will not be used for evil things" is the typical misjudgment that allowed millions of IoT light bulbs and internet routers to be converted into an attacking bot network.
My advice considering security implications: Never use eval for user input.
I understand why the use of eval() is typically frowned upon now. I never realized just how powerful of a function this was. I will be sure to place restrictions on eval() if I use it in the future. Thanks for this advice and clarification.
