# Simple MATLAB lexer program

Aditya Abeysinghe on 24 Dec 2015
Answered: Jos on 24 Dec 2015
I created a simple lexer program in MATLAB: when the user types a string, the lexemes in that string are categorized. However, when I enter a string in the Command Window, only the first lexeme is displayed.
The code is as follows:
```matlab
function determineLexemes()
    prompt = 'Enter string : ';
    str = input(prompt);
    arr = char(str);
    j = 0;
    m = char(zeros(1,30));
    error = '';
    strTwo = '';
    display('Symbol Table');
    display('Lexeme \t\t Token');
    k = size(arr);
    for i = 1: k
        if(arr(i) == '+')
        end
        if(arr(i) == '-')
            display('- \t\t SUB_OP');
        end
        if(arr(i) == '*')
            display('* \t\t MULT_OP');
        end
        if(arr(i) == '/')
            display('/ \t\t DIV_OP');
        end
        if(arr(i) == '(')
            display('( \t\t LEFT_PAREN ');
        end
        if(arr(i) == ')')
            display(') \t\t RIGHT_PAREN ');
        end
        if(arr(i) == '=')
            display('= \t\t EQUAL_OP ');
        end
        if(ischar(arr(i)) || isnumeric(arr(i)))
            strTwo = strTwo + arr(i);
        end
        if(~ischar(arr(i)) && ~isnumeric(arr(i)))
            if(~isspace(arr(i)) && ~isempty(strTwo))
                m(j) = strTwo;
                if(isNumeric(strTwo(1) && regexp('123abc', '^[A-Za-z0-9]+$')))
                    disp(strcat('Error. Potential variable (', strTwo, ') whose name starts with digit found'));
                    strTwo = '';
                    j = j + 1;
                end
                if(~(isNumeric(strTwo(1) && regexp('123abc', '^[A-Za-z0-9]+$'))))
                    disp(strcat(m(j), '\t\t' + 'IDENTIFIER'));
                    strTwo = '';
                    j = j + 1;
                end
            end
        end
    end
end
```
The intended output, when '(2a + b)' is entered at the prompt, is as follows:
However, the current output, when '(2a + b)' is entered at the prompt, is as follows:
Any help on this problem is appreciated.

Jos on 24 Dec 2015
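The function needs some work, but here are a couple of things that should help:
- use fprintf instead of display (display prints escape sequences such as \t literally; fprintf expands them)
- replace k=size(arr) with k=length(arr): size returns [1 n] for a row vector, and 1:k uses only the first element of k, so the loop runs just once, which is why only the first lexeme is displayed
- arr(i) is always a char, so 'if(ischar(arr(i)) || isnumeric(arr(i)))' will always be true
- for the same reason, 'if(~ischar(arr(i)) && ~isnumeric(arr(i)))' is never true, so the code inside that block can never be reached
- the expression 'regexp('123abc', '^[A-Za-z0-9]+$')' does not depend on any variables, so it always returns the same match and is always true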
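Putting those suggestions together, a working lexer might look roughly like the sketch below. It is not the original poster's code and only one possible approach. It loops over length(str) characters with a while loop so that a multi-character lexeme such as 2a is consumed in one step, and it extracts lexemes by indexing (in the question, strTwo + arr(i) adds character codes numerically instead of concatenating). It also calls input with the 's' option, prints with fprintf, and returns the results so they can be checked. The token names ADD_OP, INT_LIT and ERROR are hypothetical; the other names come from the question.

```matlab
function tokens = determineLexemes(str)
%DETERMINELEXEMES Minimal lexer sketch: prints a symbol table for STR and
%returns an N-by-2 cell array of {lexeme, token} pairs.

if nargin < 1
    % The 's' option returns the typed text as-is, so no quotes are needed
    str = input('Enter string : ', 's');
end

% Single-character operators and their token names. ADD_OP is a
% hypothetical name: the '+' branch in the question is empty.
ops   = '+-*/()=';
names = {'ADD_OP', 'SUB_OP', 'MULT_OP', 'DIV_OP', ...
         'LEFT_PAREN', 'RIGHT_PAREN', 'EQUAL_OP'};

isDigit    = @(ch) ch >= '0' && ch <= '9';
isWordChar = @(ch) isletter(ch) || isDigit(ch) || ch == '_';

tokens = cell(0, 2);
fprintf('Symbol Table\n');
fprintf('Lexeme\t\tToken\n');        % fprintf expands \t, display does not

n = length(str);                     % number of characters, not size(str)
i = 1;
while i <= n
    c = str(i);
    if isspace(c)
        i = i + 1;                   % whitespace only separates lexemes
    elseif any(ops == c)
        tok = names{ops == c};       % exactly one operator matches
        fprintf('%s\t\t%s\n', c, tok);
        tokens(end+1, :) = {c, tok}; %#ok<AGROW>
        i = i + 1;
    elseif isWordChar(c)
        % Collect the whole run of letters, digits and underscores
        j = i;
        while j <= n && isWordChar(str(j))
            j = j + 1;
        end
        lexeme = str(i:j-1);         % substring by indexing, not '+'
        if all(lexeme >= '0' & lexeme <= '9')
            tok = 'INT_LIT';         % hypothetical token name for numbers
            fprintf('%s\t\t%s\n', lexeme, tok);
        elseif isDigit(lexeme(1))
            tok = 'ERROR';           % hypothetical marker for bad names
            fprintf('Error. Potential variable (%s) whose name starts with digit found\n', lexeme);
        else
            tok = 'IDENTIFIER';
            fprintf('%s\t\t%s\n', lexeme, tok);
        end
        tokens(end+1, :) = {lexeme, tok}; %#ok<AGROW>
        i = j;                       % continue after the whole lexeme
    else
        fprintf('Error. Unrecognized character (%s)\n', c);
        i = i + 1;
    end
end
end
```

Saved as determineLexemes.m and called with no arguments, it prompts for input; typing (2a + b) without quotes should list ( as LEFT_PAREN, print the error message for 2a, and then list +, b and ). Passing the string directly, as in determineLexemes('(2a + b)'), skips the prompt.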