Contents
Growing a Compiler, Chapter 3
format compact % Help MATLAB save screen space
3 GEM Versions 2 and 3
Treatment of whitespace and character classes.
3.1 Built-in scan
It is not too hard to make the second extension of iog, that automatically skips whitespace during interpretation. The trick is to skip blanks and newlines every place they can show up. There are just a few such places in each of the PARSE, SEARCH and BACK modes.
The new extension has its former capabilities and in addition no need for removing whitespace ahead of time. The new version can be seen to work by using the pretty version of nowhite as grammar input.
G = gem2(); % Instantiate the object (built-in de-whiting). GEM = G.run; fprintf('''%s''', GEM('x Y z', G.nowhite))
'xYz'
If you are interested in the new files iog2.h, iog2.c and gem2.m, you can download this File Exchange submission and look at them in the MATLAB editor.
3.2 Built-in character classes
The grammar executing machine gem0 treats all letters the same: letters name rules to be found in the IOG. In gem3 the eight character class letters are each processed separately. The call to G.run has a flag which can contain any or all of TdluaDLUA. If, for instance, a is in the flag string, gem3 will act as though class asciiCFG is appended to the second argument. The consequence is that the actual IOGs are much shorter and the time to execute is much faster. For example, consider an IOG that recognizes strings of digits:
G = gem3(); % Instantiate the object (built-in char classes) GEM = G.run; fprintf('res=''%s''\n', GEM('182746', 'g=dg;g=d;', 'd'));
res=''
3.3 Size comparisons
Size comparisons for the versions so far
!wc -l iog0.c iog0.h
136 iog0.c 76 iog0.h 212 total
!wc -l iog1.c iog1.h
160 iog1.c 108 iog1.h 268 total
!wc -l iog2.c iog2.h
163 iog2.c 118 iog2.h 281 total
!wc -l iog3.c iog3.h
230 iog3.c 204 iog3.h 434 total
If you are interested in the new files iog3.h, iog3.c and gem3.m, you can download this File Exchange submission and look at them in the MATLAB editor.
3.4 Multi-character input and output symbols
The limitation of character input and output symbols to single characters was mentioned as a restriction that would be removed by bootstrapping. This is accomplished by an IOG nomulti. A special case is needed in nomulti for the characters ' and "; neither can be part of a multi-character symbol they delimit. Otherwise, multicharacter symbols are broken into a sequence of single-character symbols. nomulti applied to two trivial grammars
fprintf('123=%s\n', GEM('', GEM('r="123";', G.nomulti, 'A'))); fprintf('null=%s\n', GEM('123', GEM('r=''123'';', G.nomulti, 'A')));
123=123 null=
3.5 A new pretty
Having allowed whitespace in IOGs, pretty printing also has to accept IOGs with whitespace but is still obligated to print just enough whitespace so as to make the IOG readable. Here the new pretty is used to display the IOG for dealing with multiple character input and output symbols. The trick is in the recursively defined names i and o.
nm = GEM(G.nomulti, G.pretty, 'ULA'); % using the new pretty fprintf('--nomulti IOG--\n%s', nm);
--nomulti IOG-- g = P; p = I I I; p = O O O; p = ''' i; p = '"' o; p = A; i = '''; i = "'" A "'" i; o = '"'; o = """ A """ o; I = ''' "'"; O = '"' """; P = p P; P =;
3.6 left-associative expressions
Right-associative expressions were defined as part of demonstrating the inverter. Using a technique similar to that found using in functional languages, left-associate (normal) expressions can be described as well.
postfix = GEM(G.postfix, G.pretty, 'ULA'); fprintf('postfix IOG:\n%s', postfix);
postfix IOG: g = e; e = t R; r = '+' t "+"; r = '-' t "-"; t = f S; s = '*' f "*"; s = '/' f "/"; f = L; f = D; f = '(' e ')'; R = r R; R =; S = s S; S =;
The following code makes postfix from an expression
fprintf('postfix expr = %s', GEM('1+c*d+3', G.postfix, 'LD'));
postfix expr = 1cd*+3+
Just to complete the example, the following code makes prefix from an expression
fprintf('prefix expr = %s', GEM('1+c*d+3', G.prefix, 'LD'));
prefix expr = +1+*cd3
3.7 X86 float assembly language
The Intel X86 floating point unit (FPS) is a stack, thus is accessed with stack instructions such as fld (push onto FPS), fadd (add top elements on the stack). The postfix in the previous example only needs to be reformatted to give an assembly-code like equivalent.
x86expr = GEM(G.x86, G.pretty, 'ULA'); fprintf('x86 IOG:\n%s', x86expr);
x86 IOG: g = e; e = t R; r = '+' t "fadd "; r = '-' t "fsub "; t = f S; s = '*' f "fmul "; s = '/' f "fdiv "; f = "fld " L " "; f = "fld =" D " "; f = '(' e ')'; R = r R; R =; S = s S; S =;
As you can see by looking at the IOG above, it was convenient to use the multi-character form of the IOG. nomulti has to be used on x86 to make it executable by gem3(). Applied to the example above, the result is:
asm1 = GEM(G.x86, G.nomulti, 'A'); res1 = GEM('1+c*d+3', asm1, 'LD'); fprintf('x86 assembly code\n%s', res1);
x86 assembly code fld =1 fld c fld d fmul fadd fld =3 fadd
The story continues with the next chapter.