DIKU - Walking through the parser
I’ve just finished writing a recursive descent parser, and I’m really quite happy with it.
You can view the implementation here.
The parser’s task is to take sequential tokens from the lexer, and determine if they fit the language grammar. If they do, it builds a data structure known as an abstract syntax tree (AST), which encodes the tokens, their relationships, and their precedence.
The parser’s run()
method iterates through the lines of an open filehandle, passed as an argument.
Reflecting on the grammar, we can see that there are only three valid symbols for the first token in any line - PRINT
, WHILE
or VARIABLE
- which we can test
for:
The function for each case
can then make assumptions about the tokens to follow.
Each logical block in the program file is returned from these functions as an AST.
You’ll note I have implemented the Node
class as abstract, and then extended the class several times to create, for example, a BinaryOperation
class.
This stops a Node
object from being instantiated and removes a lot of duplicate code from Node.php
, whilst making the parser functions and resulting tree structure a little easier to follow.
Taking the assignment()
function as an example:
Following the assignment
production, the above function attempts to build a valid AST.
It uses a helper function, consume()
, to check the symbol of the current token, throw a (hopefully) helpful error if it does not match the assumed string, or otherwise update the current token with the next token from the lexer.
In this example, if the VARIABLE
is not ASSIGN
-ed to a QUOTED_STRING
, it calls the function exp_1
. In a similar way, some of the grammar productions allow recursion, hence the name of this style of parser.
Adding these functions became easier as I became familiar with how they fit in with the grammar, with the exception of the statement()
function, which can span several lines of input - thus leading to an edge case where the lexer would return a null
if it encountered the end of a line.
All said and done, I’m quite happy. On to the next chapter!