parser: add an LL(1) parser for syntax analysis #3
Labels
enhancement
New feature or request
parser
A parser scope related feature
test
A unit or integration test
LL(1)
The LL(1) parser will receive a grammar that is expected to already be LL(1) compliant. The purpose of this top-down analysis is to build an AST (Abstract Syntax Tree) by reading the input left to right and constructing a leftmost derivation.
Structures
First-Follow
We have to establish which grammar rule the parser should choose when it sees a nonterminal $\Delta$ on top of its stack and a symbol $\alpha$ on its input stream (Wiki).
First set
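The First set of a string of symbols $w$, written $Fi(w)$, is the set of terminals that can begin a string derived from $w$, plus $\varepsilon$ if $w$ can derive the empty string. For the nonterminal $\Delta$ on top of the stack, $Fi(\Delta)$ lists the input symbols that an expansion of $\Delta$ can start with.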
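Given a grammar with the rules $A_1 \rightarrow w_1$, …, $A_n \rightarrow w_n$, we can compute $Fi(w_i)$ and $Fi(A_i)$ for every rule as follows:
1. Initialize every $Fi(A_i)$ with the empty set.
2. Add $Fi(w_i)$ to $Fi(A_i)$ for every rule $A_i \rightarrow w_i$, where $Fi$ is defined as:
   - $Fi(a w') = \{a\}$ for every terminal $a$
   - $Fi(A w') = Fi(A)$ for every nonterminal $A$ with $\varepsilon \notin Fi(A)$
   - $Fi(A w') = (Fi(A) \setminus \{\varepsilon\}) \cup Fi(w')$ for every nonterminal $A$ with $\varepsilon \in Fi(A)$
   - $Fi(\varepsilon) = \{\varepsilon\}$
3. Repeat step 2 until all $Fi$ sets stay the same.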
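The fixed-point computation of the First sets can be sketched in Python. This is a minimal sketch, not the project's implementation: the grammar representation (a dict mapping each nonterminal to a list of right-hand sides, with `[]` standing for $\varepsilon$), the names `first_of_string` and `compute_first`, and the small expression grammar are all assumptions made for illustration.

```python
EPS = "ε"  # marker for the empty string (an assumed convention)


def first_of_string(symbols, first, nonterminals):
    """Return Fi(w) for a sequence of symbols, using the current Fi sets."""
    result = set()
    for sym in symbols:
        if sym not in nonterminals:      # terminal: Fi(a w') = {a}
            result.add(sym)
            return result
        result |= first[sym] - {EPS}     # nonterminal: take Fi(A) without ε
        if EPS not in first[sym]:        # A is not nullable, stop here
            return result
    result.add(EPS)                      # every symbol was nullable (or w is empty)
    return result


def compute_first(grammar):
    """Iterate until no Fi(A) grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


# Hypothetical example grammar:
#   E  -> T E'
#   E' -> + T E' | ε
#   T  -> id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"]],
}

FIRST = compute_first(GRAMMAR)
```

For this grammar, `FIRST["E'"]` contains both `+` and `ε`, which is exactly the case where the First sets alone cannot decide which rule to apply.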
Follow set
Unfortunately, the First-sets are not sufficient to compute the parsing table. This is because a right-hand side $w$ of a rule might ultimately be rewritten to the empty string. So the parser should also use the rule $A \rightarrow w$ if $\varepsilon$ is in $Fi(w)$ and it sees on the input stream a symbol that could follow $A$. Therefore, we also need the Follow-set of $A$, written as $Fo(A)$ here, which is defined as the set of terminals $a$ such that there is a string of symbols $\alpha A a \beta$ that can be derived from the start symbol. We use `$` as a special terminal indicating the end of the input stream, and $S$ as the start symbol.
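Computing the Follow-sets for the nonterminals in a grammar can be done as follows:
1. Initialize $Fo(S)$ with $\{\$\}$ and every other $Fo(A_i)$ with the empty set.
2. If there is a rule of the form $A_j \rightarrow w A_i w'$, then:
   - if the terminal $a$ is in $Fi(w')$, add $a$ to $Fo(A_i)$
   - if $\varepsilon$ is in $Fi(w')$, add $Fo(A_j)$ to $Fo(A_i)$
   - if $w'$ has length 0, add $Fo(A_j)$ to $Fo(A_i)$
3. Repeat step 2 until all $Fo$ sets stay the same.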
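The Follow-set computation can be sketched in the same way. This is an illustrative sketch under stated assumptions: `compute_follow` and `first_of_string` are hypothetical names, the grammar is the same made-up expression grammar, and the First sets are written out by hand so that the block stands on its own.

```python
EPS = "ε"  # marker for the empty string (an assumed convention)
END = "$"  # special terminal for end of input


def first_of_string(symbols, first, nonterminals):
    """Return Fi(w) for a sequence of symbols, given the Fi sets of nonterminals."""
    result = set()
    for sym in symbols:
        if sym not in nonterminals:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # also covers the case where w has length 0
    return result


def compute_follow(grammar, first, start):
    """Iterate over all rules A_j -> w A_i w' until no Fo set grows."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:       # only nonterminals have Fo sets
                        continue
                    tail = first_of_string(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}           # terminals that can follow sym
                    if EPS in tail:              # w' is nullable or empty
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Hypothetical example grammar: E -> T E', E' -> + T E' | ε, T -> id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"]],
}
# First sets of that grammar, written out by hand
FIRST = {"E": {"id"}, "E'": {"+", EPS}, "T": {"id"}}

FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

Because `first_of_string` returns $\{\varepsilon\}$ for an empty tail, the "length 0" case and the "$\varepsilon \in Fi(w')$" case are handled by the same branch.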
Parsing Table
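Now we can define exactly which rules will appear where in the parsing table. If $T[A, a]$ denotes the entry in the table for nonterminal $A$ and terminal $a$, then $T[A, a]$ contains the rule $A \rightarrow w$ if and only if $a$ is in $Fi(w)$, or $\varepsilon$ is in $Fi(w)$ and $a$ is in $Fo(A)$.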
Equivalently: $T[A, a]$ contains the rule $A \rightarrow w$ for each $a \in Fi(w) \cdot Fo(A)$.
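Filling the table from the First and Follow sets might look like the sketch below. The function names (`build_table`, `first_of_string`), the dict-of-lists table layout, and the example grammar with its hand-written First and Follow sets are assumptions for illustration only.

```python
EPS = "ε"  # marker for the empty string (an assumed convention)


def first_of_string(symbols, first, nonterminals):
    """Return Fi(w) for a sequence of symbols, given the Fi sets of nonterminals."""
    result = set()
    for sym in symbols:
        if sym not in nonterminals:
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (nonterminal, terminal) to the list of right-hand sides placed in that cell."""
    table = {}
    for head, alternatives in grammar.items():
        for rhs in alternatives:
            fi = first_of_string(rhs, first, grammar)
            lookahead = fi - {EPS}          # a in Fi(w)
            if EPS in fi:                   # ε in Fi(w): also every a in Fo(A)
                lookahead |= follow[head]
            for terminal in lookahead:
                table.setdefault((head, terminal), []).append(rhs)
    return table


# Hypothetical example grammar: E -> T E', E' -> + T E' | ε, T -> id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"]],
}
# First and Follow sets of that grammar, written out by hand
FIRST = {"E": {"id"}, "E'": {"+", EPS}, "T": {"id"}}
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}

TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
```

Each cell holds a list rather than a single rule so that a cell receiving two rules stays visible instead of being silently overwritten.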
If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. It is in precisely this case that the grammar is called an LL(1) grammar.
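Once every cell holds at most one rule, a stack-based driver can pick the rule from the table with a single symbol of lookahead. The sketch below only illustrates that loop: the `parse` function, the hard-coded table for a made-up expression grammar, and returning the list of applied rules (instead of building the AST) are all assumptions.

```python
END = "$"  # special terminal for end of input

NONTERMINALS = {"E", "E'", "T"}

# Hand-written LL(1) table for the hypothetical grammar
#   E -> T E',  E' -> + T E' | ε,  T -> id
# Each cell holds exactly one right-hand side; [] stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", END): [],
    ("T", "id"): ["id"],
}


def parse(tokens, table, start, nonterminals):
    """Return the rules applied, in order, for a leftmost derivation of tokens."""
    stack = [END, start]              # top of the stack is the end of the list
    stream = list(tokens) + [END]
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        look = stream[pos]
        if top in nonterminals:
            rhs = table.get((top, look))
            if rhs is None:           # empty cell: no rule applies
                raise SyntaxError(f"unexpected {look!r} while expanding {top}")
            derivation.append((top, rhs))
            stack.extend(reversed(rhs))   # push so the leftmost symbol is on top
        elif top == look:
            pos += 1                  # terminal matches the input, consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return derivation


RESULT = parse(["id", "+", "id"], TABLE, "E", NONTERMINALS)
```

The returned sequence of rules is the leftmost derivation; an AST builder could consume it, or the loop could attach child nodes as rules are expanded.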