parser: add a LL(1) parser for syntax analysis #3

Daniel-Boll · 2022-10-29T18:11:51Z

LL(1)

The LL(1) parser will receive a LL(1) grammar expecting it to be already LL(1) compliant. The purpose of this top-down analysis is to create a AST (Abstract Syntax Tree) deriving left-to-right.

Structures

First-Follow

We have to establish what grammar rule the parser should choose if it sees a nonterminal $\Delta$ on the top of its stack and a symbol $\alpha$ on its input stream. Wiki

First set

The first set represents the possible characters found from that nonterminal $\Delta$, written as $Fi(w)$.

Given a grammar with the rules $A_1 \rightarrow w_1$, …, $A_n\rightarrow w_n$, we can compute the $Fi(w_i)$ and $Fi(A_i)$ for every rule as follows:

initialize every $Fi(A_i)$ with the empty set { $\varnothing$ }
add $Fi(w_i)$ to $Fi(A_i)$ for every rule $A_i \rightarrow w_i$, where Fi is defined as follows:
- $Fi(aw')$ = { $a$ } for every terminal $a$
- $Fi(Aw')$ = $Fi(A)$ for every nonterminal $A$ with ε not in $Fi(A)$
- $Fi(Aw' )$ = ( $Fi(A)$ \ { $\varepsilon$ } ) $\cup \ \ Fi(w' )$ for every nonterminal $A$ with $\varepsilon$ in $Fi(A)$
- $Fi(\varepsilon)$ = { $\varepsilon$ }
add $Fi(w_i)$ to $Fi(A_i)$ for every rule $A_i \rightarrow w_i$
do steps 2 and 3 until all $Fi$ sets stay the same.

Follow set

Unfortunately, the First-sets are not sufficient to compute the parsing table. This is because a right-hand side $w$ of a rule might ultimately be rewritten to the empty string. So the parser should also use the rule $A \rightarrow w$ if $\varepsilon$ is in $Fi(w)$ and it sees on the input stream a symbol that could follow $A$. Therefore, we also need the Follow-set of $A$, written as $Fo(A)$ here, which is defined as the set of terminals a such that there is a string of symbols $\alpha Aa\beta$ that can be derived from the start symbol. We use $ as a special terminal indicating end of input stream, and S as start symbol.

Computing the Follow-sets for the nonterminals in a grammar can be done as follows:

initialize $Fo(S)$ with { $ } and every other $Fo(A_i)$ with the empty set
if there is a rule of the form $A_j → wAiw'$ , then
- if the terminal $a$ is in $Fi(w')$, then add $a$ to $Fo(Ai)$
- if $\varepsilon$ is in $Fi(w')$, then add $Fo(A_j)$ to $Fo(A_i)$
- if $w'$ has length 0, then add $Fo(A_j)$ to $Fo(A_i)$
repeat step 2 until all Fo sets stay the same.

Parsing Table

It is a bidimensional matrix in the format $M[\Delta, \alpha]$ where $\Delta$ is a nonterminal and $\alpha$ is a terminal.

Now we can define exactly which rules will appear where in the parsing table. If $T[A, a]$ denotes the entry in the table for nonterminal $A$ and terminal $a$, then

$T[A,a]$ contains the rule $A → w$ if and only if
$a$ is in $Fi(w)$ or
$\varepsilon$ is in $Fi(w)$ and $a$ is in $Fo(A)$.

Equivalently: $T[A, a]$ contains the rule $A \rightarrow w$ for each $a \in Fi(w)\cdot Fo(A)$.

If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. It is in precisely this case that the grammar is called an LL(1) grammar.

Daniel-Boll self-assigned this Oct 29, 2022

Daniel-Boll added enhancement New feature or request test A unit or integration test parser A parser scope related feature labels Oct 29, 2022

Daniel-Boll added this to the v0.2.0 milestone Oct 29, 2022

Daniel-Boll removed this from the v0.2.0 milestone Nov 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parser: add a LL(1) parser for syntax analysis #3

parser: add a LL(1) parser for syntax analysis #3

Daniel-Boll commented Oct 29, 2022

parser: add a LL(1) parser for syntax analysis #3

parser: add a LL(1) parser for syntax analysis #3

Comments

Daniel-Boll commented Oct 29, 2022

LL(1)

Structures

First-Follow

First set

Follow set

Parsing Table