Determining the next action in a \*LR(k)-style bottom-up parse is nontrivial. The parse state inherently includes some form of *item stack*: the parser shifts terminal items from the input stream and pushes them onto the stack, and each execution of a reduction rule pops some terminal and/or nonterminal items from the stack and pushes the resulting nonterminal item in their place.
Unfortunately, it is not sufficient to examine the item stack, compare it against the right-hand sides of the reduce rules, execute whichever reduce rule matches, and shift otherwise: the correct action can depend on the unread lookahead and on stack context deeper than any single right-hand side.
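To illustrate, here is a minimal sketch (the grammar is a hypothetical toy, not one from this project) where matching the stack against rule right-hand sides cannot pick the correct reduction: after shifting `a` then `b`, the stack top matches the right-hand side of both `A -> b` and `B -> b`, and only the lookahead distinguishes them.

```python
# Hypothetical toy grammar: S -> a A c | a B d ; A -> b ; B -> b.
# Matching stack suffixes against right-hand sides is ambiguous here.
rules = {
    "S": [("a", "A", "c"), ("a", "B", "d")],
    "A": [("b",)],
    "B": [("b",)],
}

def matching_reductions(stack):
    """Return every (lhs, rhs) whose RHS matches a suffix of the stack."""
    matches = []
    for lhs, rhss in rules.items():
        for rhs in rhss:
            if tuple(stack[-len(rhs):]) == rhs:
                matches.append((lhs, rhs))
    return matches

# Input "abc" requires reducing b -> A; input "abd" requires b -> B,
# yet the stack is identical ["a", "b"] in both cases, so both rules match:
print(matching_reductions(["a", "b"]))
```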
The traditional method of next-action determination in \*LR(k)-style parsers involves assigning a unique state number to each state in the pushdown automaton's state transition graph. These state numbers are pushed onto the automaton's stack along with the language items themselves. These state numbers let the automaton determine the relevant stack context (for shift/reduce/accept/error decision-making) without needing to examine *any* of the other stack state.
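The classical table-driven loop can be sketched as follows. The ACTION/GOTO tables below were hand-built for the toy grammar `S -> a S | b` purely for illustration (a generator such as `bison` would emit them); as a common simplification, the sketch pushes only state numbers, since the language items are recoverable from them.

```python
# Hand-built SLR tables for the toy grammar S -> a S | b.
ACTION = {
    (0, "a"): ("shift", 2), (0, "b"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "a"): ("shift", 2), (2, "b"): ("shift", 3),
    (3, "$"): ("reduce", "S", 1),   # S -> b
    (4, "$"): ("reduce", "S", 2),   # S -> a S
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(tokens):
    stack = [0]                      # stack of state numbers
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False             # error: no action for this state/lookahead
        if act[0] == "shift":
            stack.append(act[1]); pos += 1
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            del stack[-rhs_len:]     # pop the right-hand side
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return True

print(parse("aab"))  # True
print(parse("aba"))  # False
```

Note that every decision keys off only the topmost state number and the lookahead; the state number serves as a summary of the entire stack below it.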
Unfortunately, for languages of any significant size, the tables the state numbers index get unwieldy, especially when trying to represent them in FPGAs.
Rather than compressing these opaque tables into LUTs (to avoid taking up too much block RAM) and writing a straightforward LR parser state machine around them, I investigated the possibility of representing the language's structure directly in the FPGA's logic.
Our first attempt involved making a parser without any state numbers. The LR state numbers are a serialization of the stack state. In an FPGA we can examine multiple stack items at once (if they're in flops / not in RAM). To examine stack state, we compare the top `k` items on the stack against a list of "stack state identifier" templates, and whichever one matches is deemed the current automaton state (which then determines the next action).
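A software sketch of the template-matching idea (in the FPGA version each comparison is combinational logic over the flop-resident stack top; the templates here are hypothetical, not derived from a real grammar):

```python
# Identify the automaton state by matching the top k stack items against
# a list of templates, where a template entry may be a literal item or a
# wildcard that matches anything.
WILDCARD = "*"

def identify_state(stack, templates, k):
    """Return the index of the first template matching the top k items."""
    top = stack[-k:]
    for idx, tmpl in enumerate(templates):
        pattern = tmpl[-len(top):]
        if all(p in (WILDCARD, item) for p, item in zip(pattern, top)):
            return idx
    return None

# Hypothetical templates for illustration:
templates = [
    ("a", "b"),          # state entered after shifting 'b' over 'a'
    (WILDCARD, "a"),     # state entered whenever 'a' is on top
]
print(identify_state(["a", "a"], templates, 2))  # matches template 1
```

In hardware, all template comparisons run in parallel and a priority encoder picks the match, so this costs logic rather than block RAM.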
For many context-free grammars, this works. I wrote nMigen HDL code to implement this, along with code that ingests LR tables generated by `bison`, calculates the "stack state identifier" for each LR automaton state, and uses the result to parametrize the HDL. I also wrote code that randomly generates sentences in the language, feeds them into a simulated parser, and compares the parser's output tree with the known-good derivation of each sentence.
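The random-sentence harness idea can be sketched as follows (the toy grammar and the `simulate_parser` hook are hypothetical stand-ins, not this project's actual code): expand the start symbol by choosing productions at random, recording the derivation tree as we go, then check that the parser reproduces that tree.

```python
import random

# Hypothetical toy grammar: S -> a S | b.
GRAMMAR = {"S": [["a", "S"], ["b"]]}

def generate(symbol="S", rng=random):
    """Randomly derive a sentence; returns (tokens, derivation tree)."""
    if symbol not in GRAMMAR:            # terminal symbol
        return [symbol], symbol
    rhs = rng.choice(GRAMMAR[symbol])
    tokens, children = [], []
    for sym in rhs:
        toks, child = generate(sym, rng)
        tokens += toks
        children.append(child)
    return tokens, (symbol, children)

tokens, tree = generate("S", random.Random(0))
# In the real harness the tokens are fed to the simulated parser and the
# resulting tree is compared against the recorded derivation, e.g.:
#   assert simulate_parser(tokens) == tree
```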
Unfortunately, for more complex LR(1)-parseable deterministic CFGs, the approach of comparing the top `k` language items against a list of candidate stack states fails. A minimal case demonstrating this is in `language_subhierarchy/cfg_hard.y`. In fact, we don't even need the power of a CFG to break this approach: `language_subhierarchy/regular_language_hard.y` is an example of a *regular* language whose LR pushdown automaton has states that cannot be identified by literal matching against a list of candidates.
We need a little bit more power in our comparison. Fortunately, the additional power we need is minimal: recognizing the stack contents with a regular language (i.e., with a finite automaton) works.
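The idea can be sketched as follows (the transition table below is a hypothetical illustration): identify the current automaton state by running a finite automaton over the stack contents from bottom to top. For an LR automaton, the set of viable stacks reaching each state is a regular language, and the shift/goto graph itself is a recognizer for it, so the state reached after reading the whole stack is exactly the LR state number that the traditional method would have memoized on the stack.

```python
def run_stack_automaton(stack, delta, start):
    """Run a DFA over the stack items, bottom to top; return final state."""
    state = start
    for item in stack:
        state = delta.get((state, item))
        if state is None:
            return None              # stack contents not viable
    return state

# Hypothetical transition function over items 'a', 'b', 'S':
delta = {(0, "a"): 2, (2, "a"): 2, (2, "b"): 3, (0, "b"): 3,
         (0, "S"): 1, (2, "S"): 4}
print(run_stack_automaton(["a", "a", "b"], delta, 0))  # → 3
```

Unlike the literal-template comparison, this can distinguish states that depend on unbounded context below the stack top, which is precisely what the hard regular-language example requires.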