## Summary

This code is an implementation of an LR(1) parser written in nMigen, a Python-based hardware description language. This means that the synthesized output can be used as an IP core/block in an FPGA design.

## Parametrization Interface

At compile/synthesis time, you will provide the software with parametrization information that includes:

* The complete language specification

* The bitwidth of the input data

* The bitwidth and size of the serialized parse-tree output memory, or alternatively, the bitwidth of the serialized parse-tree output stream

* The maximum stack depth
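
As a concrete illustration, the parametrization record might look like the following Python sketch. The field names here are purely illustrative and are not the project's actual interface:

```python
from dataclasses import dataclass
from math import ceil, log2

@dataclass
class ParserParams:
    """Hypothetical compile-time parametrization record (illustrative only)."""
    rules: list            # language specification: (lhs, rhs-tuple) pairs
    input_width: int       # bitwidth of the input data port
    tree_word_width: int   # bitwidth of serialized parse-tree words
    tree_depth: int        # number of words in the parse-tree memory
    max_stack_depth: int   # maximum stack depth

    @property
    def tree_addr_width(self) -> int:
        # Address width needed to index the parse-tree memory.
        return max(1, ceil(log2(self.tree_depth)))

params = ParserParams(
    rules=[("S", ("S", "a")), ("S", ("a",))],
    input_width=8,
    tree_word_width=16,
    tree_depth=1024,
    max_stack_depth=64,
)
print(params.tree_addr_width)  # 10
```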

Once the parser is fully parametrized, the software uses nMigen to synthesize the generated gateware. nMigen emits an intermediate language at the register-transfer level (RTLIL), which is readable by yosys (an advanced open-source FPGA synthesis tool). From this RTLIL, yosys can either synthesize a complete design for any of the FPGAs it supports (Xilinx 7-Series, Lattice ECP5, Lattice iCE40), or it can be commanded to generate synthesizable Verilog, which can be used in any FPGA synthesis tool that supports Verilog.

Simulation can be done:

* with the generated Verilog, using a Verilog simulator of your choice, such as Verilator.

* at the RTLIL level, with yosys's cxxrtl, a backend that converts the RTLIL to C++ classes (permitting easy integration with C/C++ models of other components of the design, or with a C/C++ testbench), which can then be compiled and executed to simulate the design.

* at the nMigen level. nMigen has an integrated simulator, which is less performant than either Verilator or cxxrtl but allows for easy integration with the nMigen code.


## HDL/Gateware Interface


The synthesized design exposes two data interfaces, the input and the output, plus several sideband signals. The input data port has a "ready" output and a "valid" input, which allow for flow control when interfacing with upstream components; the output data port has a "valid" output and a "ready" input, which allow for flow control when interfacing with downstream components. Currently, the output data port is connected to a RAM for ease of simulation/testing, but this is arbitrary and will be removed.
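
The ready/valid flow control described above can be modeled in a few lines of Python. This is an illustrative software model, not the gateware itself:

```python
def handshake_transfer(stream, ready_pattern):
    """Model word-at-a-time ready/valid flow control: a word is
    transferred only on cycles where the source asserts valid (it has
    data to offer) AND the sink asserts ready."""
    received = []
    it = iter(stream)
    word = next(it, None)
    for ready in ready_pattern:
        valid = word is not None
        if valid and ready:          # handshake: both sides agree
            received.append(word)
            word = next(it, None)
        # when ready is low, the source holds `word` stable (backpressure)
    return received

out = handshake_transfer([1, 2, 3], [True, False, True, True, True])
print(out)  # [1, 2, 3]
```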

The sideband signals do not carry the bulk of the data, but are critical to proper operation of the parser and correct use of its results. The sideband signals are:

* A reset input (not yet implemented), which permits resetting the parser's entire state in preparation for receiving a new input stream

* A "parse complete" output, which indicates that the parser has made its final decision on the validity of the current input stream

* A "parse success" output (valid only when the "parse complete" output is 1), which indicates whether the parser has determined that the input stream was a member of the language it was instantiated with -- or whether there was a parse error

* An "internal fault" output, which indicates that there has been an internal fault within the parser. This can arise from an inconsistency in, or imminent overflow of, the parser's data structures

* A "last index written to serialization memory" output, on which the parser reports the total length of the serialized parse tree. This allows gateware (or, in this case, a simulation testbench) that reads the serialized parse tree to know how many words to read from the parse tree RAM.


## Design

I devised a perhaps-novel method (I have not seen this method described anywhere in my literature searches) of implementing shift-reduce parsers in parallel hardware.

The insight is as follows: at heart, the LR parser's state numbers are not an essential feature of the algorithm; they merely encode the state of the top n items on the stack. They are only necessary when the platform the algorithm runs on can examine *only* the top item on the stack; in that case, the algorithm must serialize the relevant stack state into a number. Parallel hardware, such as an FPGA, can examine all relevant stack items at once, and therefore does not need to explicitly serialize the stack state into a number (and index large, unwieldy tables with that number).
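
A software sketch of the idea, using a deliberately tiny grammar: instead of consulting a state number, the parser compares the top of the stack against every rule right-hand side. Note that this greedy matcher omits the lookahead a real LR(1) parser consults, so it only handles simple grammars like this one:

```python
RULES = [("S", ("(", "S", ")")), ("S", ("x",))]

def parse(tokens):
    """Naive shift-reduce without state numbers: match the top of the
    stack against every rule RHS. In the FPGA design all of these
    comparisons happen simultaneously; here we simply loop over them."""
    stack = []
    tokens = list(tokens)
    while True:
        for lhs, rhs in RULES:
            if tuple(stack[-len(rhs):]) == rhs:  # top-of-stack match
                del stack[-len(rhs):]            # pop the handle...
                stack.append(lhs)                # ...and push the LHS
                break
        else:
            if not tokens:
                break
            stack.append(tokens.pop(0))          # shift the next token
    return stack == ["S"]

print(parse("((x))"))  # True
print(parse("((x)"))   # False
```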

With this result in hand, I proceeded to examine methods for the parallel language-rule matching that such an implementation requires, and designed two: one is simply a large, simultaneous combinational evaluation of the language rules; the other pipelines the language-rule lookups, which is perhaps more conducive to higher clock rates. I decided to implement the first, since it is far simpler. However, I do have plans to use the parallel fixed-string/regular-language matching results to implement language-rule matching in a pipelined fashion, which should provide more efficient utilization of the FPGA as well as significantly faster clock speeds.

I have implemented this, along with a top-level state machine -- which ingests data, sequences all the operations, feeds the parallel-access stack, stalls the ingestion process while there is a matching rule, and applies the appropriate language reduction for the rule that matched.

There is gateware that generates a parse tree as the parser executes. Furthermore, the way the parse tree generator is integrated into the parser allows the tree to be generated without reading back any already-written parse tree data -- the parse tree is serialized on-the-fly and can be streamed out of the relevant port on the FPGA design without any memory required.

However, code that uses this parser *must* be careful to not use the parse tree for any operations until the parser reports a correct parse. This is inevitable and is simply the result of a parser that is capable of streaming a parse tree without needing a final processing step after the parse has been confirmed to succeed.
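
To illustrate the on-the-fly serialization, here is a hypothetical software model: each shift emits a leaf record and each reduction emits an interior-node record with its arity, producing a postorder stream that a testbench can later rebuild into a tree. The record format is invented for illustration and is not the gateware's actual encoding:

```python
RULES = [("S", ("(", "S", ")")), ("S", ("x",))]

def serialize(tokens, rules):
    """Stream a postorder parse-tree encoding as the parse proceeds:
    one record per shift (leaf) and per reduction (node + arity).
    The stream is append-only -- nothing is ever read back, mirroring
    how the gateware can emit the tree without a memory."""
    stack, stream = [], []
    tokens = list(tokens)
    while True:
        for lhs, rhs in rules:
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                stream.append(("node", lhs, len(rhs)))
                break
        else:
            if not tokens:
                return stream
            tok = tokens.pop(0)
            stack.append(tok)
            stream.append(("leaf", tok))

def deserialize(stream):
    """Rebuild the tree from the postorder stream (the testbench's job)."""
    build = []
    for rec in stream:
        if rec[0] == "leaf":
            build.append(rec[1])
        else:
            _, lhs, arity = rec
            children = tuple(build[-arity:])
            del build[-arity:]
            build.append((lhs, children))
    return build[-1]

tree = deserialize(serialize("(x)", RULES))
print(tree)
```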



## Installation instructions and dependencies

A recent version of Python 3 is required, as are nMigen, NetworkX, and bitarray, which can be installed with:

```
pip install --user --upgrade git+https://github.com/nmigen/nmigen.git
pip install --user --upgrade networkx
pip install --user --upgrade bitarray
```

Currently, there is no parametrization interface, because the LR table generation code has not yet been written; the existing automated test suite can be run with:

```
$ python automatic_tester.py
```

## Testing/verification

There are several different types of invariants that apply to the parser at runtime. In approximate order of difficulty of verification, they are:

* Correctness of parse results. This includes:
    * Always accepting strings in the language, and always generating a correct parse tree
    * Always rejecting strings not in the language

* Input/output/sideband interface properties: This includes:
    * Accept input only when the "valid" line is asserted
    * Advance output only after the "ready" line is asserted
    * Only assert the output's "valid" line when the output is indeed valid
    * Resetting all the state when the "reset" line is asserted

* Internal invariants/behavior:
    * Memory write access pattern to serialized parse tree RAM
    * State machines only transition when they're supposed to

Since the latter two types of invariants are commonly proved with standard formal verification methods for FPGA/ASIC designs, I have decided to first direct my efforts toward the first category since it is the one that will require the most novel work from me. As such, I have written code that ingests a language description and then generates random strings from that language of arbitrary length, along with the corresponding derivation tree for each string. Furthermore, I have interfaced this generation code with the simulated LR parser, written code to deserialize the parser-generated parse tree, and I compare the original derivation tree with the generated parse tree. I have run this code for multiple hours on a fast machine (Xeon E-2288G) with no observed bugs. 
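
The random-string generation can be sketched as follows. This is a simplified stand-in for the project's actual generator, and the depth limiting here is deliberately crude:

```python
import random

def generate(grammar, symbol, depth=0, max_depth=8):
    """Randomly derive a string and its derivation tree from a CFG.
    `grammar` maps nonterminals to lists of right-hand sides; any
    symbol not in `grammar` is a terminal. Near max_depth the shortest
    production is forced, to keep the derivation finite."""
    if symbol not in grammar:
        return [symbol], symbol                 # terminal leaf
    options = grammar[symbol]
    if depth >= max_depth:
        options = [min(options, key=len)]       # force termination
    rhs = random.choice(options)
    tokens, children = [], []
    for sym in rhs:
        t, c = generate(grammar, sym, depth + 1, max_depth)
        tokens += t
        children.append(c)
    return tokens, (symbol, tuple(children))

GRAMMAR = {"S": [("(", "S", ")"), ("x",)]}
random.seed(0)
tokens, tree = generate(GRAMMAR, "S")
print("".join(tokens))
```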

A testing scheme that mutates the generated strings, in order to produce strings that the parser will reject, remains to be written. This will necessitate integration with a pre-existing parser, in order to eliminate exemplars that have accidentally become valid strings through mutation; but most of the necessary work (random generation of language items, interfacing with the simulated parser gateware, etc.) is already done and tested.


Once the LR(1) table generation code is working, I will be able to test *random languages* (and not just random strings from a specific language). This is because a language (such as BNF) that can describe any context-free grammar is *itself* context-free. This means that "random generation of a CFG" is in fact a special case of "random generation of an item within a given CFG" (with that CFG being BNF or something equivalent). Naturally, not every random CFG will be LR(1)-parseable or even deterministic, but the judgment of my LR(1) generator code can be checked against that of a known-good LR(1) generator. Once a random LR(1)-parseable language has been generated and the parser parametrized with it, the above procedures (testing valid and invalid strings) can be applied in the same way.

The remaining invariants shall be verified with standard tools like SymbiYosys which use BMC/k-induction-type techniques to establish certain properties about the design (or generate a trace showing a violation of the invariant).


## Work left to do

* Implement the LR(1) table generation code, which will ingest a grammar description and generate the tables necessary for parametrizing the parser. Due to the novel design, which eschews the "state numbers" common in most implementations of LR-type parsers, standard LR table generation software cannot be used unmodified.

* Design and implement tests for the remaining formal properties/invariants

* Implement and test "gearboxes", which will allow for conversion between signals of different bit-widths to and from the parser gateware. This is important because the width required to describe a symbol in the grammar might not match the width of the output parse-tree memory or of the input data stream
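
A software model of such a gearbox, assuming LSB-first packing (the actual gateware's bit ordering may differ):

```python
def gearbox(words, in_width, out_width):
    """Repack a stream of in_width-bit words into out_width-bit words,
    LSB-first. Trailing bits that do not fill a whole output word are
    simply held back, as a hardware gearbox would hold them."""
    acc, bits, out = 0, 0, []
    for w in words:
        acc |= (w & ((1 << in_width) - 1)) << bits    # append new bits above
        bits += in_width
        while bits >= out_width:
            out.append(acc & ((1 << out_width) - 1))  # emit the low bits
            acc >>= out_width
            bits -= out_width
    return out

# three 8-bit words -> two 12-bit words
print([hex(w) for w in gearbox([0x34, 0x12, 0xAB], 8, 12)])  # ['0x234', '0xab1']
```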

* Implement a parametrization interface for both the language specification (list of terminals, all nonterminals, the nonterminals to include in the parse tree, and the language rules) and ancillary parameters like word width and the widths of the input/output buses, which can be interfaced with a tool such as Hammer.

* Clean up the code around the top-level state machine (especially as relates to state transitions and input/output flow control)

* Prototype a pipelined rule matcher.