LALR parser for "obj"
consider reviving the effort to get "obj" to parse with LALR. the messy grammar for arrays with "elemd", "elemr", etc. still stems from project, as does the explicit handling of whitespace -- note that TOK() is only used in KW() and that no instances of KW() remain under "obj".
alternatively, consider fully reverting the grammar to its clearer PEG form. i would probably keep the explicit whitespace, though.
what stopped me before was the difficulty of resolving some things without precedence rules; specifically, line endings in string literals. is "\r\n" a "crlf" or a "cr" followed by an "lf"? LALR cannot decide unless you encode that anything following a "cr" doesn't start with "lf". string literals are currently defined differently. the best way to do it, AFAICS, would be to match (in string literals) all subsequent line endings in one nonterminal and to encode there that a plain "cr" is never followed by "lf".
FWIW, the motivation for LALR parsing of "obj" was the prospect of parsing an object stream incrementally, as chunks come in from the decompressor (or an arbitrary filter chain).
NB: the reason why we must distinguish "crlf" from "cr" "lf" at all is of course that in a string literal, the former means "\n" and the latter means "\n\n".
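a minimal sketch of the "one nonterminal for the whole run of line endings" idea, written as a small Python scanner rather than as grammar rules. the function name `scan_line_endings` and its interface are made up for illustration and are not part of the existing code; the only thing it is meant to show is that once a run is consumed as a unit, a "cr" immediately followed by "lf" is always taken as one "crlf", so a plain "cr" is by construction never followed by "lf".

```python
from typing import Tuple

CR = 0x0D
LF = 0x0A


def scan_line_endings(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Consume a maximal run of line endings starting at `pos`.

    Returns (newlines, new_pos), where `newlines` is the number of "\n"
    characters the run stands for. Roughly the rule shape this mirrors
    (hypothetical, not the current grammar):

        run      : lf rest | cr after_cr
        after_cr : lf rest      (the pair was a "crlf")
                 | rest         (plain "cr"; what follows is not "lf")
        rest     : <empty> | run
    """
    newlines = 0
    n = len(data)
    while pos < n:
        b = data[pos]
        if b == CR:
            pos += 1
            # a "cr" directly followed by "lf" is one "crlf", never "cr" "lf"
            if pos < n and data[pos] == LF:
                pos += 1
            newlines += 1
        elif b == LF:
            pos += 1
            newlines += 1
        else:
            break  # end of the run
    return newlines, pos
```

for example, `scan_line_endings(b"\r\n")` reports one newline, while `scan_line_endings(b"\n\r")` reports two.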