LALR or GLR compile abort()s with "lalr.c:335: h_lalr_compile: Assertion `!h_stringmap_empty(fs)' failed." when an h_sequence contains an h_choice with an empty set of alternatives
I may or may not dig further into this problem, but here's what I have at the moment. While doing some basic generative testing of Hammer I found another crash in the LALR backend, and I finally have a reproducible, if bletcherously non-minimal, test case:
build/debug/generative-test$ LD_LIBRARY_PATH=. python
Python 2.7.6 (default, Nov 12 2018, 20:00:40)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> from test import HCh, HChoice, HSequence
>>> grammar = HSequence(elements=[HCh(ch='\x8e'),
... HChoice(elements=[HCh(ch='\x8d'), HCh(ch='\x1a'),
... HChoice(elements=[]), HCh(ch='}'), HSequence(elements=[HCh(ch='\x04'),
... HCh(ch='\x9b'), HCh(ch='\x00'), HChoice(elements=[]), HCh(ch='#')]),
... HCh(ch='\x00')]), HCh(ch='6'), HCh(ch='\r'), HCh(ch='\xfc'),
... HChoice(elements=[HSequence(elements=[HChoice(elements=[])]),
... HCh(ch='\xb1'), HCh(ch='\xd6'), HCh(ch='\xec')]),
... HSequence(elements=[]),
... HSequence(elements=[HChoice(elements=[HCh(ch='\xd8'),
... HSequence(elements=[HCh(ch='N'), HCh(ch='m')]),
... HSequence(elements=[HSequence(elements=[HCh(ch='\x00'),
... HChoice(elements=[]), HSequence(elements=[])])]), HCh(ch='\xf8'),
... HSequence(elements=[]), HChoice(elements=[]), HCh(ch='\x00')]),
... HChoice(elements=[HCh(ch='\xca'), HSequence(elements=[])]),
... HChoice(elements=[HCh(ch='P'), HSequence(elements=[])]),
... HChoice(elements=[HCh(ch='\xa3'),
... HSequence(elements=[HChoice(elements=[]), HCh(ch='k'),
... HCh(ch='\x00')]), HSequence(elements=[])]),
... HChoice(elements=[HChoice(elements=[]),
... HSequence(elements=[HChoice(elements=[HSequence(elements=[HCh(ch='&')])]),
... HSequence(elements=[])])]), HSequence(elements=[HCh(ch='\xa2')]),
... HCh(ch='<'), HCh(ch=')'), HChoice(elements=[]), HCh(ch='\x00'),
... HSequence(elements=[HSequence(elements=[]), HCh(ch='V')]),
... HCh(ch='\xbc')])])
...
>>> grammar
HSequence(elements=[HCh(ch='\x8e'), HChoice(elements=[HCh(ch='\x8d'), HCh(ch='\x1a'), HChoice(elements=[]), HCh(ch='}'), HSequence(elements=[HCh(ch='\x04'), HCh(ch='\x9b'), HCh(ch='\x00'), HChoice(elements=[]), HCh(ch='#')]), HCh(ch='\x00')]), HCh(ch='6'), HCh(ch='\r'), HCh(ch='\xfc'), HChoice(elements=[HSequence(elements=[HChoice(elements=[])]), HCh(ch='\xb1'), HCh(ch='\xd6'), HCh(ch='\xec')]), HSequence(elements=[]), HSequence(elements=[HChoice(elements=[HCh(ch='\xd8'), HSequence(elements=[HCh(ch='N'), HCh(ch='m')]), HSequence(elements=[HSequence(elements=[HCh(ch='\x00'), HChoice(elements=[]), HSequence(elements=[])])]), HCh(ch='\xf8'), HSequence(elements=[]), HChoice(elements=[]), HCh(ch='\x00')]), HChoice(elements=[HCh(ch='\xca'), HSequence(elements=[])]), HChoice(elements=[HCh(ch='P'), HSequence(elements=[])]), HChoice(elements=[HCh(ch='\xa3'), HSequence(elements=[HChoice(elements=[]), HCh(ch='k'), HCh(ch='\x00')]), HSequence(elements=[])]), HChoice(elements=[HChoice(elements=[]), HSequence(elements=[HChoice(elements=[HSequence(elements=[HCh(ch='&')])]), HSequence(elements=[])])]), HSequence(elements=[HCh(ch='\xa2')]), HCh(ch='<'), HCh(ch=')'), HChoice(elements=[]), HCh(ch='\x00'), HSequence(elements=[HSequence(elements=[]), HCh(ch='V')]), HCh(ch='\xbc')])])
>>> import hammer
>>> grammar.to_swig().compile(hammer._PB_LALR)
python: build/debug/src/backends/lalr.c:335: h_lalr_compile: Assertion `!h_stringmap_empty(fs)' failed.
Aborted (core dumped)
build/debug/generative-test$
The part of test.py that I believe is relevant for reproducing this (checked in on a branch in commit 8580e019) is the following:
import attr  # attrs is a dependency of Hypothesis
import hammer as h

# Plain-data mirrors of Hammer combinators, so that the generated grammars
# have a readable repr; to_swig() builds the actual Hammer parser.
@attr.s
class HCh(object):
    ch = attr.ib()

    def to_swig(self):
        return h.ch(self.ch)

@attr.s
class HChoice(object):
    elements = attr.ib()

    def to_swig(self):
        return h.choice(*(element.to_swig() for element in self.elements))

@attr.s
class HSequence(object):
    elements = attr.ib()

    def to_swig(self):
        return h.sequence(*(element.to_swig() for element in self.elements))
This was a hack to print out the random grammars generated by Hypothesis in a readable form so that I could reproduce the bug outside of Hypothesis. Because abort()ing the Python process prevents Hypothesis's shrinking (test-case minimization) from working, the grammar in question is probably a lot more complicated than is needed to reproduce the bug.
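If Hypothesis can't be coaxed into shrinking the case, the hand-minimization could at least be automated. Below is a minimal greedy sketch, not Hypothesis's own shrinking algorithm; `shrink` and `_simplifications` are hypothetical names. It assumes `still_fails` is a predicate that returns True while the bug still reproduces (for this abort, that predicate would itself have to run the compile in a separate process). It also assumes composite nodes expose an `elements` list and can be rebuilt as `type(node)(elements=...)`, which holds for the attrs classes HChoice and HSequence above, while leaves such as HCh have no `elements` attribute.

```python
def shrink(node, still_fails):
    """Greedily simplify a grammar tree while still_fails(tree) stays True.

    Every accepted step strictly reduces the number of nodes, so the loop
    terminates. The result is locally minimal: no single simplification
    of it still fails.
    """
    changed = True
    while changed:
        changed = False
        for candidate in _simplifications(node):
            if still_fails(candidate):
                node = candidate  # keep the smaller failing tree and restart
                changed = True
                break
    return node


def _simplifications(node):
    """Yield trees that are one step smaller than `node`."""
    elements = getattr(node, "elements", None)
    if elements is None:
        return  # a leaf (e.g. HCh) cannot be simplified further
    # 1. Replace the whole node by one of its children.
    for child in elements:
        yield child
    # 2. Drop one child.
    for i in range(len(elements)):
        yield type(node)(elements=elements[:i] + elements[i + 1:])
    # 3. Simplify one child in place, recursively.
    for i, child in enumerate(elements):
        for smaller in _simplifications(child):
            yield type(node)(elements=elements[:i] + [smaller] + elements[i + 1:])
```

Because each call to `still_fails` would compile a grammar in a fresh process, this is much slower than Hypothesis's built-in shrinker, but it only has to run once, on the grammar that is already known to fail.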
I may see whether I can use fork() or something similar to let Hypothesis shrink the test case for me; if anyone has suggestions for better ways to do that, I'm interested. Alternatively, I could try minimizing it by hand and/or converting it to C to see whether the same problem reproduces through the C API. Or I might just exclude the LALR backend from the generative testing for now, treating it as known-broken and low-priority, and extend the test coverage in other directions, since this issue will remain here for anyone who later decides to fix it.
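One way the fork() idea might look: a minimal sketch, assuming a POSIX system, in which the hypothetical helper `died_of_sigabrt` runs a callable in a forked child and reports whether the child was killed by SIGABRT, which is the signal a failed C assert() ends up raising via abort(). The parent process, where Hypothesis lives, survives either way.

```python
import os
import signal
import sys


def died_of_sigabrt(fn):
    """Run fn() in a forked child process and report whether it abort()ed.

    Returns True only if the child was killed by SIGABRT. A Python exception
    in the child makes it exit with status 1, which is NOT reported as an
    abort. POSIX-only, because it relies on os.fork().
    """
    # Flush buffered output so the child does not print it a second time.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        # Child: run the callable, then leave via os._exit() so that no
        # parent-side cleanup (atexit hooks, test-runner state) runs here.
        try:
            fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and inspect how it terminated.
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGABRT
```

Inside a Hypothesis test, asserting `not died_of_sigabrt(lambda: grammar.to_swig().compile(hammer._PB_LALR))` should turn the abort into an ordinary assertion failure in the parent process, which is the kind of failure Hypothesis can shrink. The cost is one fork() per generated example.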