- move main routine(s) into separate source file.
- move filter implementation(s) into separate source file.
- investigate memory use on big documents (millions of objects).
- replace disparate parsing routines (applied to different pieces of input)
  with one big HParser that uses h_seek() to move around. this will enable
  packrat to cache, for instance, the xref tables instead of us parsing them
  once to resolve references and again as part of the linear parse.
- parse stream objects without reference to their /Length entry by simply
  trying all possible ways and consistency-checking them against the xref
  table in the end, via h_attr_bool().
- include position information, at least for objects, in the (JSON) output.
- format warnings/errors (stderr) as JSON, too.
- make custom token types for all appropriate parts of the parse result.
- parse content streams.
- implement random-access parser (walking objects from /Root).
- check linear and random-access parses for consistency.
- handle garbage before %PDF- and after %%EOF.
- handle garbage at other points in the input?
- add ASCII filter types.
- add LZW filter.
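
a possible starting point for the "garbage before %PDF- and after %%EOF" item, as a minimal sketch in plain C: it scans the input buffer for the first "%PDF-" and the last "%%EOF" and reports the byte range between them. the names pdf_trim_garbage, find_first and find_last are hypothetical and not part of the existing code or of hammer. it assumes the whole file is already in memory, and it does not try to deal with markers that happen to appear inside stream data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: first occurrence of needle in buf[0..len), or NULL. */
static const char *find_first(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || len < n)
        return NULL;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    return NULL;
}

/* Hypothetical helper: last occurrence of needle in buf[0..len), or NULL. */
static const char *find_last(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || len < n)
        return NULL;
    /* i runs from len - n down to 0 */
    for (size_t i = len - n + 1; i-- > 0; )
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    return NULL;
}

/*
 * Hypothetical entry point (not an existing function in this project).
 * On success returns 0 and sets *start to the offset of "%PDF-" and *end
 * to the offset one past the last "%%EOF". Returns -1 if either marker
 * is missing.
 */
int pdf_trim_garbage(const char *buf, size_t len, size_t *start, size_t *end)
{
    const char *hdr = find_first(buf, len, "%PDF-");
    if (hdr == NULL)
        return -1;
    size_t s = (size_t)(hdr - buf);

    /* only look for the end marker after the header */
    const char *eof = find_last(buf + s, len - s, "%%EOF");
    if (eof == NULL)
        return -1;

    *start = s;
    *end = (size_t)(eof - buf) + strlen("%%EOF");
    return 0;
}
```

the resulting range could then be handed to the existing parser in place of the full buffer; whether to warn about the skipped bytes is left open.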