diff --git a/TODO b/TODO
new file mode 100644
index 0000000000000000000000000000000000000000..eae25e8a2040dec85d1ccbea20d0597990b0970a
--- /dev/null
+++ b/TODO
@@ -0,0 +1,29 @@
+ - move main routine(s) into separate source file.
+ - move filter implementation(s) into separate source file.
+
+ - investigate memory use on big documents (millions of objects).
+
+ - replace disparate parsing routines (applied to different pieces of input)
+   with one big HParser that uses h_seek() to move around. this will enable
+   packrat to cache, for instance, the xref tables instead of us parsing them
+   once to resolve references and again as part of the linear parse.
+
+ - parse stream objects without reference to their /Length entry by simply
+   trying all possible ways and consistency-checking them against the xref
+   table in the end, via h_attr_bool().
+
+ - include position information, at least for objects, in the (JSON) output.
+ - format warnings/errors (stderr) as JSON, too.
+
+ - make custom token types for all appropriate parts of the parse result.
+
+ - parse content streams.
+
+ - implement random-access parser (walking objects from /Root).
+ - check linear and random-access parses for consistency.
+
+ - handle garbage before %PDF- and after %%EOF
+ - handle garbage at other points in the input?
+
+ - add ASCII filter types.
+ - add LZW filter.
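For the "add ASCII filter types" item, here is a minimal sketch in C of the two ASCII-armoring decoders defined by the PDF standard, ASCIIHexDecode and ASCII85Decode, assuming that is what the item refers to. The names asciihex_decode, ascii85_decode, hexval and pdf_isspace, the caller-allocated output buffer and the (size_t)-1 error return are all hypothetical. The existing filter code may well expect a different interface, for instance a Hammer parser over the stream data rather than a buffer-to-buffer function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PDF white-space characters: NUL, HT, LF, FF, CR, SP. */
static int pdf_isspace(int c)
{
    return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

/* Value of a hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c |= 0x20;                          /* fold 'A'..'F' to 'a'..'f' */
    return (c >= 'a' && c <= 'f') ? c - 'a' + 10 : -1;
}

/*
 * Hypothetical interface, not the project's API. ASCIIHexDecode: hex digit
 * pairs, white space ignored, '>' ends the data, an odd final digit counts
 * as if followed by 0. 'out' needs len / 2 + 1 bytes. Returns bytes
 * written, or (size_t)-1 on an invalid character.
 */
static size_t asciihex_decode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    int hi = -1;                        /* pending high nibble, -1 if none */
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len && in[i] != '>'; i++) {
        if (pdf_isspace(in[i]))
            continue;
        int v = hexval(in[i]);
        if (v < 0)
            return (size_t)-1;
        if (hi < 0) {
            hi = v;
        } else {
            out[n++] = (uint8_t)(hi << 4 | v);
            hi = -1;
        }
    }
    if (hi >= 0)
        out[n++] = (uint8_t)(hi << 4);
    return n;
}

/*
 * ASCII85Decode: five characters '!'..'u' are four bytes in base 85, 'z'
 * between groups is four zero bytes, "~>" ends the data, white space is
 * ignored. A final group of k (2..4) characters is padded with 'u' and
 * yields k - 1 bytes. 'out' needs 4 * len bytes. Returns bytes written,
 * or (size_t)-1 on malformed input.
 */
static size_t ascii85_decode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    uint64_t acc = 0;                   /* value of the current group */
    int k = 0;                          /* digits seen in the current group */
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        int c = in[i];
        if (c == '~') {                 /* EOD must be exactly "~>" */
            if (i + 1 >= len || in[i + 1] != '>')
                return (size_t)-1;
            break;
        }
        if (pdf_isspace(c))
            continue;
        if (c == 'z' && k == 0) {
            memset(out + n, 0, 4);
            n += 4;
            continue;
        }
        if (c < '!' || c > 'u')         /* also rejects 'z' inside a group */
            return (size_t)-1;
        acc = acc * 85 + (uint64_t)(c - '!');
        if (++k == 5) {
            if (acc > 0xFFFFFFFFu)
                return (size_t)-1;      /* group exceeds 32 bits */
            for (int j = 3; j >= 0; j--)
                out[n++] = (uint8_t)(acc >> (8 * j));
            acc = 0;
            k = 0;
        }
    }
    if (k == 1) return (size_t)-1;      /* a lone trailing digit is invalid */
    if (k > 1) {
        int bytes = k - 1;
        for (; k < 5; k++)
            acc = acc * 85 + 84;        /* pad with 'u' */
        if (acc > 0xFFFFFFFFu) return (size_t)-1;
        for (int j = 3; j > 3 - bytes; j--)
            out[n++] = (uint8_t)(acc >> (8 * j));
    }
    return n;
}
```

Both decoders stop at the first end-of-data marker they see and also accept input that lacks one; whether the real filter should accept, warn about, or reject such input is a policy choice this sketch leaves open.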