diff --git a/TODO b/TODO
new file mode 100644
index 0000000000000000000000000000000000000000..eae25e8a2040dec85d1ccbea20d0597990b0970a
--- /dev/null
+++ b/TODO
@@ -0,0 +1,29 @@
+ - move main routine(s) into separate source file.
+ - move filter implementation(s) into separate source file.
+
+ - investigate memory use on big documents (millions of objects).
+
+ - replace disparate parsing routines (applied to different pieces of input)
+   with one big HParser that uses h_seek() to move around. this will enable
+   packrat to cache, for instance, the xref tables instead of us parsing them
+   once to resolve references and again as part of the linear parse.
+
+ - parse stream objects without reference to their /Length entry by simply
+   trying all possible parses and consistency-checking them against the xref
+   table at the end, via h_attr_bool().
+
+ - include position information, at least for objects, in the (JSON) output.
+ - format warnings/errors (stderr) as JSON, too.
+
+ - make custom token types for all appropriate parts of the parse result.
+
+ - parse content streams.
+
+ - implement random-access parser (walking objects from /Root).
+ - check linear and random-access parses for consistency.
+
+ - handle garbage before %PDF- and after %%EOF.
+ - handle garbage at other points in the input?
+
+ - add ASCII filter types (ASCIIHexDecode, ASCII85Decode).
+ - add LZW filter.