Call content processing (text extraction) in the proper place
Currently parse_catalog(), which serves as the top-level entry point to content parsing, is called from parse_xrefs(). But the latter is only supposed to be a helper that main() needs to call before the proper parser can run.
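For illustration, the intended call structure might look roughly like the sketch below. This is not the actual code: the struct types, the function signatures, and the name process_document() (standing in for the relevant part of main()) are all assumptions made up for the example. It also assumes that the caller, rather than parse_xrefs(), is the proper place for the parse_catalog() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types and signatures will differ. */
struct xref_table { size_t nentries; };
struct doc_state  { bool xrefs_ready; bool catalog_parsed; };

/* Helper only: locates objects, does NOT start content processing. */
bool parse_xrefs(struct doc_state *st, struct xref_table *out)
{
    assert(st != NULL && out != NULL);
    out->nentries = 0;          /* placeholder for the real xref walk */
    st->xrefs_ready = true;
    return true;
}

/* Top-level entry point to content parsing (text extraction). */
bool parse_catalog(struct doc_state *st, const struct xref_table *xrefs)
{
    assert(st != NULL && xrefs != NULL);
    if (!st->xrefs_ready)
        return false;           /* refuse to run before the xrefs are known */
    st->catalog_parsed = true;  /* placeholder for the real catalog walk */
    return true;
}

/* Stands in for the relevant part of main(): the caller sequences the
 * two steps, so parse_xrefs() never calls parse_catalog() itself. */
int process_document(struct doc_state *st)
{
    struct xref_table xrefs;

    if (!parse_xrefs(st, &xrefs))
        return 1;
    if (!parse_catalog(st, &xrefs))
        return 1;
    return 0;
}
```

With this shape, parse_xrefs() stays a pure preparation step, and the order of the two phases is visible in one place.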
NB: The reason parse_xrefs() exists at all is that we cannot parse arbitrary stream objects without access to their /Length field, which is often an indirect reference. Our parser does not actually want to be an "island style" parser, which is what parse_xrefs() and friends (lookup, parse_obj, ...) implement, but we have to resort to finding the /Length objects at their random locations in the file because we cannot (generally) progress past a stream without them.
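To make the /Length problem concrete, here is a minimal, self-contained sketch of resolving an indirect /Length through a cross-reference table. It is a toy under stated assumptions, not the project's lookup or parse_obj: struct xref_entry, xref_lookup() and resolve_length() are hypothetical names, the input is assumed to be a NUL-terminated buffer, and the target object is assumed to hold a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical xref entry: object number -> byte offset in the file. */
struct xref_entry { unsigned objnum; size_t offset; };

/* Return the byte offset of object `objnum`, or (size_t)-1 if unknown. */
size_t xref_lookup(const struct xref_entry *tab, size_t n, unsigned objnum)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].objnum == objnum)
            return tab[i].offset;
    return (size_t)-1;
}

/* Resolve an indirect /Length: jump to the referenced object, skip past
 * the "N G obj" header, and read the integer that follows.
 * `buf` must be NUL-terminated. Returns -1 on failure. */
long resolve_length(const char *buf, size_t len,
                    const struct xref_entry *tab, size_t n, unsigned objnum)
{
    size_t off = xref_lookup(tab, n, objnum);
    if (off == (size_t)-1 || off >= len)
        return -1;

    const char *p = strstr(buf + off, "obj");
    if (p == NULL)
        return -1;

    char *end;
    long v = strtol(p + 3, &end, 10);   /* strtol skips leading whitespace */
    if (end == p + 3 || v < 0)
        return -1;                      /* no digits, or a negative length */
    return v;
}
```

In a file where a stream dictionary reads `<< /Length 2 0 R >>`, the parser cannot know where the stream data ends until it has jumped to object 2 elsewhere in the file. That jump is exactly the "island style" access described above.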
See also #33 for general cleanup of the content processing code itself.