Call content processing (text extraction) in the proper place

Currently parse_catalog(), which serves as the top-level entry point to content parsing, is called from parse_xrefs(). But the latter is only supposed to be a helper that main() needs to call before the proper parser can run.
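A minimal sketch of the intended call structure, not the actual code: `struct doc`, its flags, and `process_document()` (standing in for the relevant part of main()) are hypothetical names made up for illustration. The point is only that parse_xrefs() returns after collecting cross-reference data, and the caller decides when parse_catalog() runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parser state; not the project's real type. */
struct doc {
    bool xrefs_ready;   /* set once cross-reference data has been collected */
    bool content_done;  /* set once content processing has run */
};

/* Helper: only collects cross-reference data, no content processing. */
static bool parse_xrefs(struct doc *d)
{
    d->xrefs_ready = true;
    return true;
}

/* Top-level entry point to content parsing; depends on the xref data. */
static bool parse_catalog(struct doc *d)
{
    assert(d->xrefs_ready);
    d->content_done = true;
    return true;
}

/* Stand-in for the relevant part of main(): the caller sequences the steps. */
static bool process_document(struct doc *d)
{
    if (!parse_xrefs(d))
        return false;
    /* ... the proper parser would run here ... */
    return parse_catalog(d);
}
```

With this arrangement parse_xrefs() has no knowledge of content processing, so it can be reused or tested on its own.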

NB: The reason for parse_xrefs() to exist at all is that we cannot parse arbitrary stream objects without access to their /Length field, which is often an indirect reference. Our parser does not actually want to be an "island style" parser, which is what parse_xrefs() and friends (lookup, parse_obj, ...) implement, but we have to resort to finding the /Length objects at their arbitrary locations in the file because we cannot (in general) progress past a stream without them.
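To illustrate why the xref data is needed first, here is a simplified sketch of resolving an indirect /Length such as `/Length 7 0 R`. It is not the project's lookup() or parse_obj(): `struct xref_entry`, `lookup_offset()` and `resolve_length()` are hypothetical, the buffer is assumed to be NUL-terminated, and sscanf() stands in for the real object parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified xref entry: object number -> byte offset. */
struct xref_entry {
    unsigned objnum;
    size_t offset;
};

/* Find the byte offset of object `objnum`; returns 1 on success, 0 otherwise. */
static int lookup_offset(const struct xref_entry *xr, size_t n,
                         unsigned objnum, size_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (xr[i].objnum == objnum) {
            *out = xr[i].offset;
            return 1;
        }
    }
    return 0;
}

/* Resolve an indirect /Length: jump to the referenced object, wherever it
 * sits in the file, and read its integer value.
 * Assumes `buf` is NUL-terminated so that sscanf() cannot run off the end. */
static int resolve_length(const char *buf, size_t buflen,
                          const struct xref_entry *xr, size_t n,
                          unsigned objnum, unsigned long *length)
{
    size_t off;
    unsigned num, gen;
    unsigned long len;

    if (!lookup_offset(xr, n, objnum, &off) || off >= buflen)
        return 0;
    /* Expect "N G obj <integer>" at the target offset. */
    if (sscanf(buf + off, "%u %u obj %lu", &num, &gen, &len) != 3)
        return 0;
    if (num != objnum)
        return 0;
    *length = len;
    return 1;
}
```

Only once the length is known can a sequential parser skip over the stream body and continue with the object that follows it.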

See also #33 for general cleanup of the content processing code itself.

Edited 2 years ago

Activity

Sven M. Hallberg changed the description 2 years ago
Sven M. Hallberg mentioned in commit 76e546ce 2 years ago
Sven M. Hallberg mentioned in merge request !46 (merged) 2 years ago
Sven M. Hallberg marked this issue as related to #33 2 years ago
Sven M. Hallberg closed with commit 76e546ce 2 years ago
