Object stream parser split at logical boundaries
Fix the object stream parser to split its input at the logical boundaries given by the object index (the "N pairs of integers") at the beginning of the stream data.
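For reference, here is a minimal Python sketch of reading that index. It assumes the usual PDF layout, where the stream dictionary's /N gives the number of objects and /First gives the byte offset of the first object in the decoded data, so the index lives in `data[:first]`. `parse_index` is a hypothetical name. Splitting on ASCII whitespace is a simplification of PDF's whitespace rules; for example, it does not treat NUL as whitespace.

```python
from __future__ import annotations


def parse_index(data: bytes, n: int, first: int) -> list[tuple[int, int]]:
    """Read the N (object number, offset) pairs at the start of a decoded object stream.

    Assumes the index occupies data[:first] and holds only whitespace-separated
    integers (bytes.split() splits on ASCII whitespace, not PDF's full set).
    """
    tokens = data[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError(f"expected {2 * n} integers in index, found {len(tokens)}")
    nums = [int(t) for t in tokens[: 2 * n]]  # int() accepts bytes like b"10"
    # Pair them up: (object number, offset relative to `first`).
    return [(nums[2 * i], nums[2 * i + 1]) for i in range(n)]


data = b"10 0 11 3 123456"
print(parse_index(data, n=2, first=10))  # [(10, 0), (11, 3)]
```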
This follows a discussion with Peter Wyatt. He initially said that objects should be delimited by the normal PDF token rules, but the PDFA later concluded that this was a mistake and that the logical begin/end information from the index should delimit objects instead. For example, suppose the index says one object begins at offset 0 and ends at offset 3, followed by one that ends at offset 6. If the input is "123456", it parses as two numbers, 123 and 456.
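The example above can be reproduced in a few lines of Python, contrasting the index-driven split with a token-delimited view of the same bytes. `split_by_index` is a hypothetical helper. Offsets are assumed to be relative to the start of the object area (in PDF, the position given by /First). Object i is taken to run up to the next object's offset, and the last one runs to the end.

```python
from __future__ import annotations


def split_by_index(body: bytes, offsets: list[int]) -> list[bytes]:
    """Cut the object area into one slice per object.

    Object i spans [offsets[i], offsets[i + 1]); the last object runs to the end of body.
    """
    bounds = offsets + [len(body)]
    return [body[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]


body = b"123456"
print(split_by_index(body, [0, 3]))  # [b'123', b'456']  (index-delimited: two numbers)
print(body.split())                  # [b'123456']        (token-delimited: one number)
```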
Currently the code follows the former, incorrect approach, reusing the "elemr" parser that is otherwise used for arrays. It parses the above example as a single element, the number 123456, contradicting the index (which we parse but then ignore).
Instead, we have to walk the index explicitly, run our "obj" parser on each corresponding slice of input, and wrap the results up in a parse result. We should also validate the index beforehand. Its conditions are thankfully sane (monotonic offsets, etc.) and are stated in the spec.
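Here is a rough Python sketch of that walk: validate the index first, then run a per-object parser on each slice and collect the results by object number. `validate_index`, `parse_object_stream` and the `parse_obj` callback are hypothetical stand-ins for the project's own index check and "obj" parser. The specific checks (offsets strictly increasing and inside the object area) are an assumed reading of "monotonic offsets etc.", not a transcription of the spec.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def validate_index(offsets: Sequence[int], body_len: int) -> None:
    """Illustrative checks: every offset lies in the object area and offsets strictly increase."""
    for i, off in enumerate(offsets):
        if not 0 <= off < body_len:
            raise ValueError(f"offset {off} lies outside the object area (length {body_len})")
        if i > 0 and off <= offsets[i - 1]:
            raise ValueError(f"offset {off} does not increase on previous offset {offsets[i - 1]}")


def parse_object_stream(
    body: bytes,
    index: Sequence[tuple[int, int]],
    parse_obj: Callable[[bytes], T],
) -> dict[int, T]:
    """Walk the (object number, offset) index and parse each object's own slice of input."""
    offsets = [off for _, off in index]
    validate_index(offsets, len(body))
    ends = offsets[1:] + [len(body)]  # each object ends where the next one begins
    result: dict[int, T] = {}
    for (objnum, start), end in zip(index, ends):
        # parse_obj sees only this object's bytes, so it cannot run into its neighbour.
        result[objnum] = parse_obj(body[start:end])
    return result


# int() stands in for the real "obj" parser here; it tolerates surrounding whitespace.
print(parse_object_stream(b"123456", [(10, 0), (11, 3)], int))  # {10: 123, 11: 456}
```

Handing each slice to the object parser in isolation is what makes the index, rather than token rules, decide where one object stops and the next begins.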