Commit e39478bb authored by Kragen Javier Sitaker

Make PDF trailer parsing lazy

Deferring trailer parsing until the trailer is first accessed makes it
possible to explore PDF files that I can't fully parse yet: even when
the trailer fails to parse, I can still use the Pdf object to look at
parts of the file.  For example:

    >>> d = pdf.read('../Descargas/dercuano.20191230.pdf')
    >>> d.trailer
    Traceback (most recent call last):
    ...
      File "/home/compu/izodparse/izodparse/pdf.py", line 202, in <lambda>
        parenstring.xform = lambda d: ('str', bytes(d[0][1])) # XXX croaks on anything with \()
    TypeError: 'tuple' object cannot be interpreted as an integer
    >>> d.get_indirect_obj(440)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/compu/izodparse/izodparse/pdf.py", line 290, in get_indirect_obj
        offset, plumb = result
    TypeError: cannot unpack non-iterable NoneType object
    >>> d.xrefs
    <izodparse.pdf.XrefSection object at 0x7feaef606a30>
    >>> d.xrefs[440]
    b'0000095363 00000 n\r\n'
    >>> d.xrefs.offset_of(440)
    95363
    >>> d.read(_)
    b'440 0 obj\r\n<< /Border [ 0 0 .1 ] /C [ .6 .6 1 ] /Contents (notes'
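
A minimal sketch of the lazy-trailer idea, assuming nothing about
izodparse's actual internals: `LazyPdf` and `XrefSection` are
hypothetical stand-ins, the trailer parser is passed in as a function,
and the xref table is a single classic (uncompressed) subsection whose
entries are fixed 20-byte records like the one shown above (10-digit
byte offset, space, 5-digit generation number, space, `n` or `f`,
two-byte end of line).  Real files can also have several subsections
or xref streams, which this ignores.  It uses
`functools.cached_property` (Python 3.8+).

```python
from functools import cached_property
from typing import Any, Callable


class XrefSection:
    """Hypothetical view of one classic xref subsection.

    Each entry is exactly 20 bytes: a 10-digit byte offset, a space,
    a 5-digit generation number, a space, 'n' (in use) or 'f' (free),
    and a 2-byte end-of-line marker (e.g. CR LF).
    """

    ENTRY_SIZE = 20

    def __init__(self, data: bytes, first_entry_offset: int, first_obj: int = 0):
        self.data = data
        self.base = first_entry_offset  # byte offset of the first entry
        self.first_obj = first_obj      # object number of the first entry

    def __getitem__(self, objnum: int) -> bytes:
        # Entries are fixed-size, so we can index straight to one
        # without parsing anything else in the file.
        start = self.base + (objnum - self.first_obj) * self.ENTRY_SIZE
        return self.data[start:start + self.ENTRY_SIZE]

    def offset_of(self, objnum: int) -> int:
        entry = self[objnum]
        if entry[17:18] != b'n':  # byte 17 is the in-use/free flag
            raise KeyError(f'object {objnum} is not in use')
        return int(entry[:10])


class LazyPdf:
    """Hypothetical PDF wrapper whose trailer is parsed on first access."""

    def __init__(self, data: bytes, xrefs: XrefSection,
                 parse_trailer: Callable[[bytes], Any]):
        self.data = data
        self.xrefs = xrefs
        self._parse_trailer = parse_trailer

    def read(self, offset: int, size: int = 64) -> bytes:
        # Raw access to the file works even when parsing doesn't.
        return self.data[offset:offset + size]

    @cached_property
    def trailer(self):
        # Deferred until someone asks for it.  If the parser raises,
        # cached_property stores nothing, so the next access retries.
        return self._parse_trailer(self.data)
```

Constructing the object and calling `read()` or `xrefs.offset_of()`
never touch the trailer parser; the first access to `trailer` runs it
once and caches the result, which mirrors how the session above can
still reach `d.read(_)` after `d.trailer` fails.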

In this case there are two separate problems: I need to fix
paren-string parsing for the trailer, and I need to be able to parse
fractional numbers, such as the .1 and .6 values in object 440.
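
Both problems come down to details of PDF's lexical syntax.  Per the
PDF specification, a literal string may contain balanced, unescaped
parentheses; a backslash introduces an escape (`\n`, `\r`, `\t`, `\b`,
`\f`, `\(`, `\)`, `\\`, a 1-to-3-digit octal code, or a backslash
before an end of line as a line continuation), and before any other
character the backslash is ignored.  An unescaped end of line inside
the string reads as a single LF.  A real number may omit the digits
before or after the decimal point, so `.6` and `4.` are both valid.
The following is a standalone sketch of a scanner for those two token
types, not izodparse's parser: `parse_literal_string` and
`parse_number` are hypothetical names, both take the input bytes and a
start position and return `(value, end_position)`, and returning reals
as Python floats is an assumption (a `Decimal` would keep the exact
text).

```python
import re

# Single-byte values that may appear in an octal escape.
OCTAL = frozenset(bytes([d]) for d in b'01234567')

ESCAPES = {b'n': b'\n', b'r': b'\r', b't': b'\t', b'b': b'\b',
           b'f': b'\f', b'(': b'(', b')': b')', b'\\': b'\\'}

# Optional sign, then digits with an optional fraction, or a bare
# fraction such as .6.  Does not check that the token ends at a
# delimiter; a real tokenizer should.
NUMBER = re.compile(rb'[+-]?(?:\d+(?:\.\d*)?|\.\d+)')


def parse_literal_string(data: bytes, pos: int):
    """Parse a literal string starting at data[pos] == b'('."""
    if data[pos:pos + 1] != b'(':
        raise ValueError('expected "("')
    out = bytearray()
    depth = 1
    i = pos + 1
    while i < len(data):
        c = data[i:i + 1]
        if c == b'\\':
            nxt = data[i + 1:i + 2]
            if nxt in ESCAPES:
                out += ESCAPES[nxt]
                i += 2
            elif nxt in OCTAL:            # \d, \dd or \ddd
                j = i + 1
                while j < i + 4 and data[j:j + 1] in OCTAL:
                    j += 1
                out.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            elif nxt == b'\r':            # continuation: skip CR or CRLF
                i += 3 if data[i + 2:i + 3] == b'\n' else 2
            elif nxt == b'\n':            # continuation: skip LF
                i += 2
            else:                         # unknown escape: drop the backslash
                i += 1
            continue
        if c == b'(':
            depth += 1                    # nested parens stay in the value
        elif c == b')':
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        elif c == b'\r':                  # unescaped CR or CRLF reads as LF
            out += b'\n'
            i += 2 if data[i + 1:i + 2] == b'\n' else 1
            continue
        out += c
        i += 1
    raise ValueError('unterminated literal string')


def parse_number(data: bytes, pos: int):
    """Parse an integer or real number starting at data[pos]."""
    m = NUMBER.match(data, pos)
    if not m:
        raise ValueError(f'no number at offset {pos}')
    text = m.group().decode('ascii')
    value = float(text) if '.' in text else int(text)
    return value, m.end()
```

For example, with the `/Border [ 0 0 .1 ]` from object 440,
`parse_number(b'0 0 .1', 4)` returns `(0.1, 6)`.
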
parent 3f38b4dd