    Make PDF trailer parsing lazy · e39478bb
    Kragen Javier Sitaker authored
    This facilitates exploring PDF files that I can't actually parse yet;
    I can still use the Pdf object to look at parts of the file.  For example:
    
        >>> d = pdf.read('../Descargas/dercuano.20191230.pdf')
        >>> d.trailer
        Traceback (most recent call last):
        ...
          File "/home/compu/izodparse/izodparse/pdf.py", line 202, in <lambda>
            parenstring.xform = lambda d: ('str', bytes(d[0][1])) # XXX croaks on anything with \()
        TypeError: 'tuple' object cannot be interpreted as an integer
        >>> d.get_indirect_obj(440)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/home/compu/izodparse/izodparse/pdf.py", line 290, in get_indirect_obj
            offset, plumb = result
        TypeError: cannot unpack non-iterable NoneType object
        >>> d.xrefs
        <izodparse.pdf.XrefSection object at 0x7feaef606a30>
        >>> d.xrefs[440]
        b'0000095363 00000 n\r\n'
        >>> d.xrefs.offset_of(440)
        95363
        >>> d.read(_)
        b'440 0 obj\r\n<< /Border [ 0 0 .1 ] /C [ .6 .6 1 ] /Contents (notes'
    
    In this case there are two separate problems: I need to fix
    paren-string parsing so that the trailer parses, and I need to be
    able to read numbers written as bare fractions, like the .6 in
    object 440.
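
    As a rough illustration of what "lazy" means here, the following is a
    minimal sketch and not izodparse's actual code.  The LazyPdf class, its
    constructor arguments, and broken_parser are hypothetical names made up
    for this example; the only assumption is that the trailer is parsed on
    first access instead of in the constructor, so a parse failure does not
    stop you from reading raw bytes.

```python
from functools import cached_property


class LazyPdf:
    """Hypothetical stand-in for the Pdf object (illustration only)."""

    def __init__(self, blob, parse_trailer):
        self.blob = blob                     # raw file bytes, always available
        self._parse_trailer = parse_trailer  # parser that may raise

    @cached_property
    def trailer(self):
        # Runs on first access; a parse failure surfaces here,
        # not when the object is constructed.
        return self._parse_trailer(self.blob)

    def read(self, offset, size=64):
        # Works whether or not the trailer can be parsed.
        return self.blob[offset:offset + size]


def broken_parser(blob):
    # Simulates a parser that cannot handle this file yet.
    raise TypeError("cannot parse this trailer yet")


d = LazyPdf(b"%PDF-1.4\n440 0 obj\n<< /C [ .6 .6 1 ] >>", broken_parser)
print(d.read(9, 9))          # raw access still works
try:
    d.trailer                # the failure only happens here
except TypeError as e:
    print("trailer failed:", e)
```

    Since cached_property stores a value only when the getter returns, a
    failed parse is retried on the next access rather than cached.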