Actually get a CMap out of a PDF with pdftour
I renamed `parsecmaps.py` to `pdftour.py`. It can now navigate the structure of at least one PDF file well enough to parse a CMap out of it. Getting there meant adding some stream support, which in turn required tweaking the parsing engine a bit. In keeping with the fast-and-loose, exploratory nature of the rest of the program, it doesn't even check for `endstream` after the end of the stream, much less `endobj`. With that and a bit more tweaking, my `cmaps_for_pages` code from last week now runs!
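Stream support of the kind described above boils down to two steps. First, take the `stream` keyword and its end-of-line marker, then slice out exactly /Length bytes. Second, undo any /Filter, usually FlateDecode for embedded CMaps. The sketch below is not `pdftour.py`'s actual code. `read_stream_body` and `decode_stream` are hypothetical names. It assumes /Length has already been resolved to a plain integer, and it only handles FlateDecode. Like the commit, it never checks for `endstream` or `endobj`.

```python
import re
import zlib
from typing import Tuple

# Per the PDF spec, "stream" must be followed by CRLF or LF (a lone CR is not allowed).
STREAM_START = re.compile(rb'stream(\r\n|\n)')


def read_stream_body(data: bytes, pos: int, length: int) -> Tuple[bytes, int]:
    """Return (raw stream bytes, offset just past them).

    `pos` must point at the `stream` keyword; `length` is the already-resolved
    /Length from the stream dictionary (hypothetical helper, assumption).
    Deliberately does NOT look for `endstream` afterwards.
    """
    m = STREAM_START.match(data, pos)
    if m is None:
        raise ValueError('expected "stream" followed by CRLF or LF')
    start = m.end()
    end = start + length
    if end > len(data):
        raise ValueError('stream runs past end of data')
    return data[start:end], end


def decode_stream(raw: bytes, filters=None) -> bytes:
    """Apply /Filter, given as None, a single name, or a list of names applied in order.

    Only FlateDecode is handled in this sketch; PDF Flate data is zlib-wrapped,
    so zlib.decompress applies directly.
    """
    if filters is None:
        names = []
    elif isinstance(filters, str):
        names = [filters]
    else:
        names = list(filters)
    for name in names:
        if name == 'FlateDecode':
            raw = zlib.decompress(raw)
        else:
            raise NotImplementedError(f'filter {name!r} not handled in this sketch')
    return raw
```

In a real file /Length is often an indirect reference (like `12 0 R`) that has to be looked up before calling something like `read_stream_body`. A stricter parser would also confirm that `endstream` follows the returned offset.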