Skip to content
Snippets Groups Projects
  1. Apr 14, 2023
  2. Apr 13, 2023
  3. Mar 30, 2023
    • Sven M. Hallberg's avatar
      remove stale comment · 656f5a3f
      Sven M. Hallberg authored
      Finished reviewing past modifications to parse_xrefs().
      
      NB: All code attributed to Sumit Ray has been removed from this function.
      656f5a3f
    • Sven M. Hallberg's avatar
      improve handling of parse errors in xref stream data · a1014f81
      Sven M. Hallberg authored
      Improve on the bugfix in commit a5abf1e2:
      
      - Reinstate the assert for 'res->ast != NULL'. If it fails, there is a bug
        in the parser, not an error in the input file.
      - Provide a distinct error message for the case where p_xref fails on a
        cross-reference stream because of invalid data.
      - Only skip storing the invalid section. Try to follow the /Prev entry in
        the stream dictionary to find more sections.
      a1014f81
    • Sven M. Hallberg's avatar
      remove a comment · 512de3c2
      Sven M. Hallberg authored
      I cannot tell what this refers to. The (nonexistent) else case of the
      if statement above it is simply the case of the object number in question
      not falling within this subsection.
      
      Anyway, the function lookup_xref() is a low-level utility used during
      parsing, not a place to produce error messages.
      512de3c2
    • Sven M. Hallberg's avatar
      comments regarding act_ks_value · c8be9e84
      Sven M. Hallberg authored
      HParseResult was introduced in 6b54ebfa (generally parse stream objects)
      to hold the result of parsing the stream data, including the application
      of any filters. This is produced in act_ks_value(). The fact that parse
      errors in stream data are thus detectable is in fact significant for
      xref stream processing, so we should not just return the bare data on
      error.
      c8be9e84
    • Sven M. Hallberg's avatar
      adjust comments · e6196639
      Sven M. Hallberg authored
      e6196639
    • Sven M. Hallberg's avatar
      don't emulate VIOL in error messages · b3dda3fe
      Sven M. Hallberg authored
      While it might seem like a good idea to "grade" errors by severity,
      we are not *really* in any place to do so accurately. Our tasks are
      
       (a) to decide, internally, whether to print a message or silently
           ignore a malformation, and
       (b) to ultimately judge the file valid or invalid as a whole.
      
      Note that the latter part, as stated before, is not the responsibility
      of parse_xrefs().
      
      Reinstate the input file name in these error messages. That information
      is useful when running the program on multiple files from a script, as
      we have been doing.
      
      While we're at it, fix style (line lengths).
      b3dda3fe
    • Sven M. Hallberg's avatar
      add test cases for out-of-bounds xref pointers · 9ff8c465
      Sven M. Hallberg authored
      Both currently fail because the parser proper does not validate
      these offsets.
      9ff8c465
    • Sven M. Hallberg's avatar
      drop use of h_seek in parse_xrefs · 9196b5c2
      Sven M. Hallberg authored
      Now that we are validating the offset ourselves, we no longer need
      h_seek() to do our bounds checking. But add a defensive assert just
      in case.
      9196b5c2
    • Sven M. Hallberg's avatar
      bounds-check /Prev pointers · dd3c8e62
      Sven M. Hallberg authored
      Mirrors the check for startxref. I considered unifying the two into one
      test at the start of the loop, but then we would lose the information
      whether we got the offset from startxref or a /Prev.
      dd3c8e62
    • Sven M. Hallberg's avatar
      report location of invalid startxref · aa405607
      Sven M. Hallberg authored
      This is useful information, especially in hex, when looking into the file.
      The invalid value itself, on the other hand, is not so useful.
      aa405607
    • Sven M. Hallberg's avatar
      adjust error message · 550c070d
      Sven M. Hallberg authored
      The correct and standard format specifier for values of type size_t is %zu.
      There is no need to point out the valid bounds.
      Match style with the other messages.
      550c070d
    • Sven M. Hallberg's avatar
      remove useless/erroneous condition · 431c7db3
      Sven M. Hallberg authored
      The offset can never be negative (size_t is unsigned).
      And this treated offset = 0 as out of bounds, which is nonsense.
      In fact, offset == size is also not invalid (it is the end of file).
      431c7db3
  4. Mar 28, 2023
    • Sven M. Hallberg's avatar
      revert parse_xrefs to its original signature · 9883a543
      Sven M. Hallberg authored
      Passing the aux struct by reference may look cleaner, but it was
      deliberate to keep parse_xrefs() independent of that struct, since the
      latter is conceptually part of the parser's interface and the former is
      not.
      
      Also, this way parse_xrefs() has a proper return value that signals
      success or failure.
      
      Plus, no ugly indirection or temporary variable is needed to access sz.
      9883a543
    • Sven M. Hallberg's avatar
      move parse_xrefs back next to main · 81dc4dba
      Sven M. Hallberg authored
      Move parse_xrefs() back in its proper place as a helper to main(),
      including the definition of the global variable 'infile' with the rest
      of the command line arguments.
      
      It had been moved in fbbe953f when the content processing code was
      confusedly hooked into the function.
      
      Also removes marker comments about "Start/End xref parsing". The code
      between them is not exclusively concerned with xrefs and their sheer
      size clashes with the rest of the coding style.
      81dc4dba
    • Sven M. Hallberg's avatar
      restore single exit point in parse_xrefs · 91473d5f
      Sven M. Hallberg authored
      It turns out that this function was in fact meant to always assign a
      result (NULL/0 on failure), accomplished by having a single exit point.
      This was changed in 517b81ad for no reason. Reverting.
      
      I'm guessing the goto was considered disagreeable, so I'll explain the
      rationale. The function accumulates its result in the *local* variables
      xrefs and n. This mainly makes the code nicer to read than writing to
      the output directly. Having a single exit point, a property that is easy
      to verify, ensures that no update to the local variables can get lost,
      i.e. they serve as de-facto aliases for the outputs.
      91473d5f
    • Sven M. Hallberg's avatar
      allow partial results from parse_xrefs · f844fce3
      Sven M. Hallberg authored
      This was intended to be a 'break' statement, returning any xref sections
      parsed up to that point. Note that parse_xrefs() is *not* supposed to be
      the validating parser proper for the document. It is a utility that is
      needed before the actual parser can run, so if it returns partial data
      in a best effort, that is fine.
      f844fce3
    • Sven M. Hallberg's avatar
      remove erroneous comment · bc05e4c6
      Sven M. Hallberg authored
      As far as I can tell, this is not a case for SEV_DONTCARE. It's
      (conceptually) a parse error.
      bc05e4c6
    • Sven M. Hallberg's avatar
      comments · cbf3a1bb
      Sven M. Hallberg authored
      cbf3a1bb
    • Sven M. Hallberg's avatar
      expand a comment · ecb984e9
      Sven M. Hallberg authored
      ecb984e9
    • Sven M. Hallberg's avatar
      revert to shadowing a variable instead of overwriting it · 3de6efd6
      Sven M. Hallberg authored
      As indicated by the XXX, this line of code was never meant to be
      considered as "proper", mainly because it allocates a new parser
      that is never freed for every xref section in the file.
      
      The reason that this line was introduced in place of the commented-out
      original above it is that the latter does not bounds-check the offset
      but h_seek() does. It was a quick and reliable way to make invalid
      offsets fail the parse.
      
      Anyway, creating a new HParser *p that incidentally shadows the
      previous occurance was meant to signal that this is just a temporary
      name that we need real quick (on the next line), not a "proper" variable,
      and that it probably wasn't there to stay. So with that sense in mind, I
      am putting it back.
      
      Remove -Wshadow from CFLAGS.
      3de6efd6
    • Sven M. Hallberg's avatar
      remove superfluous assignment · 83bdf6cd
      Sven M. Hallberg authored
      How did that get in there? The very next line overwrites it.
      83bdf6cd
    • Sven M. Hallberg's avatar
      parse_xrefs: remove unconditional initialization of result · b606013c
      Sven M. Hallberg authored
      The purpose of parse_xrefs() is not to initialize the aux environment,
      that is done in main. The design of the function is to fill a particular
      part of that environment *if* it succeeds and not touch it otherwise.
      Thus the late write to 'aux' at the end of the function.
      b606013c
  5. Mar 14, 2023
  6. Mar 01, 2023
    • Sven M. Hallberg's avatar
      bring the two loops in act_txtobj into alignment · 222781c8
      Sven M. Hallberg authored
      The function act_txtobj() performs two passes over the sequence of
      text operators. The first pass tracks position and length of the
      string. Then an array is allocated of the indicated length and a
      second pass fills it with the actual characters.
      
      The two passes must be close carbon copies of each other or a mismatch
      between the predetermined length and the actual number of characters
      produced might occur and cause the assertion "txtlen == idx" to fail.
      
      This code structure is obviously bad, a text book example of why code
      duplication should be avoided. Nevertheless, before rewriting it
      entirely, this patch at least corrects an immediate bug.
      
      Fixes #44.
      222781c8
    • Sven M. Hallberg's avatar
      get rid of ts variable in act_txtobj · fbbf2c68
      Sven M. Hallberg authored
      The node member of TextState_T seems to be written only in one place in
      parse_pagenode(), and in such a way that the assertion "ts->node == node"
      must always hold. So we might as well assign node where it said ts->node
      and get rid of ts completely.
      fbbf2c68
    • Sven M. Hallberg's avatar
      access text state consistently through node · 76cb6955
      Sven M. Hallberg authored
      This one line going through another variable seems to spurious.
      Note that ts is always equal to &node->ts.
      76cb6955
    • Sven M. Hallberg's avatar
      df03908f
    • Sven M. Hallberg's avatar
      avoid an assert in parse_fonts · dee6c415
      Sven M. Hallberg authored
      This is an assertion of the type that catches an error (in user-supplied
      data) that should be handled, namely the case where the /Font entry
      of a dictionary is expected to be itself a dictionary but isn't.
      The code already contains a path for the case where the /Font entry
      is missing (return false) and I suppose the same, including the
      TODO item "figure out how to handle", might as well apply instead
      of the assertion.
      
      Fixes #45.
      dee6c415
  7. Feb 28, 2023
  8. Feb 27, 2023
Loading