
Requirements

  • Hammer and the target parser (pdf, for now) must be built with debug symbols for the tool to work
    • Specifically, both of them need to be compiled with the "-g" flag
    • Alternatively, if a separate symbol file is available, loading it with gdb's symbol-file command or the -s command-line switch before executing the scripts should also work
  • The GUI component uses Tkinter to draw a window

Invocation

(Replace /path/to with the appropriate paths.)

gdb -ex "source /path/to/profiling/perf-instrumentation/gdb-port/utility-commands.py" -ex "source /path/to/profiling/perf-instrumentation/gdb-port/commands.py" -ex "source /path/to/profiling/perf-instrumentation/gdb-port/hammer-breakpoints.py" -ex "source /path/to/profiling/perf-instrumentation/gdb-port/breakpoint-manager.py" -ex "source /path/to/gitlab-repos/profiling/perf-instrumentation/gdb-port/top-level-parse.py" -ex "hammer-parse-stop-at-pos 50" -ex "source /path/to/parser-type-instrumentation-gdb.py" -ex "source /path/to/parser-name-instrumentation-gdb.py" --args /path/to/pdf /path/to/input.pdf

Note that -ex "hammer-parse-stop-at-pos 50" is not strictly necessary. Without it, however, the tool does not stop during the parse: it prints memory stats and exits.
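Since the command line above is long, it can help to assemble it programmatically. The following is a minimal sketch, not part of the tool: the helper name gdb_argv and the PORT variable are hypothetical, and it assumes all five gdb-port scripts live in the same directory. The commands are passed in the same order as in the invocation above.

```python
import shlex

def gdb_argv(commands, program, program_args):
    """Build a gdb argument list that runs each command via -ex, in order."""
    argv = ["gdb"]
    for command in commands:
        argv += ["-ex", command]
    # Everything after --args is the debugged program and its arguments.
    argv += ["--args", program, *program_args]
    return argv

# Placeholder directory; replace /path/to as in the invocation above.
PORT = "/path/to/profiling/perf-instrumentation/gdb-port"

commands = [
    f"source {PORT}/utility-commands.py",
    f"source {PORT}/commands.py",
    f"source {PORT}/hammer-breakpoints.py",
    f"source {PORT}/breakpoint-manager.py",
    f"source {PORT}/top-level-parse.py",
    "hammer-parse-stop-at-pos 50",
    "source /path/to/parser-type-instrumentation-gdb.py",
    "source /path/to/parser-name-instrumentation-gdb.py",
]

argv = gdb_argv(commands, "/path/to/pdf", ["/path/to/input.pdf"])
# shlex.join quotes each argument so the result can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list can also be handed directly to subprocess.run(argv) instead of being printed.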

To enable the GUI, in the gdb console:

(gdb) source /path/to/gui.py

NOTE: If an error occurs, the GUI state will not get cleaned up properly. A "Parser Visualization" window will still be spawned, with widgets missing. Before invoking the GUI script again, make sure these windows are closed. This will clean up GUI state, and the script can be invoked again.

Tests

gdb -ex "source /path/to/utility-commands.py" -ex "hammer-parse-stop-at-pos 50" -ex "source /path/to/parser-type-instrumentation-gdb.py" -ex "source /path/to/parser-name-instrumentation-gdb.py" -ex "source /path/to/tests/unit/parser-envs-pdf.py" --args /path/to/pdf /path/to/input.pdf

Commands

The tool is in an experimental stage. The interface is liable to change.

Execution control

hammer-parse-stop-at-pos <number>

Stops execution once the parsing process reaches past position <number> in the input stream. Two caveats apply. First, since a parser can consume more than one byte, <number> is only a lower bound on the actual stop position. Second, if a parser consumes enough input to reach the requested position but would later fail, execution still stops when position <number> is reached.
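The lower-bound caveat can be illustrated with a small standalone model. This is only a sketch, not the tool's implementation: the function stop_position is hypothetical, each list entry stands for the number of bytes one parser application consumes, and the choice of >= (rather than >) for "reaching" the position is an assumption.

```python
from itertools import accumulate

def stop_position(consumed_sizes, requested):
    """Return the first input position at or beyond `requested`, or None.

    consumed_sizes: bytes consumed by each successive parser application.
    """
    for position in accumulate(consumed_sizes):
        # Assumption: the stop triggers once the position is >= requested.
        if position >= requested:
            return position
    return None  # the parse ended before the requested position

# Three parsers consuming 16, 16 and 32 bytes: the positions visited are
# 16, 32 and 64, so a request for 50 can only be honoured at 64.
print(stop_position([16, 16, 32], 50))
```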

hammer-parse-step [number]

Advances the parsing process by calling the next [number] parsers on the input (according to the declared H_RULEs). For example, given:

H_RULE(a, h_uint8());
H_RULE(b_2, h_uint16());
H_RULE(b, h_sequence(b_2, b_2, NULL));

H_RULE(a_b, h_sequence(a, b, NULL));

Stopping at a_b and invoking the command 4 times will stop at the applications of the following parsers:

a, b, b_2, b_2

Stopping at a_b and invoking hammer-parse-step 2 twice would instead stop at:

b, b_2

(Skipping over a and the first application of b_2.)

This is not equivalent to advancing the input stream by [number] bytes. Rather, it is equivalent to running until the next [number] pushes on the parser stack. See also hammer-parser-backtrace.

If the GDB parameter "hammer-extended-parse-step-info" is set to "on", it will also invoke hammer-parser-backtrace and hammer-parser-preview-input.
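The stepping order in the a_b example above can be reproduced with a small standalone model. This is a sketch under simplifying assumptions, not the tool's code: RULES, pushes and parse_step are hypothetical names, every parser is assumed to succeed, and each parser application is treated as exactly one push on the parser stack.

```python
# Children of each rule, mirroring the H_RULE declarations above.
RULES = {
    "a": [],
    "b_2": [],
    "b": ["b_2", "b_2"],
    "a_b": ["a", "b"],
}

def pushes(rule):
    """Yield parser names in the order they are pushed while `rule` runs."""
    for child in RULES[rule]:
        yield child                 # the child is pushed first ...
        yield from pushes(child)    # ... then its own children, in order

def parse_step(push_iter, number=1):
    """Advance by `number` pushes and return the parser stopped at."""
    name = None
    for _ in range(number):
        name = next(push_iter)
    return name

single = pushes("a_b")
one_by_one = [parse_step(single) for _ in range(4)]

double = pushes("a_b")
two_by_two = [parse_step(double, 2) for _ in range(2)]

print(one_by_one)   # ['a', 'b', 'b_2', 'b_2']
print(two_by_two)   # ['b', 'b_2']
```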

hammer-parse-continue

Alias of GDB continue. May change later.

hammer-parse-apply

Applies the current parser and prints the HParseResult. More precisely:

Steps execution until the "current" parser (the one h_do_parse() has been called on at the time of command execution) returns its result. Thereafter, it steps to the next invocation of h_do_parse, and optionally prints the parser stack and input preview, as with hammer-parse-step.

In terms of call stack navigation, this is roughly analogous to executing finish to step out of the current stack frame, followed by stepping into the next function call. (The difference is that this command steps between "frames" of the parser stack.)

hammer-parse-step-to-result <number>

Select a parser on the stack, step to when it returns its result, and print it.

<number> selects a parser on the parser stack according to the following rules:

  • If zero, the "current" parser (which h_do_parse is about to apply) is selected
  • If positive, the selected parser is the <number>th item from the top of the stack (with the "current" parser being 0)
  • If negative, parsers are counted from the bottom of the stack, with the bottom of the stack being -1. Note: positive is 0-indexed, negative is 1-indexed

This command will step until the selected h_do_parse frame returns its value. If applicable, it extracts the returned AST and prints it. Afterwards, it will step to the next h_do_parse invocation. (This corresponds to stepping until the next time the parser stack grows in size, for example from a parser combinator applying its constituent parsers. For example: after selecting an element of an h_sequence() combinator for printing AST results, execution will stop when the next element of said sequence is about to be applied.)
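The indexing rules for <number> can be written down as a short function. This is an illustrative sketch only: select_parser is a hypothetical name, and the stack is modelled as a Python list ordered from bottom to top, which is an assumption about representation rather than a description of the tool's internals.

```python
def select_parser(stack, number):
    """Pick an entry from `stack` (bottom first, "current" parser last).

    number == 0 : the current parser (top of the stack)
    number  > 0 : counted down from the top, 0-indexed
    number  < 0 : counted up from the bottom, 1-indexed (-1 is the bottom)
    """
    if number >= 0:
        index = len(stack) - 1 - number
    else:
        index = -number - 1
    if not 0 <= index < len(stack):
        raise IndexError(f"no parser at stack position {number}")
    return stack[index]

# Parser stack while the first b_2 inside b is about to be applied.
stack = ["a_b", "b", "b_2"]

print(select_parser(stack, 0))    # b_2 (current parser)
print(select_parser(stack, 1))    # b
print(select_parser(stack, -1))   # a_b (bottom of the stack)
```

With three entries on the stack, 2 and -1 select the same parser, as do 0 and -3.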

Querying

hammer-parser-backtrace [number]

Print the "call stack" for parsers. A call to perform_lowlevel_parse corresponds to a push to the stack, while a return from it corresponds to popping the stack. [number] controls the number of items to print. If the parameter is not given, the entire stack is printed.

hammer-parser-mem-use <address>

Print bytes allocated in the context of the parser located at <address>. Memory use is counted separately per arena, so the result is a dictionary keyed by arena address. The value for each key is the number of bytes allocated in that arena.

hammer-parser-mem-use-name <name>

Print bytes allocated in the contexts of parsers matching <name>. <name> is either the name given to a parser in the H_RULE declaration, or (Unnamed <type>), for example: (Unnamed sequence). If multiple parsers match the name given, stats for all matching parsers will be printed.
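The naming convention can be sketched as follows. This is a standalone illustration, not the tool's code: display_name and addresses_matching are hypothetical helpers, and the (address, rule name, type) tuples and their addresses are made up for the example.

```python
def display_name(rule_name, parser_type):
    """Return the H_RULE name if there is one, else "(Unnamed <type>)"."""
    if rule_name:
        return rule_name
    return f"(Unnamed {parser_type})"

def addresses_matching(parsers, name):
    """Return the addresses of all parsers whose display name equals `name`."""
    return [
        address
        for address, rule_name, parser_type in parsers
        if display_name(rule_name, parser_type) == name
    ]

# Hypothetical parsers: one named rule and two unnamed sequences.
parsers = [
    (0x5000, "foo", "choice"),
    (0x5040, None, "sequence"),
    (0x5080, None, "sequence"),
]

# Both unnamed sequences match, so stats would be printed for each of them.
print([hex(a) for a in addresses_matching(parsers, "(Unnamed sequence)")])
```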

hammer-parser-preview-input

Interprets the next 32 bytes of input as UTF-8, and prints the resulting string.
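In plain Python, such a preview amounts to slicing and decoding. The sketch below is not the tool's implementation: preview_input is a hypothetical function operating on an in-memory bytes object, and replacing undecodable bytes with U+FFFD is an assumption, since the text above does not say how invalid UTF-8 is handled.

```python
def preview_input(data, position, length=32):
    """Decode up to `length` bytes of `data`, starting at `position`, as UTF-8."""
    window = data[position:position + length]
    # Assumption: invalid or truncated sequences become U+FFFD instead of
    # raising an error, since binary input such as PDF is rarely clean UTF-8.
    return window.decode("utf-8", errors="replace")

data = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
print(repr(preview_input(data, 0)))
```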

hammer-parser-top-single-arena-mem

Finds the parser with the highest number of allocated bytes in a single arena.

hammer-parser-top-total-arena-mem

Sums up each parser's memory use across all arenas, and prints the parser with the highest total allocated bytes.
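The two "top" commands can pick different parsers, which the following sketch shows on made-up numbers. The function names, the nested-dictionary layout (parser name, then arena address, then bytes) and all values are hypothetical. Only the per-arena dictionary shape follows the description of hammer-parser-mem-use.

```python
# Hypothetical stats: parser name -> {arena address: bytes allocated}.
usage = {
    "a_b": {0x1000: 512, 0x2000: 64},
    "b": {0x1000: 300, 0x2000: 300},
}

def top_single_arena(usage):
    """Return (parser, arena, bytes) for the largest single-arena allocation."""
    entries = (
        (name, arena, allocated)
        for name, arenas in usage.items()
        for arena, allocated in arenas.items()
    )
    return max(entries, key=lambda entry: entry[2])

def top_total(usage):
    """Return (parser, bytes) for the largest total summed across arenas."""
    name = max(usage, key=lambda n: sum(usage[n].values()))
    return name, sum(usage[name].values())

print(top_single_arena(usage))   # a_b has the largest single arena (512)
print(top_total(usage))          # b has the largest total (600)
```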

hammer-parser-average-mem

Prints the average number of bytes used separately for each HArena.
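One plausible reading of "average per arena" is the mean, for each HArena, over the parsers that allocated in it. The sketch below implements that reading on made-up data. The name average_per_arena and the data layout are hypothetical, and the tool's actual averaging may differ.

```python
from collections import defaultdict

def average_per_arena(usage):
    """Return {arena address: mean bytes per parser that allocated in it}.

    usage: parser name -> {arena address: bytes allocated}.
    """
    per_arena = defaultdict(list)
    for arenas in usage.values():
        for arena, allocated in arenas.items():
            per_arena[arena].append(allocated)
    return {arena: sum(sizes) / len(sizes) for arena, sizes in per_arena.items()}

# Hypothetical stats for two parsers and two arenas.
usage = {
    "a_b": {0x1000: 512, 0x2000: 64},
    "b": {0x1000: 300, 0x2000: 300},
}

for arena, average in average_per_arena(usage).items():
    print(hex(arena), average)
```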

hammer-parser-dump-memory-stats

Prints memory usage statistics for all parsers encountered up to that point. (If an HParser is not explicitly named in an H_RULE in one of the parser initializing functions, and has not been applied to the input yet, it will not appear in the statistics. For example: given H_RULE(foo, h_choice(h_uint8(), h_uint16(), NULL)), foo will appear in the statistics if it has been declared in init_parser(), but the unnamed h_uint8() will only appear if it has been applied at least once.)

Experimental

hammer-arena-dump-stats [address]

Given the address of an HArena (in hexadecimal), it reads out Hammer's statistics for that particular arena. If the library is compiled with DETAILED_ARENA_STATS, allocation counts and byte counts are available as well. If no address is given, it defaults to the arena extracted from the HParseState of the currently ongoing parse. The command's name is liable to change in the future.

Limitations

This tool is currently built and tested against the pdf parser. It makes a few assumptions:

  • The target platform is x64
    • This is because breakpoints on RET capture return values by inspecting the RAX register, which is specific to the x86-64 architecture
  • The parser has an init_parser() function that declares its H_RULEs. This will later be parameterized to support other parsers built with Hammer.
  • The parser uses Hammer's Packrat backend
  • The return instructions in init_parser(), perform_lowlevel_parse(), h_packrat_parse() are rendered as "ret" or "retq" by GDB
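To make the last assumption concrete, the sketch below scans the text of a GDB disassemble listing for instructions whose mnemonic is "ret" or "retq". It is an illustration only: return_addresses, RET_LINE and the sample listing are hypothetical, and the tool's scripts may locate return instructions differently (for example through GDB's Python API).

```python
import re

# One line of `disassemble` output: optional "=>" marker, address,
# "<+offset>:", then the mnemonic.
RET_LINE = re.compile(r"^\s*(?:=>)?\s*(0x[0-9a-fA-F]+)\s+<\+\d+>:\s+(\S+)")

def return_addresses(disassembly):
    """Return the addresses of all "ret"/"retq" instructions in the listing."""
    addresses = []
    for line in disassembly.splitlines():
        match = RET_LINE.match(line)
        if match and match.group(2) in ("ret", "retq"):
            addresses.append(int(match.group(1), 16))
    return addresses

# Made-up listing in the layout GDB uses for x86-64 code.
SAMPLE = (
    "Dump of assembler code for function init_parser:\n"
    "=> 0x0000000000401136 <+0>:\tpush   %rbp\n"
    "   0x0000000000401137 <+1>:\tmov    %rsp,%rbp\n"
    "   0x000000000040113a <+4>:\tpop    %rbp\n"
    "   0x000000000040113b <+5>:\tretq\n"
    "End of assembler dump.\n"
)

print([hex(address) for address in return_addresses(SAMPLE)])
```

A return instruction rendered with any other mnemonic would be missed by such a scan, which is why the assumption is listed as a limitation.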