# Requirements
- Hammer and the target parser (currently pdf) must be built with debug symbols for the tool to work.
- The GUI component uses Tkinter to draw its window, so Tkinter must be importable from GDB's embedded Python interpreter.
# Invocation
```
gdb -ex "source /path/to/utility-commands.py" -ex "hammer-parse-stop-at-pos 50" -ex "source /path/to/parser-type-instrumentation-gdb.py" -ex "source /path/to/parser-name-instrumentation-gdb.py" --args /path/to/pdf /path/to/input.pdf
```
To enable the GUI, in the gdb console:
```
(gdb) source /path/to/gui.py
```
NOTE: If an error occurs, the GUI state is not cleaned up properly, and a "Parser Visualization" window is still spawned with widgets missing. Close any such windows before invoking the GUI script again; closing them cleans up the GUI state, after which the script can be invoked again.
# Tests
To run the unit tests, additionally source the test script:
```
gdb -ex "source /path/to/utility-commands.py" -ex "hammer-parse-stop-at-pos 50" -ex "source /path/to/parser-type-instrumentation-gdb.py" -ex "source /path/to/parser-name-instrumentation-gdb.py" -ex "source /path/to/tests/unit/parser-envs-pdf.py" --args /path/to/pdf /path/to/input.pdf
```
# Commands
The tool is at an experimental stage, and its interface is liable to change.
```
hammer-parse-stop-at-input-pos <number>
```
Stops execution once parsing reaches or passes position `<number>` in the input stream. There are two caveats. First, since a parser can consume more than one byte at a time, `<number>` is only a lower bound on the actual stop position. Second, execution stops as soon as a parser reaches the requested position, even if that parser later fails.
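The two caveats can be made concrete with a small pure-Python model. This is a sketch, not the tool's implementation: the trace of parser applications, the `Application` record, and the `first_stop` helper are hypothetical, and input positions are assumed to be absolute offsets, so backtracking after a failure simply shows up as a smaller `end_pos`.

```python
from typing import List, NamedTuple, Optional


class Application(NamedTuple):
    """One hypothetical parser application recorded while parsing."""
    parser: str
    end_pos: int     # absolute input position after the parser consumed its bytes
    succeeded: bool  # whether the parser's result was eventually kept


def first_stop(trace: List[Application], target: int) -> Optional[Application]:
    """Return the application at which a stop at `target` would trigger, if any."""
    for app in trace:
        # Only the position is checked; `succeeded` is never consulted, which is
        # why a parser that later fails can still trigger the stop.
        if app.end_pos >= target:
            return app
    return None


# Hypothetical trace: "obj" jumps past position 50 in one step, then fails.
trace = [
    Application("header", 8, True),
    Application("obj", 64, False),
    Application("obj_alt", 40, True),
]
hit = first_stop(trace, 50)
print(hit.parser, hit.end_pos)  # obj 64
```

Here the stop lands at 64 rather than 50, and on a parser whose result is later discarded.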
```
hammer-parse-step <number>
```
Advances the parsing process by running the next `<number>` parsers on the input (in the order given by the declared H_RULEs). For example, given:
```
H_RULE(a, h_uint8());
H_RULE(b_2, h_uint16());
H_RULE(b, h_sequence(b_2, b_2, NULL));
H_RULE(a_b, h_sequence(a, b, NULL));
```
If execution is stopped at `a_b`, repeatedly invoking `hammer-parse-step 1` stops at the applications of the following parsers, in order:
`a`, `b`, `b_2`, `b_2`
Repeatedly invoking `hammer-parse-step 2` instead stops at:
`b`, `b_2`
This is not equivalent to advancing the input stream by `<number>` bytes; rather, it is equivalent to running until the next `<number>` pushes onto the parser stack. See also `hammer-parser-backtrace`.
If the GDB parameter `hammer-extended-parse-step-info` is set to `on` (`set hammer-extended-parse-step-info on`), the command also invokes `hammer-parser-backtrace` and `hammer-parser-preview-input`.
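The stop sequences in the example can be reproduced with a short Python model of the example grammar. It is a hypothetical sketch: `GRAMMAR`, `pushes`, and `step_stops` are illustrative names, and the model assumes every parser succeeds and that each parser application results in exactly one push (for example, it ignores any effect of Packrat memoization).

```python
from typing import Dict, List

# The H_RULE example as a hypothetical child table: each parser lists the
# sub-parsers it applies, in order.
GRAMMAR: Dict[str, List[str]] = {
    "a_b": ["a", "b"],    # h_sequence(a, b, NULL)
    "b": ["b_2", "b_2"],  # h_sequence(b_2, b_2, NULL)
    "a": [],              # h_uint8()
    "b_2": [],            # h_uint16()
}


def pushes(root: str) -> List[str]:
    """Parsers pushed onto the parser stack below `root`, in call (pre-)order."""
    order: List[str] = []
    for child in GRAMMAR[root]:
        order.append(child)          # entering the child is one push
        order.extend(pushes(child))  # followed by everything the child applies
    return order


def step_stops(root: str, n: int) -> List[str]:
    """Where repeated `hammer-parse-step n` would stop: at every n-th push."""
    return pushes(root)[n - 1::n]


print(step_stops("a_b", 1))  # ['a', 'b', 'b_2', 'b_2']
print(step_stops("a_b", 2))  # ['b', 'b_2']
```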
```
hammer-parse-continue
```
Alias of GDB `continue`. May change later.
```
hammer-parser-backtrace <number>
```
Print the "call stack" for parsers. A call to `perform_lowlevel_parse` corresponds to a push onto the stack, and a return from it corresponds to a pop. `<number>` controls how many items are printed; if it is not given, the entire stack is printed.
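A minimal model of this push/pop bookkeeping, assuming frames are listed innermost first as in GDB's own `backtrace`. `ParserStack` and its methods are hypothetical names; in the tool, the push and pop would be driven by entering and returning from `perform_lowlevel_parse`.

```python
from typing import List, Optional


class ParserStack:
    """Hypothetical model of the parser "call stack"."""

    def __init__(self) -> None:
        self._frames: List[str] = []

    def on_enter(self, parser: str) -> None:
        # Entering perform_lowlevel_parse pushes a frame.
        self._frames.append(parser)

    def on_return(self) -> None:
        # Returning from perform_lowlevel_parse pops the innermost frame.
        self._frames.pop()

    def backtrace(self, number: Optional[int] = None) -> List[str]:
        """Innermost frame first; the whole stack when `number` is None."""
        innermost_first = list(reversed(self._frames))
        return innermost_first if number is None else innermost_first[:number]


# Replaying the H_RULE example from hammer-parse-step.
stack = ParserStack()
stack.on_enter("a_b")
stack.on_enter("a")
stack.on_return()  # `a` has finished and is no longer on the stack
stack.on_enter("b")
stack.on_enter("b_2")
print(stack.backtrace())   # ['b_2', 'b', 'a_b']
print(stack.backtrace(2))  # ['b_2', 'b']
```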
```
hammer-parser-mem-use <address>
```
Print bytes allocated in the context of the parser located at `<address>`. Memory use is counted separately per arena, so the result is a dictionary keyed by arena address, where each value is the number of bytes allocated in that arena.
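The per-arena result can be pictured as a nested dictionary. The sketch below assumes allocations are available as hypothetical `(parser address, arena address, bytes)` records; how the tool actually attributes allocations to parsers is not covered here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# Hypothetical allocation record: (parser address, arena address, bytes allocated).
Alloc = Tuple[int, int, int]


def mem_use(allocs: Iterable[Alloc]) -> Dict[int, Dict[int, int]]:
    """Map parser address -> {arena address: bytes allocated in that arena}."""
    usage: DefaultDict[int, DefaultDict[int, int]] = defaultdict(lambda: defaultdict(int))
    for parser, arena, size in allocs:
        usage[parser][arena] += size
    # Convert to plain dicts so the result prints like an ordinary dictionary.
    return {parser: dict(arenas) for parser, arenas in usage.items()}


allocs = [
    (0x1000, 0xA000, 16),
    (0x1000, 0xA000, 32),
    (0x1000, 0xB000, 8),
    (0x2000, 0xA000, 24),
]
usage = mem_use(allocs)
print({hex(a): n for a, n in usage[0x1000].items()})  # {'0xa000': 48, '0xb000': 8}
```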
```
hammer-parser-mem-use-name <name>
```
Print bytes allocated in the contexts of parsers matching `<name>`. `<name>` is either the name given to a parser in the H_RULE declaration, or `(Unnamed <type>)`, for example: `(Unnamed sequence)`. If multiple parsers match the name given, stats for all matching parsers will be printed.
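Name matching can be sketched as follows, assuming each parser is known by its address, an optional declared name, and a combinator type. `ParserInfo`, `display_name`, and `matching` are hypothetical; the point is that several anonymous parsers of the same type share one `(Unnamed <type>)` name, so a single query can return more than one parser.

```python
from typing import List, NamedTuple, Optional


class ParserInfo(NamedTuple):
    """Hypothetical description of one parser instance."""
    address: int
    name: Optional[str]  # name from the H_RULE declaration, or None if anonymous
    kind: str            # combinator type, e.g. "sequence"


def display_name(p: ParserInfo) -> str:
    """Declared name, or the "(Unnamed <type>)" form for anonymous parsers."""
    return p.name if p.name is not None else f"(Unnamed {p.kind})"


def matching(parsers: List[ParserInfo], name: str) -> List[ParserInfo]:
    """All parsers whose display name equals `name`; there may be several."""
    return [p for p in parsers if display_name(p) == name]


parsers = [
    ParserInfo(0x1000, "a_b", "sequence"),
    ParserInfo(0x2000, None, "sequence"),
    ParserInfo(0x3000, None, "sequence"),
]
print([hex(p.address) for p in matching(parsers, "(Unnamed sequence)")])  # ['0x2000', '0x3000']
```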
```
hammer-parser-preview-input
```
Interprets the next 32 bytes of input as UTF-8 and prints the resulting string.
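A sketch of the decoding step, with `preview_input` as a hypothetical helper. Using `errors="replace"` is an assumption about how undecodable bytes should be shown: a 32-byte window can cut a multi-byte character in half, and PDF input often contains binary data that is not valid UTF-8.

```python
def preview_input(data: bytes, pos: int, length: int = 32) -> str:
    """Decode up to `length` bytes starting at `pos` as UTF-8.

    Invalid or truncated sequences become U+FFFD instead of raising.
    """
    return data[pos:pos + length].decode("utf-8", errors="replace")


# 31 ASCII bytes followed by a two-byte character: the window cuts it in half.
print(repr(preview_input(b"a" * 31 + b"\xc3\xa9", 0)[-1]))  # '\ufffd'
```

Inside GDB, the raw bytes could be read with `gdb.selected_inferior().read_memory(address, 32)` and converted with `bytes(...)`; which `address` holds the current input position is left open here.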
```
hammer-parser-top-per-arena-mem
```
Finds the parser with the highest number of bytes allocated in a single arena.
```
hammer-parser-top-total-arena-mem
```
Sums up each parser's memory use across all arenas, and prints the parser with the highest total allocated bytes.
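Both rankings can be computed from the per-parser, per-arena dictionary that `hammer-parser-mem-use` describes. This is an illustrative sketch with hypothetical function names; the example data shows that the two commands can name different parsers.

```python
from typing import Dict, Tuple

# parser address -> {arena address: bytes allocated}
Usage = Dict[int, Dict[int, int]]


def top_per_arena(usage: Usage) -> Tuple[int, int, int]:
    """(parser, arena, bytes) for the largest allocation total within one arena."""
    return max(
        ((parser, arena, size)
         for parser, arenas in usage.items()
         for arena, size in arenas.items()),
        key=lambda entry: entry[2],
    )


def top_total(usage: Usage) -> Tuple[int, int]:
    """(parser, bytes) for the parser with the highest total across all arenas."""
    return max(
        ((parser, sum(arenas.values())) for parser, arenas in usage.items()),
        key=lambda entry: entry[1],
    )


usage = {0x1000: {0xA000: 48, 0xB000: 40}, 0x2000: {0xA000: 64}}
parser, arena, size = top_per_arena(usage)
print(f"{parser:#x} in arena {arena:#x}: {size} bytes")  # 0x2000 in arena 0xa000: 64 bytes
parser, total = top_total(usage)
print(f"{parser:#x}: {total} bytes total")  # 0x1000: 88 bytes total
```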
# Limitations
This tool is currently built and tested against the pdf parser. It makes a few assumptions:
- Presence of an `init_parser()` function that declares the parser's H_RULEs. This will later be parameterized to support other parsers built with Hammer.
- The parser uses Hammer's Packrat backend.
- GDB renders the return instructions in `init_parser()`, `perform_lowlevel_parse()`, and `h_packrat_parse()` as `ret` or `retq`.
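One plausible way the `ret`/`retq` assumption could be used is to scan a function's disassembly for return instructions and break on each of them. The filter below is a hypothetical sketch, not taken from the tool; it works on listings shaped like those returned by GDB's Python `Architecture.disassemble()` (dicts with `addr`, `asm`, and `length` keys).

```python
from typing import Dict, Iterable, List

# Mnemonics assumed to denote a return instruction (per the limitation above).
RETURN_MNEMONICS = {"ret", "retq"}


def return_addresses(insns: Iterable[Dict]) -> List[int]:
    """Addresses of instructions whose mnemonic is exactly `ret` or `retq`."""
    found: List[int] = []
    for insn in insns:
        tokens = insn["asm"].split()
        # The mnemonic is taken to be the first whitespace-separated token.
        if tokens and tokens[0] in RETURN_MNEMONICS:
            found.append(insn["addr"])
    return found


listing = [
    {"addr": 0x401000, "asm": "push   %rbp", "length": 1},
    {"addr": 0x401001, "asm": "mov    %rsp,%rbp", "length": 3},
    {"addr": 0x401004, "asm": "pop    %rbp", "length": 1},
    {"addr": 0x401005, "asm": "retq   ", "length": 1},
]
print([hex(a) for a in return_addresses(listing)])  # ['0x401005']
```

A prefixed form such as `repz retq` has a different first token and is not matched, which illustrates why the rendering assumption matters. Inside GDB, the listing could come from `gdb.selected_frame().architecture().disassemble(start, end)` over the function's address range, and each result could receive a breakpoint via `gdb.Breakpoint(f"*{addr:#x}")`.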