# Requirements

- Hammer and the target parser (currently only pdf) must be built with debug symbols for the tool to work.
- The GUI component uses Tkinter to draw its window.

# Invocation

```
gdb -ex "source /path/to/utility-commands.py" \
    -ex "hammer-parse-stop-at-pos 50" \
    -ex "source /path/to/parser-type-instrumentation-gdb.py" \
    -ex "source /path/to/parser-name-instrumentation-gdb.py" \
    --args /path/to/pdf /path/to/input.pdf
```

To enable the GUI, run the following in the GDB console:

```
(gdb) source /path/to/gui.py
```

NOTE: If an error occurs, the GUI state is not cleaned up properly: a "Parser Visualization" window is still spawned, but with widgets missing. Close any such windows before sourcing the GUI script again; closing them cleans up the GUI state so that the script can be invoked again.

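# Tests

To run the tests, source the test script `tests/unit/parser-envs-pdf.py` after the instrumentation scripts: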

```
gdb -ex "source /path/to/utility-commands.py" \
    -ex "hammer-parse-stop-at-pos 50" \
    -ex "source /path/to/parser-type-instrumentation-gdb.py" \
    -ex "source /path/to/parser-name-instrumentation-gdb.py" \
    -ex "source /path/to/tests/unit/parser-envs-pdf.py" \
    --args /path/to/pdf /path/to/input.pdf
```

# Commands

The tool is in an experimental stage. The interface is liable to change.

## Execution control

```
hammer-parse-stop-at-input-pos <number>
```

Stops execution once the parsing process reaches past position `<number>` in the input stream. There are two caveats. First, since a single parser can consume more than one byte, the argument given is a lower bound on the actual stop position. Second, if a parser consumes enough input to reach the requested position but later fails, execution is still stopped when position `<number>` is reached.
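
The lower-bound caveat can be illustrated with a small Python model that treats the input position as a running total of the bytes consumed by successive parsers. This is a sketch, not the tool's implementation: the per-parser byte counts and the `stop_position` helper are hypothetical, and it assumes that "reaching" the position means the input position is at least `<number>`. It does not model the second caveat about parsers that later fail.

```python
from itertools import accumulate


def stop_position(bytes_consumed, target):
    """Return the input position at which execution would stop, or None.

    `bytes_consumed` is a hypothetical list of how many bytes each successive
    parser consumes. Execution stops at the first parser application that
    brings the position to `target` or beyond, so the result can overshoot.
    """
    for position in accumulate(bytes_consumed):
        if position >= target:
            return position
    return None  # the input ran out before `target` was reached


# A uint8 followed by two uint16s moves the position to 1, 3, then 5.
print(stop_position([1, 2, 2], 2))  # stops at 3, not 2
```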

```
hammer-parse-step [number]
```

Advance the parsing process by calling the next `[number]` parsers on the input (according to the declared H_RULEs). For example, given:

```
H_RULE(a, h_uint8());
H_RULE(b_2, h_uint16());
H_RULE(b, h_sequence(b_2, b_2, NULL));

H_RULE(a_b, h_sequence(a, b, NULL));
```

Stopping at `a_b` and invoking `hammer-parse-step` repeatedly without an argument stops at the applications of the following parsers, in order:

`a`, `b`, `b_2`, `b_2`

Repeatedly invoking `hammer-parse-step 2` instead stops at:

`b`, `b_2`

This is not equivalent to advancing the input stream by `[number]` bytes; rather, it is equivalent to running until the next `[number]` pushes onto the parser stack. See also `hammer-parser-backtrace`.
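
To make the stack-push counting concrete, the following Python sketch encodes the example H_RULEs as a hypothetical dictionary from rule name to sub-parsers, lists the parser applications below `a_b` in the order they would be pushed onto the parser stack, and keeps every `[number]`-th one. It assumes every sub-parser is actually applied in order, with no failure cutting a sequence short; `RULES`, `applications` and `step_stops` are illustrative names, not part of the tool.

```python
# Hypothetical encoding of the example rules: rule name -> sub-parsers.
RULES = {
    "a": [],
    "b_2": [],
    "b": ["b_2", "b_2"],
    "a_b": ["a", "b"],
}


def applications(rule):
    """Yield parser applications (stack pushes) below `rule`, in pre-order."""
    for child in RULES[rule]:
        yield child
        yield from applications(child)


def step_stops(rule, number=1):
    """Where repeated `hammer-parse-step number` would stop, starting at `rule`."""
    pushes = list(applications(rule))
    # Every number-th push is a stop: indices number-1, 2*number-1, ...
    return pushes[number - 1::number]


print(step_stops("a_b"))     # ['a', 'b', 'b_2', 'b_2']
print(step_stops("a_b", 2))  # ['b', 'b_2']
```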

If the GDB parameter `hammer-extended-parse-step-info` is set to `on` (`set hammer-extended-parse-step-info on`), the command also invokes `hammer-parser-backtrace` and `hammer-parser-preview-input`.

```
hammer-parse-continue
```

Alias of GDB `continue`. May change later.

## Querying

```
hammer-parser-backtrace [number]
```

Print the "call stack" for parsers. A call to `perform_lowlevel_parse` corresponds to a push onto the stack, while a return from it corresponds to popping the stack. `[number]` controls the number of items to print. If it is not given, the entire stack is printed.
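
The push/pop correspondence can be modelled with a plain list. In this sketch, the `ParserStack` class and its `on_call`/`on_return` hooks are hypothetical stand-ins for the tool's handling of `perform_lowlevel_parse` calls and returns, and printing the innermost parser first is an assumed ordering.

```python
class ParserStack:
    """Toy model of the parser "call stack" printed by hammer-parser-backtrace."""

    def __init__(self):
        self._frames = []

    def on_call(self, parser_name):
        # A call to perform_lowlevel_parse pushes the parser onto the stack.
        self._frames.append(parser_name)

    def on_return(self):
        # A return from perform_lowlevel_parse pops the stack.
        return self._frames.pop()

    def backtrace(self, number=None):
        # Innermost parser first (assumed); None means the entire stack.
        frames = self._frames[::-1]
        return frames if number is None else frames[:number]


stack = ParserStack()
for name in ("a_b", "a"):
    stack.on_call(name)
stack.on_return()  # `a` has finished
for name in ("b", "b_2"):
    stack.on_call(name)
print(stack.backtrace())   # ['b_2', 'b', 'a_b']
print(stack.backtrace(2))  # ['b_2', 'b']
```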

```
hammer-parser-mem-use <address>
```

Print the bytes allocated in the context of the parser located at `<address>`. Memory use is counted separately per arena, so the result is a dictionary keyed by arena address, where each value is the number of bytes allocated in that arena.
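
A minimal sketch of the per-arena accounting, assuming a hypothetical log of `(parser address, arena address, size)` records, one per allocation attributed to a parser. How the tool actually attributes an allocation to a parser's context is not shown here; the addresses and the `mem_use` helper are made up for illustration.

```python
from collections import defaultdict


def mem_use(allocations, parser):
    """Sum the bytes allocated in the context of `parser`, keyed by arena address.

    `allocations` is a hypothetical list of (parser_address, arena_address, size).
    """
    per_arena = defaultdict(int)
    for parser_addr, arena_addr, size in allocations:
        if parser_addr == parser:
            per_arena[arena_addr] += size
    return dict(per_arena)


log = [
    (0x5000, 0x9000, 64),
    (0x5000, 0x9000, 32),
    (0x5000, 0x9800, 16),
    (0x6000, 0x9000, 128),
]
print({hex(k): v for k, v in mem_use(log, 0x5000).items()})  # {'0x9000': 96, '0x9800': 16}
```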

```
hammer-parser-mem-use-name <name>
```

Print bytes allocated in the contexts of parsers matching `<name>`. `<name>` is either the name given to a parser in the H_RULE declaration, or `(Unnamed <type>)`, for example: `(Unnamed sequence)`. If multiple parsers match the name given, stats for all matching parsers will be printed.
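
The matching rule can be sketched as follows, assuming a hypothetical table that maps each parser's address to its H_RULE name (or `None` if it was not declared with H_RULE) and its type. Because several anonymous parsers can share a type, a name like `(Unnamed sequence)` may match more than one parser, which is why stats for several parsers can be printed. The helper names and sample addresses are illustrative only.

```python
def display_name(rule_name, parser_type):
    """The H_RULE name if there is one, otherwise "(Unnamed <type>)"."""
    return rule_name if rule_name is not None else f"(Unnamed {parser_type})"


def matching_parsers(parsers, name):
    """Return the addresses of all parsers whose display name equals `name`.

    `parsers` is a hypothetical mapping: address -> (rule_name or None, type).
    """
    return [addr for addr, (rule_name, ptype) in parsers.items()
            if display_name(rule_name, ptype) == name]


parsers = {
    0x5000: ("a_b", "sequence"),
    0x5100: (None, "sequence"),
    0x5200: (None, "sequence"),
    0x5300: (None, "choice"),
}
print([hex(a) for a in matching_parsers(parsers, "(Unnamed sequence)")])  # ['0x5100', '0x5200']
```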

```
hammer-parser-preview-input
```

Interprets the next 32 bytes of input as UTF-8 and prints the resulting string.
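
In Python terms, the preview amounts to slicing 32 bytes from the current input position and decoding them. This sketch assumes invalid UTF-8 is replaced with U+FFFD rather than raising an error; how the actual command handles undecodable bytes is not specified here, and `preview_input` is a hypothetical helper.

```python
def preview_input(data, position, length=32):
    """Decode up to `length` bytes of `data` starting at `position` as UTF-8.

    Invalid or truncated sequences become U+FFFD (an assumption of this sketch).
    """
    return data[position:position + length].decode("utf-8", errors="replace")


pdf = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
print(repr(preview_input(pdf, 9)))  # the 32 bytes after the header line
```

Because the slice is cut at a byte offset, a multi-byte character straddling the 32-byte boundary decodes to a replacement character in this sketch.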

```
hammer-parser-top-per-arena-mem
```

Finds the parser with the highest number of bytes allocated in a single arena.

```
hammer-parser-top-total-arena-mem
```

Sums up each parser's memory use across all arenas, and prints the parser with the highest total allocated bytes.
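
The difference between the two commands shows up when a parser spreads its allocations over several arenas. The sketch below uses a hypothetical usage table (parser name mapped to `{arena address: bytes}`, the per-parser shape described for `hammer-parser-mem-use`) with made-up numbers: `a_b` wins the per-arena ranking, while `b_2` wins on total. The helper names are illustrative, not the tool's.

```python
def top_per_arena(usage):
    """Return (parser, arena, bytes) for the largest allocation total in one arena."""
    return max(
        ((parser, arena, size)
         for parser, arenas in usage.items()
         for arena, size in arenas.items()),
        key=lambda entry: entry[2],
    )


def top_total(usage):
    """Return (parser, bytes) for the largest allocation total across all arenas."""
    return max(
        ((parser, sum(arenas.values())) for parser, arenas in usage.items()),
        key=lambda entry: entry[1],
    )


usage = {
    "a_b": {0x9000: 300},
    "b_2": {0x9000: 200, 0x9800: 200},
}
print(top_per_arena(usage))  # ('a_b', 36864, 300)
print(top_total(usage))      # ('b_2', 400)
```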

# Limitations

This tool is currently built and tested against the pdf parser. It makes a few assumptions:

- An `init_parser()` function that declares the parser's H_RULEs is present. This will later be parameterized to support other parsers built with Hammer.
- The parser uses Hammer's Packrat backend.
- GDB renders the return instructions in `init_parser()`, `perform_lowlevel_parse()` and `h_packrat_parse()` as `ret` or `retq`.
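
The last assumption matters because return points are presumably located from the instruction text GDB prints. The following sketch filters disassembly entries shaped like those returned by GDB's Python `Architecture.disassemble()` method (dictionaries with `addr`, `asm` and `length` keys); in a live session such a list could come from `gdb.selected_frame().architecture().disassemble(start, end)`. The `return_addresses` helper and the sample instructions are hypothetical.

```python
RETURN_MNEMONICS = {"ret", "retq"}


def return_addresses(instructions):
    """Return the addresses of instructions whose mnemonic is exactly "ret" or "retq"."""
    found = []
    for insn in instructions:
        words = insn["asm"].split()
        if words and words[0] in RETURN_MNEMONICS:
            found.append(insn["addr"])
    return found


# Hypothetical disassembly of a short function (AT&T syntax).
sample = [
    {"addr": 0x401000, "asm": "push   %rbp", "length": 1},
    {"addr": 0x401001, "asm": "mov    %rsp,%rbp", "length": 3},
    {"addr": 0x401004, "asm": "pop    %rbp", "length": 1},
    {"addr": 0x401005, "asm": "retq   ", "length": 1},
    {"addr": 0x401006, "asm": "repz retq ", "length": 2},
]
print([hex(a) for a in return_addresses(sample)])  # ['0x401005']
```

A mnemonic check this strict misses prefixed forms such as `repz retq`, which some compilers emit, so the last sample instruction is not reported.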