# Arti profiling methodology
This document describes basic tools for profiling Arti's CPU and memory
usage. Not all of these tools will make sense for every situation, and
we may want to switch to other tools in the future. The main reason for recording
them here is so that we don't have to re-learn how to use them the next
time we need to do a big round of profiling tests.
## Building for profiling
When you're testing with `cargo build --release`, set
`CARGO_PROFILE_RELEASE_DEBUG=true` so that the release build keeps debug
symbols; profilers produce much more useful output that way.
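For example (a minimal sketch; add whatever features your test build needs):
```
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
```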
## Profiling tools
Here I'll talk about a few tools for measuring CPU usage, memory usage,
and the like. For now, I'll assume you're on a reasonably modern Linux
environment; if you aren't, you'll have to adapt some of these steps.
I'll talk about particular scenarios to profile in the next major
section.
### cargo flamegraph
[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) is a
pretty quick-and-easy event profiling visualization tool. It produces
nice SVG flamegraphs in a variety of pretty colors. As with all
flamegraphs, these are better for visualization than detailed
drill-down. On Linux, `cargo-flamegraph` uses
[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) under the
hood.
To install, make sure you have a working version of `perf`
installed. Then run `cargo install flamegraph`.
Basic usage:
```
flamegraph {command}
```
Output: `flamegraph.svg`
Also consider using the `--reverse` flag to reverse the stacks and see
which low-level functions get the most use.
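For example, to profile an `arti-bench` run (illustrative; `ARTI_CONF` is
described below, and the output filename is arbitrary):
```
flamegraph --reverse -o arti-bench-flame.svg -- arti-bench -c "${ARTI_CONF}"
```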
### tcmalloc and pprof
These tools can generate usage graphs showing which code allocated how much
memory, and when. (The attribution can get a bit confusing with Rust.)
```
HEAPPROFILE=/tmp/heap.hprof \
LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
{command}
```
```
pprof --pdf --inuse_space {binary} /tmp/heap.hprof > heap.pdf
```
You might need a longer timeout when running under this one; the profiling
overhead is nontrivial.
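Putting these together for an illustrative `arti-bench` run (the profiler
typically writes numbered dump files based on the prefix you give it):
```
HEAPPROFILE=/tmp/heap.hprof \
LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
arti-bench -c "${ARTI_CONF}"
# Dumps land at e.g. /tmp/heap.hprof.0001.heap:
pprof --pdf --inuse_space "$(command -v arti-bench)" /tmp/heap.hprof.*.heap > heap.pdf
```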
### valgrind --tool=massif
This tool can also generate memory-usage graphs, much like pprof above.
`valgrind --tool=massif {command}`
It will generate a file called `massif.out.PID`. You can view it with the
`ms_print` tool (included with valgrind) or the `massif-visualizer` tool
(installed separately, highly recommended.)
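For example, to view the output of a profiled run (`12345` is a placeholder;
the suffix is the PID of your run):
```
# Replace 12345 with the PID from your run:
ms_print massif.out.12345 | less
```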
## Some commands to profile
These should generally be run against a chutney network whenever possible;
the `ARTI_CONF` environment variable should be set to
e.g. `$(pwd)/chutney/net/nodes/arti.toml`.
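For example (assuming chutney's `net` directory lives under the current
working directory):
```
export ARTI_CONF="$(pwd)/chutney/net/nodes/arti.toml"
```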
### Bootstrapping a directory
`arti-testing bootstrap -c ${ARTI_CONF}`
(This test bootstraps only. It might make sense to run it against the real
network, since the real network's directory data is larger and more complex.
You need to start with an empty set of state files so that it tests
bootstrapping rather than loading existing state.)
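A minimal sketch for starting from scratch, assuming arti's default Linux
state and cache locations (`~/.local/share/arti`, `~/.cache/arti`) and that
your configuration doesn't override them; check before deleting anything:
```
# Assumes the default Linux state/cache dirs and that nothing there needs keeping.
rm -rf ~/.local/share/arti ~/.cache/arti
arti-testing bootstrap -c "${ARTI_CONF}"
```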
### Large number of circuits, focusing on circuit construction
Bootstrap outside of benchmarking, then run:
`arti-bench -u 1 -d 1 -s 100 -C 20 -p 1 -c ${ARTI_CONF}`
(100 samples, 20 circuits per sample, 1 stream per circuit, only 1 byte
to upload or download.)
Note that this test won't necessarily tell you so much about _path
construction_, since path construction on a large real network with
different weights, policies, and families is more complex than on a
chutney network.
(This one just times out with chutney; the directory changes too fast, I
think.)
### Running offline
Also worth profiling:
* Bootstrapping failure conditions
* Going offline
* Primary guards going down after bootstrap

(See `HowToBreak.md`.)
### Data transfer
`arti-bench -s 20 -C 1 -p 1 {...}`
(No parallelism, 10 MB up and down.)
### Data transfer with many circuits
`arti-bench -s 1 -C 64 -p 1 -c ${ARTI_CONF}`
(Circuit parallelism only, 10 MB up and down.)
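One illustrative way to capture a CPU profile of this scenario is to wrap it
in `cargo flamegraph` (see above); the output filename is arbitrary:
```
flamegraph -o transfer-circuits.svg -- arti-bench -s 1 -C 64 -p 1 -c "${ARTI_CONF}"
```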
### Data transfer with many streams
`arti-bench -s 1 -C 1 -p 64 -c ${ARTI_CONF}`
(Stream parallelism only, 10 MB up and down.)
### Huge number of simultaneous connection attempts
`arti-bench -s 1 -C 16 -p 16 -c ${ARTI_CONF}`
(Stream and circuit parallelism.)
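For a memory-usage view of this scenario, one option (illustrative) is to
run it under massif as described above:
```
valgrind --tool=massif arti-bench -s 1 -C 16 -p 16 -c "${ARTI_CONF}"
```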
## TODO
arti-bench:
- Take a target address as a string.
- Allow `-p 0` to build a circuit only?
- Some way to build a path only?

Extract chutney boilerplate.

arti-testing:
- Ability to make connections aggressively simultaneous.