153 lines
4.1 KiB
Markdown
153 lines
4.1 KiB
Markdown
# Arti profiling methodology
|
|
|
|
This document describes basic tools for profiling Arti's CPU and memory
|
|
usage. Not all of these tools will make sense for every situation, and
|
|
we may want to switch them in the future. The main reason for recording
|
|
them here is so that we don't have to re-learn how to use them the next
|
|
time we need to do a big round of profiling tests.
|
|
|
|
## Building for profiling
|
|
|
|
When you're testing with `cargo build --release`, use
|
|
`CARGO_PROFILE_RELEASE_DEBUG=true` to include extra debugging
|
|
information for better output.
|
|
|
|
## Profiling tools
|
|
|
|
Here I'll talk about a few tools for measuring CPU usage, memory usage,
|
|
and the like. For now, I'll assume you're on a reasonably modern Linux
|
|
environment: if you aren't, you'll have to do some stuff differently.
|
|
|
|
I'll talk about particular scenarios to profile in the next major
|
|
section.
|
|
|
|
### cargo flamegraph
|
|
|
|
[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) is a
|
|
pretty quick-and-easy event profiling visualization tool. It produces
|
|
nice SVG flamegraphs in a variety of pretty colors. As with all
|
|
flamegraphs, these are better for visualization than detailed
|
|
drill-down. On Linux, `cargo-flamegraph` uses
|
|
[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) under the
|
|
hood.
|
|
|
|
To install, make sure you have a working version of `perf`
|
|
installed. Then run `cargo install flamegraph`.
|
|
|
|
Basic usage:
|
|
|
|
```
|
|
flamegraph {command}
|
|
```
|
|
|
|
Output: `flamegraph.svg`
|
|
|
|
Also consider using the `--reverse` flag, to reverse the stack and see the
|
|
lowest-level functions that get the most use.
|
|
|
|
### tcmalloc and pprof
|
|
|
|
This can generate usage graphs showing who allocated your memory when.
|
|
(It can get a bit confusing in Rust.)
|
|
|
|
```
|
|
HEAPPROFILE=/tmp/heap.hprof \
|
|
LD_PRELOAD=/usr/lib64/libtcmalloc_and_profiler.so \
|
|
{command}
|
|
```
|
|
|
|
```
|
|
pprof --pdf --inuse_space {binary} /tmp/heap.hprof > heap.pdf
|
|
```
|
|
|
|
You might need a longer timeout with this one; it's nontrivial.
|
|
|
|
### valgrind --massif
|
|
|
|
This tool can also generate usage graphs like pprof above.
|
|
|
|
`valgrind --tool=massif {command}`
|
|
|
|
It will generate a file called `massif.out.PID`. You can view it with the
|
|
`ms_print` tool (included with valgrind) or the `massif-visualizer` tool
|
|
(installed separately, highly recommended.)
|
|
|
|
## Some commands to profile
|
|
|
|
These should generally run against a chutney network whenever possible;
|
|
the `ARTI_CONF` envvar should be set to
|
|
e.g. `$(pwd)/chutney/net/nodes/arti.toml`.
|
|
|
|
### Bootstrapping a directory
|
|
|
|
`arti-testing bootstrap -c ${ARTI_CONF}`
|
|
|
|
(This test bootstraps only. It might make sense to do this one on the
|
|
real network, since its data is more complex. You need to start with an
|
|
empty set of state files for this to test bootstrapping instead of
|
|
loading.)
|
|
|
|
### Large number of circuits, focusing on circuit construction
|
|
|
|
Bootstrap outside of benchmarking, then run:
|
|
|
|
`arti-bench -u 1 -d 1 -s 100 -C 20 -p 1 -c ${ARTI_CONF}`
|
|
|
|
(100 samples, 20 circuits per sample, 1 stream per circuit, only 1 byte
|
|
to upload or download.)
|
|
|
|
Note that this test won't necessarily tell you so much about _path
|
|
construction_, since path construction on a large real network with
|
|
different weights, policies, and families is more complex than on a
|
|
chutney network.
|
|
|
|
(just times out with chutney; directory changes too fast, I think.)
|
|
|
|
|
|
### Running offline
|
|
|
|
Also
|
|
|
|
* Bootstrapping failure conditional
|
|
* Going offline
|
|
* Primary guards go down after bootstrap
|
|
|
|
(See `HowToBreak.md`)
|
|
|
|
### Data transfer
|
|
|
|
`arti-bench -s 20 -C 1 -p 1 {...}`
|
|
|
|
(No parallelism, 10 MB up and down.)
|
|
|
|
### Data transfer with many circuits
|
|
|
|
`arti-bench -s 1 -C 64 -p 1 -c ${ARTI_CONF}`
|
|
|
|
(Circuit parallelism only, 10 mb up and down)
|
|
|
|
### Data transfer with many streams
|
|
|
|
`arti-bench -s 1 -C 1 -p 64 -c ${ARTI_CONF}`
|
|
|
|
(Stream parallelism only, 10 mb up and down)
|
|
|
|
### Huge number of simultaneous connection attempts
|
|
|
|
`arti-bench -s 1 -C 16 -p 16 -c ${ARTI_CONF}`
|
|
|
|
(stream and circuit parallelism)
|
|
|
|
# TODO
|
|
|
|
arti-bench:
|
|
- take a target address as a string.
|
|
- Allow -p 0 to build a circuit only?
|
|
- Some way to build a path only?
|
|
|
|
Extract chutney boilerplate.
|
|
|
|
arti-testing:
|
|
- ability to make connections aggressively simultaneous
|
|
|