Some error types indicate that the guard has failed as a dircache.
We should treat these errors as signs to close the circuit, and to
mark the guard as having failed.
This commit refactors the dirclient error type into two cases:
errors when constructing a circuit, and errors that occur once we
already have a one-hop circuit. The latter can usually be
attributed to the specific cache we're talking to.
This commit also adds a function to expose the information about
which directory gave us the info.
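
As a rough illustration of the two-case split described above, here is a
minimal sketch. All of the names (`CacheId`, `RequestFailed`,
`DirClientError`, `cache_to_blame`) are hypothetical stand-ins chosen for
this example, not the actual dirclient types.

```rust
/// Hypothetical identifier for a directory cache (a stand-in for
/// whatever type the real code uses).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CacheId(pub String);

/// An error that occurred once we already had a one-hop circuit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RequestFailed {
    /// The cache we were talking to, if we know it.
    pub source: Option<CacheId>,
    /// Human-readable description of what went wrong.
    pub reason: String,
}

/// Hypothetical two-case error type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DirClientError {
    /// We could not construct a circuit; no cache is to blame.
    CircuitFailed(String),
    /// The request failed on an existing circuit.
    RequestFailed(RequestFailed),
}

impl DirClientError {
    /// Return the cache that this error can be attributed to, if any.
    pub fn cache_to_blame(&self) -> Option<&CacheId> {
        match self {
            DirClientError::CircuitFailed(_) => None,
            DirClientError::RequestFailed(e) => e.source.as_ref(),
        }
    }
}
```

With a shape like this, a caller can mark a guard as failed only when
`cache_to_blame()` returns `Some`.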
We already have the ability to get peer information from ChanMgr
errors, and therefore from any RetryErrors that contain ChanMgr
errors.
This commit adds optional peer information to tor-proto errors, and
a function to expose whatever peer information is available.
It'll soon be more convenient to pass in FallbackDirs as a slice of
references, rather than just a slice of FallbackDirs: I'm going to
be changing how we handle these in tor-dirmgr.
If all guards are down and they won't be retriable for a while, try
waiting that long to get whichever guard _is_ retriable.
Additionally, if we are making multiple circuit plans in parallel,
only report our planning as having failed if we failed at making
_all_ the plans. Previously we treated any failure as fatal for the
other plans, which could lead to trouble in the case when guards
were all down or pending.
Part of #407.
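
The "only fail if _all_ the plans failed" rule can be sketched as below.
The function name `combine_plans` and its signature are assumptions made
for this example; the real planning code is asynchronous and more
involved.

```rust
/// Combine the results of several planning attempts.
///
/// Succeeds with every plan that worked if at least one attempt
/// succeeded; fails with all of the collected errors only if every
/// attempt failed. (An empty input also counts as failure.)
pub fn combine_plans<P, E>(results: Vec<Result<P, E>>) -> Result<Vec<P>, Vec<E>> {
    let mut plans = Vec::new();
    let mut errors = Vec::new();
    for r in results {
        match r {
            Ok(p) => plans.push(p),
            Err(e) => errors.push(e),
        }
    }
    if plans.is_empty() {
        Err(errors)
    } else {
        // Individual failures are not fatal for the plans that worked.
        Ok(plans)
    }
}
```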
When all guards are down, we would previously mark them all as up,
and retry aggressively. But that's far too aggressive: if there's
something wrong with our ability to connect to guards, it makes us
hammer the network over and over, ignoring all the guard retry
timeouts in practice.
Instead,
* We now allow the `pick_guard()` function to fail without
automatically retrying.
* We give different errors in the cases when all our guards are
down, and when all of the guards selected by our active usage
are down.
* Our "guards are down" error includes the time at which a guard
will next be retriable.
This is part of #407.
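
The error behavior listed above could look roughly like this. It is a
simplified sketch: the `Guard` struct, the variant names, and the
`retry_after` helper are invented for illustration, and the real
`pick_guard()` does considerably more.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified guard record.
pub struct Guard {
    /// `None` if the guard is up; otherwise when it may next be retried.
    pub retry_at: Option<Instant>,
    /// Whether the guard is permitted by the current usage.
    pub matches_usage: bool,
}

/// Hypothetical error type with distinct "down" cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PickGuardError {
    /// Every guard is down.
    AllGuardsDown { retry_at: Option<Instant> },
    /// Some guard is up, but none that the current usage allows.
    AllUsableGuardsDown { retry_at: Option<Instant> },
}

impl PickGuardError {
    /// How long to wait from `now` until some guard becomes retriable.
    pub fn retry_after(&self, now: Instant) -> Option<Duration> {
        let retry_at = match *self {
            PickGuardError::AllGuardsDown { retry_at }
            | PickGuardError::AllUsableGuardsDown { retry_at } => retry_at,
        };
        retry_at.map(|t| t.saturating_duration_since(now))
    }
}

fn is_up(g: &Guard, now: Instant) -> bool {
    g.retry_at.map_or(true, |t| t <= now)
}

/// Pick the first usable guard, or fail without marking anything as up.
pub fn pick_guard(guards: &[Guard], now: Instant) -> Result<usize, PickGuardError> {
    if let Some(i) = guards.iter().position(|g| g.matches_usage && is_up(g, now)) {
        return Ok(i);
    }
    if guards.iter().any(|g| is_up(g, now)) {
        // Only the guards selected by this usage are down.
        let retry_at = guards
            .iter()
            .filter(|g| g.matches_usage)
            .filter_map(|g| g.retry_at)
            .min();
        Err(PickGuardError::AllUsableGuardsDown { retry_at })
    } else {
        let retry_at = guards.iter().filter_map(|g| g.retry_at).min();
        Err(PickGuardError::AllGuardsDown { retry_at })
    }
}
```

The caller decides what to do with the error; nothing here retries on
its own.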
C tor used one schedule, and guard-spec specified another. But in
reality we should probably use a randomized schedule to retry
guards, for the reasons explained in the documentation for
RetrySchedule.
I've chosen the minima to be not too far from our previous minima
for primary and non-primary guards.
This is part of #407.
Currently, Arti doesn't need this. But once it does, it will be
way better to have a separate type for connected sockets, rather
than having to error-check every time somebody gives us a socket.
Part of #410
Each channel now remembers an OwnedChanTarget.
Each circuit now remembers a vector of OwnedChanTarget to represent
the path that it was constructed for.
Part of #415.
Instead of requiring a `Box<dyn Isolation>`, it now takes either a
`Box<dyn Isolation>`, or an arbitrary `T` that implements
`Isolation`.
This API still allows the user to pass in a `Box<dyn Isolation>` if
that's what they have, but it doesn't require them to Box the
isolation on their own.
Part of #414.
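
One way to get an API of this shape is an `Into<Box<dyn Isolation>>`
bound plus a blanket `From` impl, as sketched below. The trait here is
reduced to a single made-up method, and `Prefs`, `set_isolation`,
`isolation_label`, and `ByTab` are hypothetical names for this example
only; the real trait and setter may differ.

```rust
/// Simplified stand-in for the real isolation trait.
pub trait Isolation {
    /// Hypothetical method, present only so the example is observable.
    fn label(&self) -> String;
}

/// Let any concrete `T: Isolation` be converted into a boxed trait
/// object. (`Box<dyn Isolation>` converts to itself via the standard
/// reflexive `From` impl.)
impl<T: Isolation + 'static> From<T> for Box<dyn Isolation> {
    fn from(isolation: T) -> Self {
        Box::new(isolation)
    }
}

/// Hypothetical preferences object that stores the isolation.
#[derive(Default)]
pub struct Prefs {
    isolation: Option<Box<dyn Isolation>>,
}

impl Prefs {
    /// Accept either a `Box<dyn Isolation>` or a bare `T: Isolation`.
    pub fn set_isolation<T>(&mut self, isolation: T) -> &mut Self
    where
        T: Into<Box<dyn Isolation>>,
    {
        self.isolation = Some(isolation.into());
        self
    }

    /// Label of the stored isolation, if one has been set.
    pub fn isolation_label(&self) -> Option<String> {
        self.isolation.as_ref().map(|i| i.label())
    }
}

/// Example isolation type.
pub struct ByTab(pub u32);

impl Isolation for ByTab {
    fn label(&self) -> String {
        format!("tab-{}", self.0)
    }
}
```

Note that this pattern only works while `Box<dyn Isolation>` does not
itself implement `Isolation`; otherwise the blanket impl would overlap
with the reflexive one.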
Now we use NetParams. That implies making its constructor public,
which I think is fine.
This is related to #413 but is far from completing that ticket.
This handwritten conversion function omitted a field. There was
nothing to spot this mistake.
IMO this shows why these particular types ought not to use builders,
but instead, should cause API breaks when things change.
Adding this line here to explicitly fix the bug, although we are about
to abolish this function completely almost right away.
This has a different syntax for builder field attributes from the one
I originally proposed in my MR, which is the one still used in the
pinned branch.
My upstream MR for the field attributes feature was merged:
https://github.com/colin-kiegel/rust-derive-builder/issues/239
Every time we want a microdescriptor, we know the index of that
microdesc's corresponding routerstatus within the consensus.
Therefore, we can use that index to store `Arc<Microdesc>`s in a
dense array, and not have to use a HashSet here at all.
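
A dense, index-addressed table of this kind might be sketched as
follows. `MdTable` and its methods are hypothetical, and `Microdesc` is
reduced to a single field for the example.

```rust
use std::sync::Arc;

/// Stand-in for the real microdescriptor type.
#[derive(Debug, PartialEq, Eq)]
pub struct Microdesc {
    pub digest: [u8; 32],
}

/// Hypothetical dense table: slot `i` holds the microdescriptor for
/// the `i`th routerstatus in the consensus, or `None` if it is missing.
pub struct MdTable {
    mds: Vec<Option<Arc<Microdesc>>>,
}

impl MdTable {
    /// Create a table with one empty slot per routerstatus.
    pub fn new(n_routerstatuses: usize) -> Self {
        Self { mds: vec![None; n_routerstatuses] }
    }

    /// Store `md` at `rs_idx`; returns false if the index is out of range.
    pub fn insert(&mut self, rs_idx: usize, md: Arc<Microdesc>) -> bool {
        match self.mds.get_mut(rs_idx) {
            Some(slot) => {
                *slot = Some(md);
                true
            }
            None => false,
        }
    }

    /// Look up a microdescriptor by routerstatus index; no hashing needed.
    pub fn get(&self, rs_idx: usize) -> Option<&Arc<Microdesc>> {
        self.mds.get(rs_idx)?.as_ref()
    }

    /// Count the routerstatuses whose microdescriptor is still missing.
    pub fn n_missing(&self) -> usize {
        self.mds.iter().filter(|slot| slot.is_none()).count()
    }
}
```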
We were using a hashtable to keep track of missing microdescriptor
digests. But this information is redundant with the NetDir state,
and there's now no longer any performance benefit to keeping a
separate copy.
Part of #386.
We previously kept missing-MD entries and present-MD entries all in
the same HashSet, which resulted in using more slack space than we
need. Now we use separate tables, so we can drop missing-MD
entries as we move forward.
Also, when constructing a NetDir, set its hash tables to their final
capacities.
This also lets us simplify some of our missing-md-listing code a
lot.
We never want a consensus document that's super-old, since we would
reject it immediately for being too old.
Also, never send an if-modified-since that's so old that we'd reject
the response.
Closes #403
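
The clamping idea can be sketched like this. The function name
`consensus_cutoff` and the `max_age` parameter are assumptions for this
example; the actual tolerance and where it is applied are not stated
here.

```rust
use std::time::{Duration, SystemTime};

/// Pick the date to use when asking for a newer consensus.
///
/// `last_valid_after` is the date of the consensus we already have, if
/// any. `max_age` is a hypothetical bound: anything older than
/// `now - max_age` would be rejected on arrival, so we never ask for it.
pub fn consensus_cutoff(
    last_valid_after: Option<SystemTime>,
    now: SystemTime,
    max_age: Duration,
) -> SystemTime {
    // Oldest document we would still accept.
    let oldest_useful = now.checked_sub(max_age).unwrap_or(SystemTime::UNIX_EPOCH);
    match last_valid_after {
        // Our current consensus is recent enough: ask for anything newer.
        Some(t) if t > oldest_useful => t,
        // Otherwise, don't ask for documents we would throw away.
        _ => oldest_useful,
    }
}
```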
This has the humantime_serde::option module, which we have upstreamed
and are about to switch to.
The remaining dependency with version = "1" is going to be removed
in a moment.
This should save around 1MB per consensus, since every relay has a
'protocols' line, but there are only a few distinct possibilities
for such a line.
Closes #385.
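
To illustrate why interning helps when only a few distinct values
exist, here is a minimal interner over strings. It is a sketch under
assumptions: `Interner` is a hypothetical type, and the real code may
intern parsed protocol sets rather than raw strings.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical interner: hands out shared `Arc<str>` handles so that
/// identical lines are stored only once.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Arc<str>>,
}

impl Interner {
    /// Return a shared handle for `s`, allocating only on first sight.
    pub fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.seen.get(s) {
            return Arc::clone(existing);
        }
        let new: Arc<str> = Arc::from(s);
        self.seen.insert(Arc::clone(&new));
        new
    }

    /// Number of distinct values stored.
    pub fn distinct(&self) -> usize {
        self.seen.len()
    }
}
```

Each relay then holds a pointer-sized handle instead of its own copy of
the line.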
When the version is a Tor version, we can just parse it; otherwise,
we can intern it. This shrinks GenericRouterStatus and avoids a lot
of extra heap allocations.
This is an API break: now one must use `.tor()` to access the Tor
configuration parts.
But it is not a config file format break, because `#[serde(flatten)]`.
This hashtable starts out pretty large, but it can spend most of our
runtime (when we aren't downloading) being small. To avoid doing
too much work, I've made it so we only call shrink_to_fit twice per
consensus: once when we're no longer pending, and once when we're
complete.
Closes #388.
Previously we'd allocate an error as a place-holder here, but it's
not a great idea to do that with a `Bug`: each `Bug` stores a whole
stack trace, which uses a whole pile of allocations to construct.
Now we keep an `Option<Error>` instead.
Found while heap profiling.
Closes #383.
Replace the recapitulation of TorClientConfig fields in ArtiConfig and
instead just have it contain one. This is part of #374.
The conversions from ArtiConfig back to ArtiConfigBuilder and
TorClientConfigBuilder would need to change, but since we don't want
them anyway, this commit removes them instead.
No longer impl Deserialize for ArtiConfig. (As per #371 this will
want to become a private type.)
No longer impl From<ArtiConfig> for ArtiConfigBuilder and
TorClientConfigBuilder. And abolish tests of that code.
(This all has to be in one commit, because previously
ArtiConfig::tor_client_config used the validated-to-builder config
retcon.)
I used
git-grep -P '\#\[serde\((?!default|deny_unknown)'
to find places where I needed to add additional attributes on the
builder method fields.
This is currently a bit duplicative, but when #371 is completely done,
the validated (non-builder) configs won't need to be Deserialize any
more.
This is part of #371 and #372.
We are going to want to specify custom attributes on fields of the
builder struct. This feature was missing from derive_builder.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/237
https://github.com/ijackson/rust-derive-builder/tree/builder-field-attrs
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
This commit adds support for a BrokenTcp provider that can make
connection attempts fail or time out. It doesn't yet have a way to
turn on the failure.
This makes Arti usable in IPv6-only environments (arti#92) by letting us
attempt multiple connections to a given relay using all of its
addresses instead of just using the first (probably IPv4) one, using the
strategy from RFC 8305 § 5.
This isn't a complete implementation of Happy Eyeballs; ideally, we'd
sort the address list before doing concurrent connections. However, it
works (and has been tested inside an IPv6-only container inside eta's
network :p)
The doc include rune does not work with our MSRV; it needs 1.54.
The alternative would be some kind of cfg() but that would
- not provide the crate-level doc on Rust 1.53
- involve the use of cfg_attr
Instead, just do it the old way.
Previously we opened every connection in a run first, and only then
did we start transferring data over them. Now we collect a bunch of
the futures that return an open stream, and run them all in parallel
with using the streams. This change includes connect-time in our
benchmarks, and allows us to test contention in our connect code.
Instead of using a Stream, I've changed the connection-generation
code to call a future-returning function directly, so we have a way
to explicitly pass which run we're in.
This commit changes the main parsing code for RsaIdentity in
tor-netdoc, and .
Previously, parse_hex_ident was something like 10% of our startup
CPU time; now it's only like ~2%. (Still not perfect, but way
better.)
Closes #377.
We perform this operation in a bunch of places, and most of them
use hex::decode(). That's not great, since hex::decode() has to do
heap allocation. This implementation uses hex::decode_to_slice(),
which should be faster.
(In the future we might choose to use one of the faster hex
implementations, but I'm hoping that this change will be sufficient
to get hex decoding out of our profiles.)
Part of #377.
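
The allocation-free approach amounts to decoding straight into a
fixed-size buffer. The sketch below uses a hand-rolled decoder so that
it is self-contained; it is a stand-in for `hex::decode_to_slice()`,
not the code from this commit, and `decode_hex_id` is a hypothetical
name.

```rust
/// Length in bytes of an RSA identity (a SHA-1 digest).
pub const RSA_ID_LEN: usize = 20;

/// Convert one ASCII hex digit to its value.
fn nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Decode a 40-character hex string into a stack-allocated array.
///
/// Returns `None` on a wrong length or a non-hex character. No heap
/// allocation happens on any path.
pub fn decode_hex_id(s: &str) -> Option<[u8; RSA_ID_LEN]> {
    let bytes = s.as_bytes();
    if bytes.len() != RSA_ID_LEN * 2 {
        return None;
    }
    let mut out = [0u8; RSA_ID_LEN];
    for (dst, pair) in out.iter_mut().zip(bytes.chunks_exact(2)) {
        *dst = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Some(out)
}
```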
Previously they returned an Arc, which wasn't necessary unless the
client actually _wanted_ a new Arc.
This would be an API break, except that these functions are marked
'experimental-api', so semver does not apply; nonetheless I've noted
the break in semver_status.md, just in case we care.
Closes #369
This commit adds a new program to try to implement the ideas behind
experimentation in arti#329. In particular, it tries to implement
basic client "can I bootstrap and connect" functionality testing,
with a lot of instrumentation, and support for breaking things.
So far, the instrumentation is limited to counting TCP bytes and
connections, and counting events. Still, this is enough to measure
behavior on some of the incorrect-clock tests.
NOTE:
For now, you are _required_ to pass in an explicit configuration, in
hopes that this will lead you to override your storage directories
for doing specific experiments.