Previously, we did not distinguish errors that came from pending
circuits from errors that came from the circuits we were
building. We also reported errors as coming from "Left" or "Right"
instead of giving a meaningful description.
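To illustrate the distinction, here is a minimal sketch of an error type that records where a failure came from and says so in its message, instead of reporting an opaque "Left" or "Right". The names `CircOrigin` and `CircError` are hypothetical and are not arti's actual types.

```rust
use std::fmt;

/// Hypothetical: which kind of circuit a failure came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CircOrigin {
    /// A circuit that another request had already launched.
    Pending,
    /// A circuit that this request launched itself.
    Building,
}

/// Hypothetical error type carrying its origin along with the message.
#[derive(Debug)]
struct CircError {
    origin: CircOrigin,
    msg: String,
}

impl fmt::Display for CircError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Describe the origin in words rather than as "Left"/"Right".
        let origin = match self.origin {
            CircOrigin::Pending => "While waiting for a pending circuit",
            CircOrigin::Building => "While building a new circuit",
        };
        write!(f, "{}: {}", origin, self.msg)
    }
}
```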
We were treating restrict_mut() failures as internal errors. But in
fact, these failures are entirely possible due to timing. Here's how
it happens:
* Two different circuit requests arrive at the same time, and both
notice a pending circuit that they could use.
* The pending circuit completes; both pending requests are notified.
* The first request calls restrict_mut(), and restricts the circuit
in such a way that the second request can't use it.
* The second request calls restrict_mut(), and gets a failure.
Because of this race, we treat these errors as transient failures
and just wait for another circuit.
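The race above can be modeled in a few lines. This is a simplified sketch, assuming a hypothetical `Spec` whose `restrict_mut()` claims the circuit for one isolation token and fails for any other; arti's real `restrict_mut()` has a different signature. The point is only that the failure maps to "wait for another circuit" rather than to an internal error.

```rust
/// Hypothetical error: the circuit was already restricted by someone else.
#[derive(Debug, PartialEq, Eq)]
struct RestrictError;

/// Hypothetical circuit spec: `None` means "not yet claimed by anyone".
#[derive(Debug, Default)]
struct Spec {
    isolation: Option<u32>,
}

impl Spec {
    /// Narrow this circuit so that it serves only requests with `token`.
    /// Fails if a request with a different token restricted it first.
    fn restrict_mut(&mut self, token: u32) -> Result<(), RestrictError> {
        match self.isolation {
            None => {
                self.isolation = Some(token);
                Ok(())
            }
            Some(t) if t == token => Ok(()),
            Some(_) => Err(RestrictError),
        }
    }
}

/// What a request does once its pending circuit completes.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    UseCircuit,
    /// Not an internal error: another request won the race.
    WaitForAnother,
}

fn on_pending_circuit_ready(spec: &mut Spec, token: u32) -> Outcome {
    match spec.restrict_mut(token) {
        Ok(()) => Outcome::UseCircuit,
        // Transient: go back to waiting for (or launching) another circuit.
        Err(RestrictError) => Outcome::WaitForAnother,
    }
}
```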
Closes #427.
(This is not a breaking API change, since `AbstractSpec` is a
crate-private trait.)
Not all of these strictly need to be bumped to 0.2.0; many could go
to 0.1.1 instead. But since everything at the tor-rtcompat layer and
above has had breaking API changes, there seems little point in
distinguishing. (It seems unlikely that anybody at this stage
depends on, e.g., tor-protover but not arti-client.)
Unlike the rest of the crates, these don't have a "tor-" or "arti-"
prefix, and are potentially used by code outside arti. With that in
mind, it's probably best not to bump them to 0.2.0 along with the
rest of our crates.
They have had no changes since 0.1.0 other than refactoring and
clippy lint changes. Therefore, I'm not bumping our other crates'
dependencies on them: it's fine whether our other crates use
caret/retry-error 0.1.0 or 0.1.1.
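Either version works because of Cargo's default ("caret") requirement rule: a dependency written as `"0.1.0"` means `^0.1.0`, which accepts any 0.1.x release at or above 0.1.0 but not 0.2.0. A minimal sketch of that rule, ignoring pre-release versions and partial requirements such as `"0.1"`; the function name `caret_matches` is hypothetical.

```rust
/// Sketch of Cargo's caret rule for full `major.minor.patch` requirements.
/// The leftmost nonzero component of the requirement must match exactly,
/// and the version must not be lower than the requirement.
fn caret_matches(req: (u64, u64, u64), ver: (u64, u64, u64)) -> bool {
    let (rmaj, rmin, rpat) = req;
    let (vmaj, vmin, vpat) = ver;
    // Tuples compare lexicographically, i.e. by major, then minor, then patch.
    if ver < req {
        return false;
    }
    if rmaj > 0 {
        vmaj == rmaj
    } else if rmin > 0 {
        vmaj == 0 && vmin == rmin
    } else {
        vmaj == 0 && vmin == 0 && vpat == rpat
    }
}
```

So a crate that depends on caret "0.1.0" will happily build against 0.1.1, which is why those requirements can stay as they are.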
This feature allows us to detect different failing cases for
arti#329 that would otherwise be hard to induce. It works by
filtering consensus directory objects and/or microdescriptor objects
before introducing them to the directory manager.
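A minimal sketch of the filtering idea, assuming a hypothetical `DirFilter` trait with one hook per object type. The document types are toy stand-ins, and none of these names are claimed to match the feature's actual API. Each hook sees an object before the directory manager does and may modify it or drop it entirely.

```rust
/// Toy stand-ins (hypothetical) for the documents the directory manager receives.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Consensus {
    relays: Vec<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Microdesc {
    relay: String,
}

/// Hypothetical test-only hook: each method may modify an object or drop it (`None`).
trait DirFilter {
    fn filter_consensus(&self, c: Consensus) -> Option<Consensus> {
        Some(c)
    }
    fn filter_md(&self, md: Microdesc) -> Option<Microdesc> {
        Some(md)
    }
}

/// Default behaviour: pass everything through unchanged.
struct NoFilter;
impl DirFilter for NoFilter {}

/// Simulates a consensus that lists only its first `n` relays.
struct TruncateConsensus(usize);
impl DirFilter for TruncateConsensus {
    fn filter_consensus(&self, mut c: Consensus) -> Option<Consensus> {
        c.relays.truncate(self.0);
        Some(c)
    }
}

/// Simulates never receiving any microdescriptors.
struct DropAllMicrodescs;
impl DirFilter for DropAllMicrodescs {
    fn filter_md(&self, _md: Microdesc) -> Option<Microdesc> {
        None
    }
}

/// Run downloaded microdescriptors through the filter before handing on the survivors.
fn deliver_mds<F: DirFilter>(filter: &F, downloaded: Vec<Microdesc>) -> Vec<Microdesc> {
    downloaded
        .into_iter()
        .filter_map(|md| filter.filter_md(md))
        .collect()
}
```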
Closes #397.
This commit uses the `visibility` and `visible` crates to
conditionally make certain structs and their fields public
(respectively). This is incredibly dangerous to use for anything
besides testing, and I've tried to write the documentation for the
feature accordingly.
The older default seems (experimentally) to be ridiculously high.
Generally, if we can't build a circuit within a handful of attempts,
that circuit has already timed out... unless there is a fast-failure
condition, in which case we're just hammering the network (or our
view of it).
Found with `arti-testing` for #329.
Previously, if we had launch_parallelism > 1 and were willing to
retry building a circuit max_retries times, we'd launch up to
max_retries * launch_parallelism circuits before giving up. Ouch!
With this patch, we keep the total number of circuits planned and
attempted within the max_retries limit.
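The budgeting rule can be sketched as follows, assuming a hypothetical `build_with_budget()` helper and a `try_build` closure standing in for one circuit attempt. Real code would run each batch concurrently; the sketch tries them in turn, since only the counting matters here. With `max_retries = 5` and `launch_parallelism = 2`, it launches batches of 2, 2, and 1, for 5 attempts in total rather than 10.

```rust
/// Hypothetical helper: launch circuits in batches of up to
/// `launch_parallelism`, never exceeding `max_retries` attempts in total.
/// Returns whether any attempt succeeded, and how many attempts were made.
fn build_with_budget<F>(max_retries: usize, launch_parallelism: usize, mut try_build: F) -> (bool, usize)
where
    F: FnMut() -> bool,
{
    // Launch at least one circuit per round so the loop always makes progress.
    let parallelism = launch_parallelism.max(1);
    let mut attempts = 0;
    while attempts < max_retries {
        // Never plan more circuits than the remaining budget allows.
        let batch = parallelism.min(max_retries - attempts);
        attempts += batch;
        let mut any_ok = false;
        for _ in 0..batch {
            any_ok |= try_build();
        }
        if any_ok {
            return (true, attempts);
        }
    }
    (false, attempts)
}
```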
Part of #329; found with arti-testing.
The previous algorithm had two flaws:
* It would wait even after the final attempt, when there were no
more retries to do.
* It would fail to wait between attempts if an error occurred.
This refactoring fixes both of these issues, and adds some comments.
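A minimal sketch of a loop with both fixes, assuming hypothetical `attempt` and `wait` callbacks. Waiting before every attempt except the first means there is always a delay between attempts, including after an attempt that failed with an error, and never one after the final attempt.

```rust
/// Hypothetical retry helper: returns the first success, or every error seen.
fn retry<T, E>(
    max_attempts: usize,
    mut attempt: impl FnMut() -> Result<T, E>,
    mut wait: impl FnMut(),
) -> Result<T, Vec<E>> {
    let mut errors = Vec::new();
    for n in 0..max_attempts {
        if n > 0 {
            // Delay before each retry, regardless of how the last attempt failed.
            wait();
        }
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) => errors.push(e),
        }
    }
    // No trailing wait: we fall out of the loop right after the final attempt.
    Err(errors)
}
```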
The FirstHopId type now uses an enum to record whether the hop is a
guard or a fallback. This change addresses concerns about
remembering to check the type or source of an ID before passing it
down to the FallbackState or GuardSet.
Making this change required an API change, so that dirmgr can
report success/failure status without actually knowing whether it's
using a fallback or a guard.
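A minimal sketch of the idea using hypothetical types: `FirstHopKind`, `GuardMgr`, and `note_success()` are illustrative names, and `GuardSet`/`FallbackState` here are toy stand-ins for the real types. A caller such as dirmgr hands an opaque `FirstHopId` back, and only the guard manager inspects its kind to decide which state to update.

```rust
/// Hypothetical: where a first hop came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FirstHopKind {
    Guard,
    Fallback,
}

/// Toy FirstHopId: the kind travels with the relay identity.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FirstHopId {
    kind: FirstHopKind,
    relay: String,
}

/// Toy stand-ins that just record which relays succeeded.
#[derive(Debug, Default)]
struct GuardSet {
    successes: Vec<String>,
}

#[derive(Debug, Default)]
struct FallbackState {
    successes: Vec<String>,
}

#[derive(Debug, Default)]
struct GuardMgr {
    guards: GuardSet,
    fallbacks: FallbackState,
}

impl GuardMgr {
    /// Callers report success without knowing the hop's kind;
    /// the match here is the only place that routes it.
    fn note_success(&mut self, id: &FirstHopId) {
        match id.kind {
            FirstHopKind::Guard => self.guards.successes.push(id.relay.clone()),
            FirstHopKind::Fallback => self.fallbacks.successes.push(id.relay.clone()),
        }
    }
}
```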