This is a revised version of !397; it implements a scheduling system for
periodic tasks that can be externally controlled, and then uses the
external control aspect to implement a basic dormant mode (#90).
More technically, the scheduling system consists of a `Stream` that
periodic tasks are expected to embed in a `while` loop or similar, a
way for tasks themselves to choose how long to wait until the stream
next yields a result, and a handle to control this outside of the task.
We now check the handshake certificates unconditionally, and only
report them as _expired_ as a last resort.
(Rationale: if somebody is presenting the wrong identity from a year
ago, it is more interesting that they are presenting the wrong ID
than it is that they are doing so with an expired cert.
We also now report a different error if the certificate is expired,
but its expiration is within the range of reported clock skew.
(Rationale: it's helpful to distinguish this case, so that we can
blame the failure on possible clock skew rather than definitely
attributing it to a misbehaving relay.)
Part of #405.
NETINFO cells, which are sent in every handshake, may contain
timestamps. This patch adds an accessor for the timestamp in the
Netinfo messages, and teaches the tor-proto code how to compute the
minimum clock skew in the code.
The computation isn't terribly precise, but it doesn't need to be:
Tor should work fine if your clock is accurate to within a few
hours.
This patch also notes a Y2038 problem in the protocol: see
torspec#80.
Part of #405.
Some error types indicate that the guard has failed as a dircache.
We should treat these errors as signs to close the circuit, and to
mark the guard as having failed.
This commit refactors the dirclient error type into two cases:
errors when constructing a circuit, and errors that occur once we
already have a one-hop circuit. The latter can usually be
attributed to the specific cache we're talking to.
This commit also adds a function to expose the information about
which directory gave us the info.
We already have the ability to get peer information from ChanMgr
errors, and therefore from any RetryErrors that contain ChanMgr
errors.
This commit adds optional peer information to tor-proto errors, and
a function to expose whatever peer information is available.
It'll soon more convenient to pass in FallbackDirs as a slice of
references, rather than just a slice of FallbackDirs: I'm going to
be changing how we handle these in tor-dirmgr.
If all guards are down and they won't be retriable for a while, try
waiting that long to get whichever guard _is_ retriable.
Additionally, if we are making multiple circuit plans in parallel,
only report our planning as having failed if we failed at making
_all_ the plans. Previously we treated any failure as fatal for the
other plans, which could lead to trouble in the case when guards
were all down or pending.
Part of #407.
When all guards are down, we would previously mark them all as up,
and retry aggressively. But that's far too aggressive: if there's
something wrong with our ability to connect to guards, it makes us
hammer the network over and over, ignoring all the guard retry
timeouts in practice.
Instead,
* We now allow the `pick_guard()` function to fail without
automatically retrying.
* We give different errors in the cases when all our guards are
down, and when all of the guards selected by our active usage
are down.
* Our "guards are down" error includes the time at which a guard
will next be retriable.
This is part of #407.
C tor used one schedule, and guard-spec specified another. But in
reality we should probably use a randomized schedule to retry
guards, for the reasons explained in the documentation for
RetrySchedule.
I've chosen the minima to be not too far from our previous minima
for primary and non-primary guards.
This is part of #407.