This only affects uses of thread_rng(), and affects them all more or
less indiscriminately. One test does not work with
ARTI_TEST_PRNG=deterministic; the next commit will fix it.
This is a general-purpose implementation of the ad-hoc approach
currently taken in (eg) crates/tor-proto/src/channel/reactor.rs,
with an API intended to defned against the more obvious mistakes.
This allows us to separate the two concerns: the channel reactor can
focus on handling channel cells and control messages and is over 2.5x
shorter.
The complexity of the manual sink implementation, and the machinery
needed to avoid having to suspend while holding an item, are dealt
with separately. That separate implemenation now has proper
documentation. (Tests are in the nest commit to avoid this one being
even more unwieldy.)
We use `extend` to define this as an extension trait. A competitor is
`ext` but in my personal projects I have found `extend` slightly
better.
The type of ret.map_err(codec_err_to_chan)? is (). ISTM that
writing `let () = ` makes it clear that there is nothing there,
but the lint forbids this.
This lint is warn by default and trips here for me on current nightly.
It seems wrong to me. We should be able to make it clear to the
reader that there is nothing here - note how this differs from the
lines below where Ready contains msg. A let () binding is a good way
to do that.
I think the lint allow ought to be added everywhere, but that doesn't
seem easy right now - see this issue about maint/add_warning:
https://gitlab.torproject.org/tpo/core/arti/-/issues/469
Fortunately, we don't need a separate type here: authenticated
clock skew can only come attached to a `tor_proto::Error`.
We also remove skew from `tor_proto::Error::HandshakeCertsExpired`,
since it would now be redundant.
of a channel.
At first I wanted to have this information not be a part of channels
at all, but it is a fairly tiny amount of data, and the alternatives
are pretty crufty.
We now check the handshake certificates unconditionally, and only
report them as _expired_ as a last resort.
(Rationale: if somebody is presenting the wrong identity from a year
ago, it is more interesting that they are presenting the wrong ID
than it is that they are doing so with an expired cert.
We also now report a different error if the certificate is expired,
but its expiration is within the range of reported clock skew.
(Rationale: it's helpful to distinguish this case, so that we can
blame the failure on possible clock skew rather than definitely
attributing it to a misbehaving relay.)
Part of #405.
NETINFO cells, which are sent in every handshake, may contain
timestamps. This patch adds an accessor for the timestamp in the
Netinfo messages, and teaches the tor-proto code how to compute the
minimum clock skew in the code.
The computation isn't terribly precise, but it doesn't need to be:
Tor should work fine if your clock is accurate to within a few
hours.
This patch also notes a Y2038 problem in the protocol: see
torspec#80.
Part of #405.
Each channel now remembers an OwnedChanTarget.
Each circuit now remembers a vector of OwnedChanTarget to represent
the path that it was constructed for.
Part of #415.
This helps the user distinguish between protocol violations that
happen when connecting to the tor network from those that happen
while connected.
Closes#358.
Additionally, refactor the IoError out of tor_cell::Error:
nothing in TorCell created this; it was only used by tor_proto.
This required refactoring in tor_proto to use a new error type. Here I
decided to use a new CodecError for now, though we may refactor that
away soon too.
This fixes a tiny race condition in the previous code, where we
checked whether an OptTimestamp is None a bit before we set it.
Since std::atomic gives us compare_exchange, we might as well use
it.
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
This test seems unreliable on CI: we've got to disable them for now
so that we have a working CI system. The CI failure is #238; the
ticket to repair them is #244.
The `bad_extend_*` failures were caused by bad test code in
`bad_extend_test_impl` that used `futures::join!`; this meant that the
reactor could receive the `Extended2` cell before it actually got the
`ExtendNtor` request, which caused it to get (quite rightly) confused
and close the circuit. Spawning a background thread which has a short
delay before sending the `Extended2` cell seems to have alleviated this
problem.
`new_circ_create_failure` is similar; I think the reactor was getting
dropped before it had a chance to flush out its `CreateFast` cell
properly, because it had already gotten the result back (since the test
code sends it indiscriminately). This was "fixed" in much the same
manner as the other test: making it wait a bit before sending the result
cell back.
There seem to be other tests that use `futures::join!` (like
`begindir`?), and use similarly erroneous patterns; I haven't gotten any
to fail reliably enough to be able to debug them, though.
Previously, the reactor would use an `UnboundedSender` to send things to
the `RawCellStream`, in order that the reactor wouldn't block if you
failed to read from the latter. This is bad, though, since it means
people can just run us out of memory by sending lots of things.
To fix this, we make the new `StreamReader` type (which does the reading
parts from `RawCellStream`) keep track of the stream's receive window
and issue SENDMEs once *it* has consumed enough data to require it, thus
meaning that we shouldn't get sent enough data to fill the channel
between reactor and `StreamReader` (and, if we do, that's someone trying
to flood us, and we abort the circuit).
As hinted to above, the `RawCellStream` was removed and its reading
functionalities replaced by `StreamReader`; its writing functionalities
are handled by `StreamTarget` anyway, so we just give out one of those
for the write end. This now means we don't need any mutexes!
note: this commit introduces a known issue, arti#230
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
@nickm pointed out that refactoring tor_proto::channel's Reactor to do
sending as well meant that it could only send or receive, but not both,
simultaneously, which was bad!
To fix this, rewrite Reactor::run_once to use a handcrafted future (with
futures::future::poll_fn) that can handle the logic required to push
items onto the sink asynchronously (i.e. checking that it can be written
to before trying to do that, and then flushing it).
This also means we don't use select_biased! any more, and just handroll
that logic ourselves; as a small bonus, we can now process all 3 kinds
of message in one run_once() call, instead of having to do only one of
them.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
We need this for the circuit timeout estimator (#57). It needs to
know "how recently have we got some incoming traffic", so that it
can tell whether a circuit has truly timed out, or whether the
entire network is down.
I'm implementing this with coarsetime, since we need to update these
in response to every single incoming cell, and we need the timestamp
operation to be _fast_.
(This reinstates an earlier commit, f30b2280, which I reverted
because we didn't need it at the time.)
Closes#179.
There wasn't any good reason for tor-proto's channel reactor to use a
shedload of oneshot channels instead of just an mpsc UnboundedSender,
and the whole `CtrlResult` thing made even less sense.
Straighten this code out by replacing all of that machinery with a
simple UnboundedSender, instead.
(part of arti#218)