Additionally, refactor the IoError out of tor_cell::Error:
nothing in TorCell created this; it was only used by tor_proto.
This required refactoring in tor_proto to use a new error type. Here I
decided to use a new CodecError for now, though we may refactor that
away soon too.
This fixes a tiny race condition in the previous code, where we
checked whether an OptTimestamp is None a bit before we set it.
Since std::atomic gives us compare_exchange, we might as well use
it.
A number of severe problems with the circuit reactor were fixed which
could cause reordering of cells (which causes relays to terminate the
circuit with a protocol violation, as they become unable to decrypt
them). These mostly revolve around improper usage of queues:
- The code assumed that a failure to place cells onto the channel would
persist for the duration of a reactor cycle run. However, under high
contention, this wouldn't always be the case.
- This leads to some cells getting enqueued while others go straight
through, before the enqueued cells.
- To fix this, we block sending cells out of the channel while there
are still some enqueued.
- The hop-specific queues queued after encryption, not before. This was
very brittle, and led to frequent mis-ordering.
- This was fixed by making them not do that.
This is arti!264 / 5bce9db562 without the
refactor part.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
This is a fine example of why booleans are risky:
it's far to easy to pass "animate:bool" into "inanimate:bool" like
we did here.
This is a followup from our fix to #294.
Previously we were requiring authenticated sendme cells exactly when we
should be permitting the old format, and vice versa.
This bug was caused by using a boolean to represent one property, but
with giving that boolean two different senses without inverting at the
right time.
The next commit will prevent a recurrence.
Closes#294
This commit addresses multiple problems highlighted by arti#182:
- `arti-client` had some types in its public API that weren't accessible
without importing another crate (`CfgPath`, `DataReader`,
`DataWriter`). This has been fixed.
- In addition, the doc comments for `DataReader` and `DataWriter` were
cleaned up to be of better quality, now that they're public.
- It was impossible to use `arti-client` without also importing
`tor-rtcompat`. This is now fixed by the addition of two convenience
methods: `TorClient::bootstrap_with_tokio` and
`TorClient::bootstrap_with_async_std`.
- Potentially controversially: `tor-rtcompat` now returns *concrete*
types from methods like `current_runtime`, instead of `impl Runtime`.
- This was needed in order to actually be able to name the `TorClient`
type that results from using these methods.
- This does mean we lose API flexibility, but on balance I think this
is a good thing, because the API we *do* have is actually usable...
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
Previously we'd always set it to true, allowing one CONNECTED per
half-closed stream even if the stream had already received a
CONNECTED cell.
This resolves an XXXX.
There's no known attack here, but it's best practice to always compare
digests using a constant-time comparison operator.
This resolves an XXXX comment.
It makes sense to put the method for human-readable strings onto the
type itself, so that we can format these whenever they occur.
I'm choosing the "human_str" method name here, since caret-generated
types already have a to_str. I was thinking about using Display,
but caret types already implement that.
I've also moved the message from "warn!" to "debug!", since these
aren't necessarily a problem condition.
(It's a protocol violation to get a SENDME when our send window is
already full.)
This patch makes SendWindow::put return a Result, so that it's
easier to do the right thing with it.
Closes#261.
arti!126 overhauled the `tor-proto` circuit reactor, but left out one
very important thing: actually decrementing the SENDME window for
streams (not circuits) when we send cells along them.
Since the circuit-level SENDME window would often prevent us from
running into a problem, this wasn't caught until my benchmarking efforts
noticed it (in the form of Tor nodes aborting the circuit for a protocol
violation).
fixes arti#260
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
This test seems unreliable on CI: we've got to disable them for now
so that we have a working CI system. The CI failure is #238; the
ticket to repair them is #244.
I traced the problem here to the fact that sometimes "rx" in this
test would be dropped before the test was done. When "rx" is
dropped, the channel reactor shuts down, which in turn kills off the
circuit reactor.
This bug may exist in other cases in these tests. This patch may
fix one case of #238.
The `bad_extend_*` failures were caused by bad test code in
`bad_extend_test_impl` that used `futures::join!`; this meant that the
reactor could receive the `Extended2` cell before it actually got the
`ExtendNtor` request, which caused it to get (quite rightly) confused
and close the circuit. Spawning a background thread which has a short
delay before sending the `Extended2` cell seems to have alleviated this
problem.
`new_circ_create_failure` is similar; I think the reactor was getting
dropped before it had a chance to flush out its `CreateFast` cell
properly, because it had already gotten the result back (since the test
code sends it indiscriminately). This was "fixed" in much the same
manner as the other test: making it wait a bit before sending the result
cell back.
There seem to be other tests that use `futures::join!` (like
`begindir`?), and use similarly erroneous patterns; I haven't gotten any
to fail reliably enough to be able to debug them, though.
Previously, the reactor would use an `UnboundedSender` to send things to
the `RawCellStream`, in order that the reactor wouldn't block if you
failed to read from the latter. This is bad, though, since it means
people can just run us out of memory by sending lots of things.
To fix this, we make the new `StreamReader` type (which does the reading
parts from `RawCellStream`) keep track of the stream's receive window
and issue SENDMEs once *it* has consumed enough data to require it, thus
meaning that we shouldn't get sent enough data to fill the channel
between reactor and `StreamReader` (and, if we do, that's someone trying
to flood us, and we abort the circuit).
As hinted to above, the `RawCellStream` was removed and its reading
functionalities replaced by `StreamReader`; its writing functionalities
are handled by `StreamTarget` anyway, so we just give out one of those
for the write end. This now means we don't need any mutexes!
note: this commit introduces a known issue, arti#230
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
@nickm pointed out that refactoring tor_proto::channel's Reactor to do
sending as well meant that it could only send or receive, but not both,
simultaneously, which was bad!
To fix this, rewrite Reactor::run_once to use a handcrafted future (with
futures::future::poll_fn) that can handle the logic required to push
items onto the sink asynchronously (i.e. checking that it can be written
to before trying to do that, and then flushing it).
This also means we don't use select_biased! any more, and just handroll
that logic ourselves; as a small bonus, we can now process all 3 kinds
of message in one run_once() call, instead of having to do only one of
them.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
We need this for the circuit timeout estimator (#57). It needs to
know "how recently have we got some incoming traffic", so that it
can tell whether a circuit has truly timed out, or whether the
entire network is down.
I'm implementing this with coarsetime, since we need to update these
in response to every single incoming cell, and we need the timestamp
operation to be _fast_.
(This reinstates an earlier commit, f30b2280, which I reverted
because we didn't need it at the time.)
Closes#179.
Basically the same thing as 371437d338
("Refactor tor_proto::channel::Reactor to use an UnboundedSender"), but
for tor_proto::circuit's Reactor instead.
(part of arti#217)
There wasn't any good reason for tor-proto's channel reactor to use a
shedload of oneshot channels instead of just an mpsc UnboundedSender,
and the whole `CtrlResult` thing made even less sense.
Straighten this code out by replacing all of that machinery with a
simple UnboundedSender, instead.
(part of arti#218)