This lint is IMO inherently ill-conceived.
I have looked for the reasons why this might be thought to be a good
idea and there were basically two (and they are sort of contradictory):
I. "Calling ‘.clone()` on an Rc, Arc, or Weak can obscure the fact
that only the pointer is being cloned, not the underlying data."
This is the wording from
https://rust-lang.github.io/rust-clippy/v0.0.212/#clone_on_ref_ptr
It is a bit terse; we are left to infer why it is a bad idea to
obscure this fact. It seems to me that if it is bad to obscure some
fact, that must be because the fact is a hazard. But why would it be
a hazard to not copy the underlying data ?
In other languages, faliing to copy the underlying data is a serious
correctness hazard. There is a whose class of bugs where things were
not copied, and then mutated and/or reused in multiple places in ways
that were not what the programmer intended. In my experience, this is
a very common bug when writing Python and Javascript. I'm told it's
common in golang too.
But in Rust this bug is much much harder to write. The data inside an
Arc is immutable. To have this bug you'd have use interior mutability
- ie mess around with Mutex or RefCell. That provides a good barrier
to these kind of accidents.
II. "The reason for writing Rc::clone and Arc::clone [is] to make it
clear that only the pointer is being cloned, as opposed to the
underlying data. The former is always fast, while the latter can
be very expensive depending on what is being cloned."
This is the reasoning found here
https://github.com/rust-lang/rust-clippy/issues/2048
This is saying that *not* using Arc::clone is hazardous.
Specifically, that a deep clone is a performance hazard.
But for this argument, the lint is precisely backwards. It's linting
the "good" case and asking for it to be written in a more explicit
way; while the supposedly bad case can be written conveniently.
Also, many objects (in our codebase, and in all the libraries we use)
that are Clone are in fact simply handles. They contain Arc(s) (or
similar) and are cheap to clone. Indeed, that is the usual case.
It does not make sense to distinguish in the syntax we use to clone
such a handle, whether the handle is a transparent Arc, or an opaque
struct containing one or more other handles.
Forcing Arc::clone to be written as such makes for code churn when a
type is changed from Arc<Something> to Something: Clone, or vice
versa.
My proximate motivation is that tls-api wants its inner streams to be
Debug. But in general, I agree with the Rust API Guidelines notion
that almost everything should be Debug.
I have gone for the "dump all the things" approach. A more nuanced
approach would be possible too.
This helps the user distinguish between protocol violations that
happen when connecting to the tor network from those that happen
while connected.
Closes#358.
There was only one use of this, and it was in as-yet-unused relay-only
code.
Removing this type required refactoring the relay onion handshake code
to use its own error type, which is probably clever anyway.
Additionally, refactor the IoError out of tor_cell::Error:
nothing in TorCell created this; it was only used by tor_proto.
This required refactoring in tor_proto to use a new error type. Here I
decided to use a new CodecError for now, though we may refactor that
away soon too.
This fixes a tiny race condition in the previous code, where we
checked whether an OptTimestamp is None a bit before we set it.
Since std::atomic gives us compare_exchange, we might as well use
it.
A number of severe problems with the circuit reactor were fixed which
could cause reordering of cells (which causes relays to terminate the
circuit with a protocol violation, as they become unable to decrypt
them). These mostly revolve around improper usage of queues:
- The code assumed that a failure to place cells onto the channel would
persist for the duration of a reactor cycle run. However, under high
contention, this wouldn't always be the case.
- This leads to some cells getting enqueued while others go straight
through, before the enqueued cells.
- To fix this, we block sending cells out of the channel while there
are still some enqueued.
- The hop-specific queues queued after encryption, not before. This was
very brittle, and led to frequent mis-ordering.
- This was fixed by making them not do that.
This is arti!264 / 5bce9db562 without the
refactor part.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
This is a fine example of why booleans are risky:
it's far to easy to pass "animate:bool" into "inanimate:bool" like
we did here.
This is a followup from our fix to #294.
Previously we were requiring authenticated sendme cells exactly when we
should be permitting the old format, and vice versa.
This bug was caused by using a boolean to represent one property, but
with giving that boolean two different senses without inverting at the
right time.
The next commit will prevent a recurrence.
Closes#294
This commit addresses multiple problems highlighted by arti#182:
- `arti-client` had some types in its public API that weren't accessible
without importing another crate (`CfgPath`, `DataReader`,
`DataWriter`). This has been fixed.
- In addition, the doc comments for `DataReader` and `DataWriter` were
cleaned up to be of better quality, now that they're public.
- It was impossible to use `arti-client` without also importing
`tor-rtcompat`. This is now fixed by the addition of two convenience
methods: `TorClient::bootstrap_with_tokio` and
`TorClient::bootstrap_with_async_std`.
- Potentially controversially: `tor-rtcompat` now returns *concrete*
types from methods like `current_runtime`, instead of `impl Runtime`.
- This was needed in order to actually be able to name the `TorClient`
type that results from using these methods.
- This does mean we lose API flexibility, but on balance I think this
is a good thing, because the API we *do* have is actually usable...
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
Previously we'd always set it to true, allowing one CONNECTED per
half-closed stream even if the stream had already received a
CONNECTED cell.
This resolves an XXXX.
There's no known attack here, but it's best practice to always compare
digests using a constant-time comparison operator.
This resolves an XXXX comment.
It makes sense to put the method for human-readable strings onto the
type itself, so that we can format these whenever they occur.
I'm choosing the "human_str" method name here, since caret-generated
types already have a to_str. I was thinking about using Display,
but caret types already implement that.
I've also moved the message from "warn!" to "debug!", since these
aren't necessarily a problem condition.
(It's a protocol violation to get a SENDME when our send window is
already full.)
This patch makes SendWindow::put return a Result, so that it's
easier to do the right thing with it.
Closes#261.
arti!126 overhauled the `tor-proto` circuit reactor, but left out one
very important thing: actually decrementing the SENDME window for
streams (not circuits) when we send cells along them.
Since the circuit-level SENDME window would often prevent us from
running into a problem, this wasn't caught until my benchmarking efforts
noticed it (in the form of Tor nodes aborting the circuit for a protocol
violation).
fixes arti#260
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
This test seems unreliable on CI: we've got to disable them for now
so that we have a working CI system. The CI failure is #238; the
ticket to repair them is #244.
I traced the problem here to the fact that sometimes "rx" in this
test would be dropped before the test was done. When "rx" is
dropped, the channel reactor shuts down, which in turn kills off the
circuit reactor.
This bug may exist in other cases in these tests. This patch may
fix one case of #238.
The `bad_extend_*` failures were caused by bad test code in
`bad_extend_test_impl` that used `futures::join!`; this meant that the
reactor could receive the `Extended2` cell before it actually got the
`ExtendNtor` request, which caused it to get (quite rightly) confused
and close the circuit. Spawning a background thread which has a short
delay before sending the `Extended2` cell seems to have alleviated this
problem.
`new_circ_create_failure` is similar; I think the reactor was getting
dropped before it had a chance to flush out its `CreateFast` cell
properly, because it had already gotten the result back (since the test
code sends it indiscriminately). This was "fixed" in much the same
manner as the other test: making it wait a bit before sending the result
cell back.
There seem to be other tests that use `futures::join!` (like
`begindir`?), and use similarly erroneous patterns; I haven't gotten any
to fail reliably enough to be able to debug them, though.
Previously, the reactor would use an `UnboundedSender` to send things to
the `RawCellStream`, in order that the reactor wouldn't block if you
failed to read from the latter. This is bad, though, since it means
people can just run us out of memory by sending lots of things.
To fix this, we make the new `StreamReader` type (which does the reading
parts from `RawCellStream`) keep track of the stream's receive window
and issue SENDMEs once *it* has consumed enough data to require it, thus
meaning that we shouldn't get sent enough data to fill the channel
between reactor and `StreamReader` (and, if we do, that's someone trying
to flood us, and we abort the circuit).
As hinted to above, the `RawCellStream` was removed and its reading
functionalities replaced by `StreamReader`; its writing functionalities
are handled by `StreamTarget` anyway, so we just give out one of those
for the write end. This now means we don't need any mutexes!
note: this commit introduces a known issue, arti#230
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
@nickm pointed out that refactoring tor_proto::channel's Reactor to do
sending as well meant that it could only send or receive, but not both,
simultaneously, which was bad!
To fix this, rewrite Reactor::run_once to use a handcrafted future (with
futures::future::poll_fn) that can handle the logic required to push
items onto the sink asynchronously (i.e. checking that it can be written
to before trying to do that, and then flushing it).
This also means we don't use select_biased! any more, and just handroll
that logic ourselves; as a small bonus, we can now process all 3 kinds
of message in one run_once() call, instead of having to do only one of
them.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
We need this for the circuit timeout estimator (#57). It needs to
know "how recently have we got some incoming traffic", so that it
can tell whether a circuit has truly timed out, or whether the
entire network is down.
I'm implementing this with coarsetime, since we need to update these
in response to every single incoming cell, and we need the timestamp
operation to be _fast_.
(This reinstates an earlier commit, f30b2280, which I reverted
because we didn't need it at the time.)
Closes#179.
Basically the same thing as 371437d338
("Refactor tor_proto::channel::Reactor to use an UnboundedSender"), but
for tor_proto::circuit's Reactor instead.
(part of arti#217)
There wasn't any good reason for tor-proto's channel reactor to use a
shedload of oneshot channels instead of just an mpsc UnboundedSender,
and the whole `CtrlResult` thing made even less sense.
Straighten this code out by replacing all of that machinery with a
simple UnboundedSender, instead.
(part of arti#218)
Most of the structs in `arti-client` have example code now, to give a
clearer idea of how they're used.
Annoyingly, a lot of the types exposed in `arti-client` are actually
re-exports, which makes documentation a bit harder: example code that
references other parts of `arti-client` can't actually be run as a
doctest, since the crate it's in is a dependency of `arti-client`.
We might be able to fix this in future by doing the documentation in
`arti-client` itself, but rustdoc seems to have some weird behaviours
there that need to be investigated first (for example, it seems to merge
the re-export and original documentation, and also put the re-export
documentation on the `impl` block for some reason).
For now, though, this commit just writes the docs from the point of view
of an `arti-client` consumer, removing notes specific to the crate in
which they're defined. It's not ideal, but at least the end user
experience is decent.
This will let callers use the tokio traits on these types too, if
they call `split()` on the DataStream.
(Tokio also has a `tokio::io::split()` method, but it requires a
lock whereas `DataStream::split()` doesn't.)
futures::io::AsyncRead (and Write) isn't the same thing as tokio::io::AsyncRead,
which is a somewhat annoying misfeature of the Rust async ecosystem (!).
To mitigate this somewhat for people trying to use the `DataStream` struct with
tokio, implement the tokio versions of the above traits using `tokio-util`'s
compat layer, if a crate feature (`tokio`) is enabled.