This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
The docs even say this is about stream.
As @nickm writes in
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/252#note_2771289
we generally call end-to-end connections that are tunneled over Tor
"Streams" to distinguish them from everything else in the Tor
protocols that could possibly be called a "Connection".
That seems to apply here too.
This is totally not just an exercise to get combined test coverage
for tor-circmgr over 90% because I needed something to do that
wouldn't distract anybody else. :)
Since it implements a "<=" type relationship, it should be called
"at_least_as_permissive_as()." Since it's a crate-private function,
the long name isn't too bad.
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
We are going to get rid of the Arc. Happily there is an id which is
always constructed uniquely and preserved by clone.
(auto-deref lets us make the function take &Self instead of &Arc)
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
This is a method, so the resolution is automatic. It's not clear to
me why this was written out this way, given that extend_ntor is right
above.
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
This closes arti#256. It makes our behavior match Tor's more closely,
though it has a simpler implementation than Tor. I think that the extra
complexity in Tor's logic is because we used to record timeouts in
the histogram as well as in the success/failure log.
The implementations of `Add` / `Sub` (et al.) on `std::time::Instant`
can panic if the underlying OS structure can't represent the result
(like arti#266). Use Instant::checked_add and print a warning instead,
to prevent panicking.
Also, we now add instead of subtracting; I suspect it's reasonable that
you might not be able to go backward past the first `Instant` created on
some platforms, but going *forward* should probably work?
IIUC, these anticipatd a need to store min_exit_circs_per_port in
CircMgr. But the current design, where it goes into preemptive.rs and
thence to usage, seems to work fine.
This required re-centralizing the configuration object for preemptive
circuits, since previously the settings from it were a bit spread out
over the crate.
And now the complexity begins: when the user changes the path_rules,
they not only want new circuits to obey those rules: they want
_all new requests_ to be put onto circuits that obey those rules.
That means that when the path rules become more restrictive, we need
to retire all the circuits, and make sure that currently pending
circuits aren't used for any requests.
If it's any comfort, doing this was even more complicated in C tor. ;)
This patch doesn't actually make anything reconfigurable, but it
does create an API that will tell you "you can't change the value of
that!" If the API looks reasonable, I can start making it possible
to change the values of individual items.
Previously we used Duration::mul_f64, which panics if its output is
out-of-range. That shouldn't actually be possible for the values
we're giving it, but probably it's better to just multiply in a safe
way.
This resolves a couple of XXXXs and therefore relates to #231.
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
Just as `in_same_family` is a member of Relay, so the function for
getting all the real family members of a relay should belong in the
same crate.
This change also removes the `family()` accessor: it gives the _claimed_ family rather
than the _acknlowedged_ family, and is therefore a bit dangerous.
There's still a hole in this logic; I've noted it in the Limitations
section. If we get a microdescriptor for a relay in between creating
and using the guard restriction, it might be omitted from the family
list.
There's not much reason to use a HashSet here, since we're just
going over the whole list.
This reverts commit 16e8489abb and does a little more
refactoring.
The new CircMgr::build_circuits_preemptively function actually causes
preemptive circuits to be built; it gets called from arti-client, like
the other daemon tasks the CircMgr has.
In preparation for making Arti build circuits preemptively, this commit
introduces `TargetCircUsage::Preemptive`, a circuit usage that works
somewhat differently from other ones: it requires at least 2 circuits to
exist that can exit the port it contains in order for an existing
circuit to match against it (path-spec.txt § 2.1.1); if that's not the
case, that usage will require building new circuits (in order that we
build enough to have 2 available).
This required refactoring how circuit reuse worked; now,
`CircList::find_open` uses the new `AbstractSpec::find_supported` trait
method, which we customize to implement the above check in the case of
`Preemptive` circuit usages. To make that work, `OpenEntry` now takes
two type parameters (the spec and circuit types), instead of taking a
builder type parameter and using its associated types. (We also got rid
of type constraints on that struct, yay!)
A WIP implementation of a preemptive circuit predictor that implements
path-spec.txt § 2.1.1 is also included, but this will require additional
effort to wire it up with the `CircMgr` properly.
This patch makes sure that for every* config type we have, the defaults
you get from a Builder match those you get from Serde, and that both
match the value that you get from arti_defaults.toml. Later down the
line I'll be adding some tests to keep these in sync.
* StorageConfig still has no defaults of its own, since we aren't so
sure we want other applications to use Arti's directories by default.
To do this at all neatly, I had to split out `tor-config` from
`arti-config` again, and putting the lower level stuff (paths,
builder errors) into tor-config. I also changed our use of
derive_builder to always use a common error type, to avoid
error type proliferation.
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
Most of the structs in `arti-client` have example code now, to give a
clearer idea of how they're used.
Annoyingly, a lot of the types exposed in `arti-client` are actually
re-exports, which makes documentation a bit harder: example code that
references other parts of `arti-client` can't actually be run as a
doctest, since the crate it's in is a dependency of `arti-client`.
We might be able to fix this in future by doing the documentation in
`arti-client` itself, but rustdoc seems to have some weird behaviours
there that need to be investigated first (for example, it seems to merge
the re-export and original documentation, and also put the re-export
documentation on the `impl` block for some reason).
For now, though, this commit just writes the docs from the point of view
of an `arti-client` consumer, removing notes specific to the crate in
which they're defined. It's not ideal, but at least the end user
experience is decent.
Instead of putting a fully qualified name in the text, in most cases
we should just use the short name of the type or function we're
referring to.
In other words, instead of saying [`crate::module::Foo`], we should
typically say [`Foo`](crate::module::Foo).
The previous version of this test used the old, racy version of
wait_for (see #149). The new version is refactored so that
simulated time is only allowed to advance after each step is done,
so that we can actually be sure that each step in the process will
happen as it should.
In order to get the time-advances to proceed properly, and avoid
polluting state between tests, I've had to introduce some machinery
to encode the proper amount for time to advance. It isn't something
I'd want to use for a whole bunch of tests, but for just one set,
it's fine.
These tests now pass reliably for me.
I wonder if a discrete-event-simulation approach (hello, Shadow)
would let us write tests like these to our hearts' content?
As per arti#175, we'd like to be able to handle newer Arti versions
storing additional state in the persisted state files, without dropping
this data on the floor when we write out changes to these files.
Use the #[serde(flatten)] mechanism to achieve this, by adding catch-all
HashMap<String, JsonValue> fields to all structs that are at risk of
this happening to them.
There seems to be some issue here with the new WaitFor code,
where using the same MockSleepProvider with both of these wait_for()
calls gives questionable behavior under some circumstances (like
when running under Tarpaulin with the wrong set of flags).
We must not apply our new path-bias behavior (where we blame a guard
if it gives us too many indeterminate circuit failures) if the path
was not chosen at random. If too many random paths fail, we know
that's suspicious, since the other relays are a random sample. But
if a bunch of user-provided paths fail, that could simply be because
the user's chosen exit is down.
Instead of racily advancing time forward, this commit attempts to rework
how WaitFor works, such that it makes advances when all sleeper futures
that have been created have been polled (by handing the MockSleepRuntime
a Waker with which to wake up the WaitFor).
The above described mechanics work well enough for the double timeout
test, but fail in the presence of code that spawns asynchronous /
background tasks that must make progress before time is advanced for the
test to work properly. In order to deal with these cases, a set of APIs
are introduced in order to block time from being advanced until some
code has run, and a carveout added in order to permit small advances in
time where required.
(In some cases, code needed to be hacked up a bit in order to be made
properly testable using these APIs; the `MockablePlan` trait included in
here is somewhat unfortunate.)
This should fix arti#149.
Now that we have two kinds of isolation tokens (those set on a
stream, and those set by the stream's associated TorClient), we
need a more sophisticated kind of isolation.
This fixes the bug introduced with the previous commit, where
per-stream tokens would override per-TorClient tokens.
The three arguments TorClient::bootstrap requires by way of configuration
have been factored into a new TorClientConfig object.
This object gains two associated functions: one which uses `tor_config`'s
`CfgPath` machinery to generate sane defaults for the state and cache
directories, and one that accepts said directories in order to create a
config object with those inserted.
(this commit was inspired by trying to use arti as a library and being somewhat
overwhelmed by the amount of config stuff there was to do :p)
The previous code would report all failures to build a circuit as
failures of the guard. But of course that's not right: If we
fail to extend to the second or third hop, that might or might not
be the guard's fault.
Now we use the "pending status" feature of the GuardMonitor type so
that an early failure is attributed to the guard, but a later
failure is attributed as "Indeterminate". Only a complete circuit
is called a success. We use a new "GuardStatusHandle" type here so
that we can report the status early if there is a timeout.
(When we're building a path with a guard, we need to tell the guard
manager whether the path succeeded, and we need to wait to hear
whether the guard is usable.)
There are some missing parts here (like persistence and tests)
and some incorrect parts (I am 90% sure that the "exploratory
circuit" flag is bogus). Also it is not integrated with the circuit
manager code.
On torspec!40, Mike says:
I don't think there is a practical difference here. As per
Section 2.4.5, if 60 seconds is not enough and causes the
liveness test to fail due to too many timeouts, we will double
the initial timeout.
This makes our behavior the same as C tor.
The C Tor implementation doesn't do this, and Mike says:
I think it is a reasonable enough assumption that if Tor has
restarted, this kind data is no longer fresh enough to be
accurate for this purpose. This is also only 20 circuits here,
and typical timeouts are now around 1-2 seconds or less.. So a
restarted client with a timeout that is too low for a new
internet connection will figure this out pretty quickly. I think
that is OK.
(from torspec!40)