clippy::needless_borrow quibbles here, IMO correctly. Its suggestion
didn't go far enough: output is a String and a &String can be passed
to write as-is for identical effect.
I'm slightly concerned about whether this is behavior people would
expect to have on-by-default, so let's make this off-by-default for
now.
Maybe the `application` and `system` sections should merge?
This is by no means our final API, but should represent an
improvement. Here instead of having to specify a list of files and
their is-this-optional status, along with a list of command-line
options, we have a single structure that encapsulates all of that
information.
Two advantages here:
- Callers no longer have to remember what the boolean means.
- We can "reload" more easily, by keeping the source object around.
This change also implements the correct behavior for our default
configuration file in `arti::main`: if the file is absent and the
user doesn't list a config file, that's no problem. But if the user
lists _that very same config file, we should insist that it be
present.
This implements the proposal from arti#298, making the
`BenchmarkResults` type be made out of a bunch of new `Statistic` types
(which summarize the mean, median, range, and standard deviation of an
arbitrary value) instead of overloading `TimingSummary` for this
purpose.
There are a couple of places where we forgot to truncate our
return-buffer to its actual size, and instead returned a big bunch
of zeros. Found while writing the tests in the next commit.
Someday, we'll have ReadBuf and won't have to worry about these
things.
tor-netdir needs to bump because tor-netdoc bumped, even though
there were no other changes in tor-netdir. Whoops.
tor-guardmgr needs to bump because it already published, with the
older tor-netdir.
This failure occurred because our tests use canned data to exercise
the directory state functionality, and the canned consensus has
suddenly become very expired.
There are better fixes possible, but this is a minimal one that
should get CI working on main again.
This makes our layout more similar to our other crates, and
successfully informs our grcov exclusion pattern that these tests
are indeed tests.
Doing this knocks down the reported coverage for the tor-rtcompat
crate, but that's okay: we hadn't earned it.
I hereby promise that this commit is only code-movement.
This took some refactoring, so that I wouldn't need to define 9
different versions of the function. It also required that we change
the behavior of test_with_all_runtimes slightly, so that it asserts
on _any_ failure rather than asserting on most but returning Err()
for others. That in turn required changes to a few of its callers.
There's probably a better way to do all of this macro business, but
this is the best I could find.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
Having separate types here doesn't justify the (very limited)
benefit of distinguishing between the case where we have created an
executor that we own and the case where we have a handle to an
already-running tokio executor.
Part of #301.
This is based on @janimo's approach in !74, but diverges in a few
important ways.
1. It assumes that something like !251 will merge, so that we can
have separate implementations for native_tls and rustls compiled
at the same time.
2. It assumes that we can implement this for the futures::io traits
only with no real penalty.
3. It uses the `x509-signature` crate to work around the pickiness of
the `webpki` crate. If webpki eventually solves their
[bug 219](https://github.com/briansmith/webpki/issues/219), we
can remove a lot of that workaround.
Closes#86.
The docs even say this is about stream.
As @nickm writes in
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/252#note_2771289
we generally call end-to-end connections that are tunneled over Tor
"Streams" to distinguish them from everything else in the Tor
protocols that could possibly be called a "Connection".
That seems to apply here too.
This is totally not just an exercise to get combined test coverage
for tor-circmgr over 90% because I needed something to do that
wouldn't distract anybody else. :)
Since it implements a "<=" type relationship, it should be called
"at_least_as_permissive_as()." Since it's a crate-private function,
the long name isn't too bad.
In the usual case, set_isolation_group is awkward.
This is perhaps slightly duplicative with TorClient::isolated_client().
If so then perhaps the *latter* should be abolished.
We're soon going to have our different Runtime types be built as
CompoundRuntime instances. We don't want to expose that detail,
though, so we'll use this macro to make them implement the right
traits.
This type can solve two problems at once.
First, it lets users replace parts of an existing runtime
implementation without replacing the whole thing. For example, you
can use it to override your TcpProvider implementation to solve
problems like #235.
Second, we can use it internally to tor-rtcompat to define Runtimes
piece-by-piece. Mostly we'll use this to separate our Tls
implementations from our implementations of the rest of the Runtime.
This commit combines status update information from tor-dirmgr and
tor-chanmgr in the arti-client crate, so that the user can get to
it; it represents a high-level view of the client's ability to reach
the network and route traffic.
I have omitted the tor-circmgr support for now; it's mostly not
needed.
At present it's not so useful, since there's no way for a client to
get a TorClient that _isn't_ completely bootstrapped, and therefore
there's no way to actually watch these events until they're no
longer interesting. That should change with arti#293.
This is part of #96.
The information is pretty basic here: we use "have we been able to
connect/TLS-handshake/Tor-handshake" as a proxy for "are we on the
internet? Are we on a reasonably unfiltered part of the internet?"
Eventually we'll want to make the information gathered and exported
more detailed: I've noted a few places in the code. For now,
however, this is about as good as C Tor does today, and it should be
a good starting point.
This uses a slightly different design from tor-dirmgr. Instead of
exporting an entire state structure via `postage::watch`, it exports
only the parts of that structure which the user is supposed to
read. I think that's more reasonable in this case because most of
the possible internal transitions in the tor-chanmgr state don't
cause a change in the exposed status.
The interface is similar to the one exposed by `arti-client`: it
internally uses postage::watch to give a series of events showing
when a bootstrap status is changing.
Thanks to the existing state/driver separation in the DirMgr design
we don't need much new logic: each download state needs to expose
(internally) how far along it is in its download, which the
bootstrap code passes to the DirMgr if it has changed.
I believe that in the long run, we'll probably want to expose more
(or different) information here, and we'll want to process it
differently. With that in mind, I've made the API for
`DirBootstrapStatus` deliberately narrow, so that we can change its
of its internal later on without breaking code that depends on it.
(The information exposed by this commit is not yet summarized in
`arti-client`.)
Part of #96.
We now conduct benchmark tests with multiple concurrent streams (by
default; this is configurable by passing `-p` to `arti-bench`).
Currently, these results just get "flattened" for the purposes of
statistical analysis (as in, results_raw contains the results of each
connection's timing summary, across all benchmark runs). This might be
something we wish to change in future.
The stats summary now also records "best" and "worst" values for each
metric, to give a rough idea of the range of values encountered.
Additionally, we now support writing the benchmark results out to a JSON
file. A future commit may integrate this with CI, so that we have
benchmark results for every commit as a build artefact.
(some documentation was also fixed)
part of arti#292
We now do multiple samples (configurable; default 3) per type of
`arti-bench` benchmark run, and take a mean and median average of all
data collected, in order to hopefully be a bit more resilient to random
outliers / variation.
This uses some `futures::stream::Stream` hacks, which might result in
more connections being made than required (and might impact the TTFB
metrics somewhat, at least for downloading).
Results now get collected into a `BenchmarkResults` struct per type of
benchmark, which will be in turn placed into a `BenchmarkSummary` in a
later commit; this will also add the ability to serialize the latter
struct out to disk, for future reference.
part of arti#292
The purpose of a this API is to tell the user how far along Arti is
in getting bootstrapped, and if it's stuck, what it's stuck on.
This API doesn't yet expose any useful information: by the time it's
observable to a client, it's always "100% bootstrapped." But I'm
putting it in a MR now so that we can review the basic idea, and to
avoid conflicts with later work on tickets like #293 and #278.
This is part of #96.
This is a fine example of why booleans are risky:
it's far to easy to pass "animate:bool" into "inanimate:bool" like
we did here.
This is a followup from our fix to #294.
Previously we were requiring authenticated sendme cells exactly when we
should be permitting the old format, and vice versa.
This bug was caused by using a boolean to represent one property, but
with giving that boolean two different senses without inverting at the
right time.
The next commit will prevent a recurrence.
Closes#294
Previously we stored only one guard sample, in a state file called
"default_guards". That's not future-proof, since we want to have
multiple samples in the future. (`guard-spec.txt` specifies
separate samples for highly restrictive filters, and for bridge
usage.)
This patch changes our behavior so that we can store multiple
samples in a new "guards" file.
I had thought about automatically migrating from the previous file
format and location, but I don't think that's necessary given our
current (lack of) stability guarantees.
Closes#176.
This test should only fail very rarely (around 1/2.4e8) when guards
are chosen from a list of 20 with uniform probability. But that
wasn't what we were doing on the mock test network: we were choosing
from a list of 10 viable guards, with nonuniform probability.
As a fix, we change the test network probabilities so that the
guards _are_ chosen with a uniform probability for this test, and we
use a modified version of the test network where there are indeed 20
Guard-flagged relays with the required DirCache=2 protocol.
Closes#276.
This commit addresses multiple problems highlighted by arti#182:
- `arti-client` had some types in its public API that weren't accessible
without importing another crate (`CfgPath`, `DataReader`,
`DataWriter`). This has been fixed.
- In addition, the doc comments for `DataReader` and `DataWriter` were
cleaned up to be of better quality, now that they're public.
- It was impossible to use `arti-client` without also importing
`tor-rtcompat`. This is now fixed by the addition of two convenience
methods: `TorClient::bootstrap_with_tokio` and
`TorClient::bootstrap_with_async_std`.
- Potentially controversially: `tor-rtcompat` now returns *concrete*
types from methods like `current_runtime`, instead of `impl Runtime`.
- This was needed in order to actually be able to name the `TorClient`
type that results from using these methods.
- This does mean we lose API flexibility, but on balance I think this
is a good thing, because the API we *do* have is actually usable...
Previously we could only configure one global tracing filter that
applied to stdout and journald. There was no support for log files,
either.
This patch fixes both issues, by substantially revising the
configuration format: There are now separate filters for each log
file, for journald, and for the console log. Because we want to
allow multiple logfiles, they have to go into an array in the
configuration.
The configuration logic has grown a bit complicated in its types,
since the tracing_subscriber crate would prefer to have the complete
structure of tracing Layers known statically. That's fine when you
know how many you have, and which kinds there will be, but for
the runtime-configuration case we need to mess around with
`Box<dyn Layer ...>`.
I also had to switch from tracing_subscriber's EnvFilter to its
Targets filter. It seems "EnvFilter" can only be applied as a Layer
in itself, and won't work as a Filter on an individual Layer.
Closes#166.
Closes#170.
Previously we kept this in an ambiguously named type,
`ClientTimeoutConfig`. But everything we do right now is client
related! So `StreamTimeoutConfig` is a better name.
Also, we'd previously neglected to expose the builder for this type
from `TorClientConfigBuilder`. Now we do.
Closes#281.
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
We are going to get rid of the Arc. Happily there is an id which is
always constructed uniquely and preserved by clone.
(auto-deref lets us make the function take &Self instead of &Arc)
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
When our current consensus is getting close to being invalid (but
it isn't invalid yet), we try to get a new one. So far, so good.
But we had a bug: when we went to get a new consensus, we'd see that we
had a perfectly fine not-yet-invalid consensus in our cache, reload it,
find that it was ready, and continue!
This patch fixes our behavior: If we have a usable consensus, then
when we reset the bootstrapping process, we ignore any cached consensus.
Fixes bug #274.
This is a method, so the resolution is automatic. It's not clear to
me why this was written out this way, given that extend_ntor is right
above.
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
This prevents a security-failure condition that could happen if our
directory caches don't give us these microdescriptors, but we
nevertheless decide that the directory is usable.
Closes#178
If we don't know a current microdescriptor for a guard, we can't use it
for multihop circuits, since we don't know its onion keys.
This is part of a fix for #178.
This closes arti#256. It makes our behavior match Tor's more closely,
though it has a simpler implementation than Tor. I think that the extra
complexity in Tor's logic is because we used to record timeouts in
the histogram as well as in the success/failure log.
The implementations of `Add` / `Sub` (et al.) on `std::time::Instant`
can panic if the underlying OS structure can't represent the result
(like arti#266). Use Instant::checked_add and print a warning instead,
to prevent panicking.
Also, we now add instead of subtracting; I suspect it's reasonable that
you might not be able to go backward past the first `Instant` created on
some platforms, but going *forward* should probably work?
IIUC, these anticipatd a need to store min_exit_circs_per_port in
CircMgr. But the current design, where it goes into preemptive.rs and
thence to usage, seems to work fine.
We _do_ reject bad hostnames: just not where I once thought we might.
We need to decide if the current behavior is what we want (and I think
it is, probably?).
Our behavior in handling not-yet-valid router descriptors doesn't match
the C Tor implementation, but it's not a big deal:
we don't currently use router descriptors at all.
If we fail to convert a curve25519 key to an ed25519 key for checking
the onion-key crosscert, don't call that an internal error: it means
that something is wrong with the provided ntor key.
Previously we'd always set it to true, allowing one CONNECTED per
half-closed stream even if the stream had already received a
CONNECTED cell.
This resolves an XXXX.
The new `arti-bench` crate does a simple end-to-end benchmark test
embedding Arti: it generates some random data (of configurable amount,
depending on command-line parameters), and then sends said data back and
forth via Arti (which should be configured to use a local Chutney
network).
Additionally, the benchmark can also be run via a local SOCKS5 server
(in order to benchmark the performance via a local Chutney node, for
comparison).
The `tests/chutney/arti-bench.sh` sets up and tears down Chutney as
required to make this work.
This is very much a first cut; there are many things that should
eventually get added, such as support for multiple connections, JSON
output capabilities, running multiple tests, ...
There's no known attack here, but it's best practice to always compare
digests using a constant-time comparison operator.
This resolves an XXXX comment.
This warning occurs if we ask for microdescriptors from our local
cache, and our cache gives us something we didn't ask for. It
shouldn't be possible, so let's warn when it occurs.
This patch resolves an XXXX.
It makes sense to put the method for human-readable strings onto the
type itself, so that we can format these whenever they occur.
I'm choosing the "human_str" method name here, since caret-generated
types already have a to_str. I was thinking about using Display,
but caret types already implement that.
I've also moved the message from "warn!" to "debug!", since these
aren't necessarily a problem condition.
This approach tries to preserve the current interface, but uses a
counter-based event backend to implement a coalescing stream of
events that can be represented as small integers. The advantage
here is that publishing events no longer needs to be a blocking
operation, since there is no queue to fill up.
(It's a protocol violation to get a SENDME when our send window is
already full.)
This patch makes SendWindow::put return a Result, so that it's
easier to do the right thing with it.
Closes#261.
arti!126 overhauled the `tor-proto` circuit reactor, but left out one
very important thing: actually decrementing the SENDME window for
streams (not circuits) when we send cells along them.
Since the circuit-level SENDME window would often prevent us from
running into a problem, this wasn't caught until my benchmarking efforts
noticed it (in the form of Tor nodes aborting the circuit for a protocol
violation).
fixes arti#260
We had no function to infallibly convert BoundedInt32<{0 or 1},H>
into a u32, even though we could have. Because of that, we were
treating weight_scale as an i32 when logically it's a u32 or a
NonZeroU32.
Moreover, it turns out we were using an incorrect minimum for the
bwweightscale param, which would in theory have allowed the
authorities to make us divide by zero.
This patch introduces the necessary From<> implementation and uses
it. It corrects the binimum bwweightscale, and prevents a
division-by-zero issue in case weight_scale is zero.
We don't want MutCfg to be automatially coneable, or we'll wind up with
surprises like the one that this patch fixes in TorClient.
(The "surprise" is that reconfigure() would only apply its
client-specific options to one client instance.)
If we allow overlapping reconfiguration requests, we introduce all
kinds of "fun" bugs. For example, we could wind up with a configuration
made up of parts of one reconfiguration attempt, and parts of another.
We can't change the authorities while in-flight: that would be pretty
miserable to implement.
Similarly we can't change the cache while in-flight.
Everything else should be fair game, though there are a couple of tricky
bits. I've tried to document those.
This required re-centralizing the configuration object for preemptive
circuits, since previously the settings from it were a bit spread out
over the crate.
And now the complexity begins: when the user changes the path_rules,
they not only want new circuits to obey those rules: they want
_all new requests_ to be put onto circuits that obey those rules.
That means that when the path rules become more restrictive, we need
to retire all the circuits, and make sure that currently pending
circuits aren't used for any requests.
If it's any comfort, doing this was even more complicated in C tor. ;)
It's useful to keep configuration objects inside a RwLock<Arc<>>, so we
can have slightly-stale pointers to the existing configuration structure
without holding locks too long.
This code adds a MutCfg type with basic support for this pattern,
and functions to make it a bit more ergonomic.
This patch doesn't actually make anything reconfigurable, but it
does create an API that will tell you "you can't change the value of
that!" If the API looks reasonable, I can start making it possible
to change the values of individual items.
`tor-rtcompat`'s `TlsConnector` trait previously included a method to
create a TLS-over-TCP connection, which implied creating a TCP stream
inside that method. This commit changes that, and makes the function
wrap a TCP stream, as returned from the runtime's `TcpProvider` trait
implementation, instead.
This means you can actually override `TcpProvider` and have it apply to
*all* connections Arti makes, which is useful for issues like arti#235
and other cases where you want to have a custom TCP stream
implementation.
This required updating the mock TCP/TLS types in `tor-rtmock` slightly;
due to the change in API, we now store whether a `LocalStream` should
actually be a TLS stream inside the stream itself, and check this
property on reads/writes in order to detect misuse. The fake TLS wrapper
checks this property and removes it in order to "wrap" the stream,
making reads and writes work again.
Previously we used Duration::mul_f64, which panics if its output is
out-of-range. That shouldn't actually be possible for the values
we're giving it, but probably it's better to just multiply in a safe
way.
This resolves a couple of XXXXs and therefore relates to #231.
The rand crate's documentation says it's not okay to rely on StdRng
having reproducible output. So instead, let's switch to ChaCha12Rng
instead (which is what StrRng currently uses).
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
Just as `in_same_family` is a member of Relay, so the function for
getting all the real family members of a relay should belong in the
same crate.
This change also removes the `family()` accessor: it gives the _claimed_ family rather
than the _acknlowedged_ family, and is therefore a bit dangerous.
There's still a hole in this logic; I've noted it in the Limitations
section. If we get a microdescriptor for a relay in between creating
and using the guard restriction, it might be omitted from the family
list.
There's not much reason to use a HashSet here, since we're just
going over the whole list.
This reverts commit 16e8489abb and does a little more
refactoring.
The redundant method was a `to_owned` that probably shouldn't have
been called that. It was only used in one place.
The tests should get tor-linkspec's line coverage up above 90%.
This was a relic of the old, now-unused "caret_enum!" macro.
Removing it gets caret's coverage to 100%.
Yes, technically this is a semver breaker on caret.
For this one I just wrote some "are things completely broken" tests
for the rand_compat wrappers. These won't detect subtle biases in
the RNGs! They'll only let you know if the wrappers have screwed up
in some way that always sets a given bit to 1 or 0.
This is mostly a finger exercise, and an experiment in "what does
grcov consider to be coverage". Here's what I've found out...
* In grcov's eyes, most #[derive(Foo)] lines count as containing code;
but calling any one derived function counts as calling those lines.
* Unlike with tarpaulin, it is actually possible to reach 100% grcov
line coverage. (Tarpaulin likes to pick "}" lines and tell you that
you never reached them; or sometimes it picks expression
statements that have the effect of a return, and tells you that
they're unreached. Even with these tests, tarpaulin claims that
the line coverage of tor-units is only 97.3%.)
* In rust, it may be a bit hopeless trying to get high function
coverage. Even though we've hit every line of the tor-units crate,
the function coverage from its own tests is only 9.38% (55.41%
from other crates). I think this is probably due to derived
functions, or maybe due to generics getting instantiated?
I've got no idea; the denominator for the function coverage
lines fluctuates oddly.
The sane_defaults() call is now the same as you get from a default
builder: by convention, we just call that method Default::default().
The with_directories() constructor makes more sense as a constructor
for the TorClientConfigBuilder than for TorClientConfig.
This test seems unreliable on CI: we've got to disable them for now
so that we have a working CI system. The CI failure is #238; the
ticket to repair them is #244.
The new CircMgr::build_circuits_preemptively function actually causes
preemptive circuits to be built; it gets called from arti-client, like
the other daemon tasks the CircMgr has.
This commit implements the "metabuilder" pattern and the "builder
reconstruction" pattern for the ArtiConfig type.
I'm not 100% that this will be necessary, but it will certainly help
with testing.
I'm still not 100% sure this is the right move: should we encourage
app developers to always pick their own directories? Or should we
make it easy for them to use, well, `sane_defaults`?
This patch takes the second approach.
Rust nightly claims that Vec might get its own retain_mut method,
which would potentially conflict with the extension method we've
grabbed from the retain_mut crate. To solve this, we're calling the
method explicitly.
In preparation for making Arti build circuits preemptively, this commit
introduces `TargetCircUsage::Preemptive`, a circuit usage that works
somewhat differently from other ones: it requires at least 2 circuits to
exist that can exit the port it contains in order for an existing
circuit to match against it (path-spec.txt § 2.1.1); if that's not the
case, that usage will require building new circuits (in order that we
build enough to have 2 available).
This required refactoring how circuit reuse worked; now,
`CircList::find_open` uses the new `AbstractSpec::find_supported` trait
method, which we customize to implement the above check in the case of
`Preemptive` circuit usages. To make that work, `OpenEntry` now takes
two type parameters (the spec and circuit types), instead of taking a
builder type parameter and using its associated types. (We also got rid
of type constraints on that struct, yay!)
A WIP implementation of a preemptive circuit predictor that implements
path-spec.txt § 2.1.1 is also included, but this will require additional
effort to wire it up with the `CircMgr` properly.
I traced the problem here to the fact that sometimes "rx" in this
test would be dropped before the test was done. When "rx" is
dropped, the channel reactor shuts down, which in turn kills off the
circuit reactor.
This bug may exist in other cases in these tests. This patch may
fix one case of #238.
This implements a basic typed event broadcast mechanism, as described in
arti#230: consumers of the new `tor-events` crate can emit `TorEvent`
events, which others can consume via the `TorEventReceiver`.
Under the hood, the crate uses the `async-broadcast`
(https://github.com/smol-rs/async-broadcast) crate, and a
`futures::mpsc::UnboundedSender` for the event emitters; these are glued
together in the `EventReactor`, which must be run in a background thread
for things to work. (This is done so event sending is always cheap and
non-blocking, since `async-broadcast` senders don't have this
functionality.)
Additionally, the `TorEventKind` type is used to implement selective
event reception / emission: receivers can subscribe to certain event
types (and in fact start out receiving nothing), which filters the set
of events they receive. Having no subscribers for a given event type
means it won't even be emitted in the first place, making things more
efficient.
This patch makes sure that for every* config type we have, the defaults
you get from a Builder match those you get from Serde, and that both
match the value that you get from arti_defaults.toml. Later down the
line I'll be adding some tests to keep these in sync.
* StorageConfig still has no defaults of its own, since we aren't so
sure we want other applications to use Arti's directories by default.
Since these shell-variables are hardwired to use org.torproject.Arti as
the program name, it isn't appropriate to call them "app-specific".
If we someday reinstate APP_FOO, it should be based on a user-provided
application name.
In order to handle explicitly specified path buffers directly, we now
let CfgPath be either a string (that gets expanded) or a PathBuf
(that doesn't).
This simplifies TorClientConfig::with_directories()
Now every section that the two configuration objects share has the
same type and name. This should help us in documenting our configuration
in a way that doesn't confuse people.
There is still lots of API work to go.
To do this at all neatly, I had to split out `tor-config` from
`arti-config` again, and putting the lower level stuff (paths,
builder errors) into tor-config. I also changed our use of
derive_builder to always use a common error type, to avoid
error type proliferation.
The `bad_extend_*` failures were caused by bad test code in
`bad_extend_test_impl` that used `futures::join!`; this meant that the
reactor could receive the `Extended2` cell before it actually got the
`ExtendNtor` request, which caused it to get (quite rightly) confused
and close the circuit. Spawning a background thread which has a short
delay before sending the `Extended2` cell seems to have alleviated this
problem.
`new_circ_create_failure` is similar; I think the reactor was getting
dropped before it had a chance to flush out its `CreateFast` cell
properly, because it had already gotten the result back (since the test
code sends it indiscriminately). This was "fixed" in much the same
manner as the other test: making it wait a bit before sending the result
cell back.
There seem to be other tests that use `futures::join!` (like
`begindir`?), and use similarly erroneous patterns; I haven't gotten any
to fail reliably enough to be able to debug them, though.
This field isn't used in modern Tor, and has never been used in
Arti. If tor!489 is merged, then it will no longer contain a useful
value in future consensuses. We shouldn't store it, or else
somebody else will get the smart idea of using it for something.
This commit breaks API compatibility for tor-netdoc with the
`build_docs` feature enabled. I haven't entered that into the
semver_status.md file, since we already have a pending tor-netdoc
API breaker in !129.
Nothing in arti currently uses this document type. Eventually it
will be useful for relays and for bridge clients.
I've left the "SHA1 digest of a router descriptor" type available
unconditinoally, however, since it does get used in a few places.
Part of #125.
We don't currently need a couple of the key manipulation features
that we have, since we aren't yet doing relays or onion service
clients.
Part of #125
Previously, the reactor would use an `UnboundedSender` to send things to
the `RawCellStream`, in order that the reactor wouldn't block if you
failed to read from the latter. This is bad, though, since it means
people can just run us out of memory by sending lots of things.
To fix this, we make the new `StreamReader` type (which does the reading
parts from `RawCellStream`) keep track of the stream's receive window
and issue SENDMEs once *it* has consumed enough data to require it, thus
meaning that we shouldn't get sent enough data to fill the channel
between reactor and `StreamReader` (and, if we do, that's someone trying
to flood us, and we abort the circuit).
As hinted to above, the `RawCellStream` was removed and its reading
functionalities replaced by `StreamReader`; its writing functionalities
are handled by `StreamTarget` anyway, so we just give out one of those
for the write end. This now means we don't need any mutexes!
note: this commit introduces a known issue, arti#230
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
It requires tracing-subscriber 0.2, which is a lower version than we
want, and which causes trouble with our minimal-versions CI test.
There is a pending issue to fix this; we can reinstate tracing-test
once it is merged: https://github.com/dbrgn/tracing-test/pull/11
This patch makes the rust-nightly CI task fail if it detects any
dbg!(), println!(), or eprintln!() calls in production code.
Because of clippy limitations, it may also gripe about calls to
these macros in our tests. The preferred workarounds are to either
instead. Both are acceptable.
We're doing this check in CI rather than unconditionally with clippy
directives, since we often want to have these calls in our code
temporarily while we're developing. Some day we might want this
test to go into a pre-push hook.
This patch also adds #![allow()] directives for println!() and
eprintln!() in the arti crate. Since that one isn't a library, it's
okay for it to speak to stdout/stderr.
Closes#218.
Previously it was either all-locked or all-not-locked. Now you can
simulate having the same shared storage opened by multiple managers,
only one of which has the lock.
@nickm pointed out that refactoring tor_proto::channel's Reactor to do
sending as well meant that it could only send or receive, but not both,
simultaneously, which was bad!
To fix this, rewrite Reactor::run_once to use a handcrafted future (with
futures::future::poll_fn) that can handle the logic required to push
items onto the sink asynchronously (i.e. checking that it can be written
to before trying to do that, and then flushing it).
This also means we don't use select_biased! any more, and just handroll
that logic ourselves; as a small bonus, we can now process all 3 kinds
of message in one run_once() call, instead of having to do only one of
them.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
We define "coming back online" as happening when a guard attempt
succeeds, if that attempt that was launched when we seemed to be
offline.
We define "seeming to be offline" as having all of our primary
guards marked unreachable, and having received no incoming network
traffic in a while.
Closes#216.
We need this for the circuit timeout estimator (#57). It needs to
know "how recently have we got some incoming traffic", so that it
can tell whether a circuit has truly timed out, or whether the
entire network is down.
I'm implementing this with coarsetime, since we need to update these
in response to every single incoming cell, and we need the timestamp
operation to be _fast_.
(This reinstates an earlier commit, f30b2280, which I reverted
because we didn't need it at the time.)
Closes#179.
This is based on @eta's patches for !118 and !119: Since we already
have an unbounded channel, we don't need to use an elaborate mess of
one-shot senders. We can just use the unbounded_send() method,
which also lets us enqueue a message without having to await.
Closes#219.
Basically the same thing as 371437d338
("Refactor tor_proto::channel::Reactor to use an UnboundedSender"), but
for tor_proto::circuit's Reactor instead.
(part of arti#217)
There wasn't any good reason for tor-proto's channel reactor to use a
shedload of oneshot channels instead of just an mpsc UnboundedSender,
and the whole `CtrlResult` thing made even less sense.
Straighten this code out by replacing all of that machinery with a
simple UnboundedSender, instead.
(part of arti#218)
Most of the structs in `arti-client` have example code now, to give a
clearer idea of how they're used.
Annoyingly, a lot of the types exposed in `arti-client` are actually
re-exports, which makes documentation a bit harder: example code that
references other parts of `arti-client` can't actually be run as a
doctest, since the crate it's in is a dependency of `arti-client`.
We might be able to fix this in future by doing the documentation in
`arti-client` itself, but rustdoc seems to have some weird behaviours
there that need to be investigated first (for example, it seems to merge
the re-export and original documentation, and also put the re-export
documentation on the `impl` block for some reason).
For now, though, this commit just writes the docs from the point of view
of an `arti-client` consumer, removing notes specific to the crate in
which they're defined. It's not ideal, but at least the end user
experience is decent.
Instead of putting a fully qualified name in the text, in most cases
we should just use the short name of the type or function we're
referring to.
In other words, instead of saying [`crate::module::Foo`], we should
typically say [`Foo`](crate::module::Foo).
This overhauls the top-level `arti-client` documentation significantly:
- the "Using arti-client" section walks the user through all of the
necessary steps to initiate a Torified TCP connection, and then
provides a code example
- this example is also available as `examples/readme.rs`; it's not run
as a doctest, since it involves connecting to Tor
- a "More advanced usage" subheading provides information about stream
isolation (and can potentially be used for other interesting
features once we get them).
- a new "Multiple runtime support" section was added to explain the
purpose and usage of the `tor-rtcompat` crate
- the section on design and privacy considerations was removed; this is
probably okay to keep in a README, but users of the crate aren't going
to be interested in this (at least I don't think)
(also, the doc comment for `arti_client::Error` was fixed to make actual
sense)
This test uses a consensus that I've copied from
tor-netdoc/testdata. I would include it directly, but I think that
will cause trouble when it comes time to run "cargo package".
The previous version of this test used the old, racy version of
wait_for (see #149). The new version is refactored so that
simulated time is only allowed to advance after each step is done,
so that we can actually be sure that each step in the process will
happen as it should.
In order to get the time-advances to proceed properly, and avoid
polluting state between tests, I've had to introduce some machinery
to encode the proper amount for time to advance. It isn't something
I'd want to use for a whole bunch of tests, but for just one set,
it's fine.
These tests now pass reliably for me.
I wonder if a discrete-event-simulation approach (hello, Shadow)
would let us write tests like these to our hearts' content?
This switches out `arti`'s argument-parsing library with `clap`, which
is a lot more featureful (and very widely used within the Rust
ecosystem). We also now use a lot of `clap`'s features to improve the
CLI experience:
- The CLI now expects a subcommand (currently, either "help", or "proxy"
for the existing SOCKS proxy behaviour). This should let us add
additional non-SOCKS-proxy features to arti in future.
- `clap` supports default values determined at runtime, so the way the
default config file is loaded was changed: now, we determine the
OS-specific path for said file before invoking `clap`, so the help
command can show it properly.
- The behaviour of `tor_config` was also changed; now, one simply
specifies a list of configuration files to load, together with
whether they're required.
- That function also way overused generics; this has been fixed.
- Instead of using the ARTI_LOG environment variable to configure
logging, one now uses the `-l, --log-level` CLI option.
(The intent is for this option to be more discoverable by users.)
- The `proxy` subcommand allows the user to override the SOCKS port used
on the CLI without editing the config file.
The Futureproof<T> type lets you serialize and deserialize types whose
representations might change (most useful for enums that might grow
additional variants). It uses #[serde(untagged)] to accomplish this.
This gets used in order to make the `disabled` field of `Guard` more
robust against future guard disablement reasons being added.
A test was also added to verify correct behaviour of the new type.
As per arti#175, we'd like to be able to handle newer Arti versions
storing additional state in the persisted state files, without dropping
this data on the floor when we write out changes to these files.
Use the #[serde(flatten)] mechanism to achieve this, by adding catch-all
HashMap<String, JsonValue> fields to all structs that are at risk of
this happening to them.
There seems to be some issue here with the new WaitFor code,
where using the same MockSleepProvider with both of these wait_for()
calls gives questionable behavior under some circumstances (like
when running under Tarpaulin with the wrong set of flags).
We must not apply our new path-bias behavior (where we blame a guard
if it gives us too many indeterminate circuit failures) if the path
was not chosen at random. If too many random paths fail, we know
that's suspicious, since the other relays are a random sample. But
if a bunch of user-provided paths fail, that could simply be because
the user's chosen exit is down.
We now track, for every guard: the total number of successful
circuits we've built through it, along with the total number of
"indeterminate" circuits.
Recall that a circuit's status is "indeterminate" if it has failed
for a reason that _might_ be the guard's fault, or might not be the
guard's fault. For example, if extending to the second hop of the
circuit fails, we have no way to know whether the guard deliberately
refused to connect there, or whether the second hop is just offline.
But we don't want to forgive all indeterminate circuit failures: if
we did, then a malicious guard could simply reject any second hops
that it didn't like, thereby filtering the client into a chosen
set of circuits.
As a stopgap solution, this patch now makes guards become
permanently disabled if the fraction of their circuit failures
becomes too high.
See also general-purpose path bias selection (arti#65), and Mike's
idea for changing the guard reachability definition (torspec#67).
This patch doesn't do either of those.
Closes#185.
Instead of racily advancing time forward, this commit attempts to rework
how WaitFor works, such that it makes advances when all sleeper futures
that have been created have been polled (by handing the MockSleepRuntime
a Waker with which to wake up the WaitFor).
The above described mechanics work well enough for the double timeout
test, but fail in the presence of code that spawns asynchronous /
background tasks that must make progress before time is advanced for the
test to work properly. In order to deal with these cases, a set of APIs
are introduced in order to block time from being advanced until some
code has run, and a carveout added in order to permit small advances in
time where required.
(In some cases, code needed to be hacked up a bit in order to be made
properly testable using these APIs; the `MockablePlan` trait included in
here is somewhat unfortunate.)
This should fix arti#149.
Now that we have two kinds of isolation tokens (those set on a
stream, and those set by the stream's associated TorClient), we
need a more sophisticated kind of isolation.
This fixes the bug introduced with the previous commit, where
per-stream tokens would override per-TorClient tokens.
The new `hyper` tor-client example demonstrates integrating arti with the
popular Rust `hyper` HTTP library by implementing a custom Hyper "connector"
(a type that can initiate connections to HTTP servers) that proxies said
connections via the Tor network.
This will let callers use the tokio traits on these types too, if
they call `split()` on the DataStream.
(Tokio also has a `tokio::io::split()` method, but it requires a
lock whereas `DataStream::split()` doesn't.)
futures::io::AsyncRead (and Write) isn't the same thing as tokio::io::AsyncRead,
which is a somewhat annoying misfeature of the Rust async ecosystem (!).
To mitigate this somewhat for people trying to use the `DataStream` struct with
tokio, implement the tokio versions of the above traits using `tokio-util`'s
compat layer, if a crate feature (`tokio`) is enabled.
The three arguments TorClient::bootstrap requires by way of configuration
have been factored into a new TorClientConfig object.
This object gains two associated functions: one which uses `tor_config`'s
`CfgPath` machinery to generate sane defaults for the state and cache
directories, and one that accepts said directories in order to create a
config object with those inserted.
(this commit was inspired by trying to use arti as a library and being somewhat
overwhelmed by the amount of config stuff there was to do :p)
Previously we'd say that we were "waiting for the other process to
bootstrap" even if it was already bootstrapped: and we wouldn't
actually declare success when it was done.
The `get_relay` function was confusing, since it would return None if
the relay was present, but wasn't actually a guard. We only used it
in one place, and in that one place we used it wrong, leading to a
panic bug.
Fixes#193.
For now, this avoids having to separately handle
AuthorityBuilderError, DirMgrConfigBuilderError, DownloadScheduleConfigBuilderError,
NetworkConfigBuilderError and FallbackDirBuilderError when anyhow is not
used.
Turn off a clippy warning.
The default soft limit is typically enough for process usage on most
Unixes, but OSX has a pretty low default (256), which you can run
into easily under heavy usage.
With this patch, we're going to aim for as much as 16384, if we're
allowed.
Fixes part of #188.
I don't love this approach, but those errors aren't distinguished by
ErrorKind, so we have to use libc or winapi, apparently. At least
nothing here is unsafe.
Addresses part of #188.
The previous code would report all failures to build a circuit as
failures of the guard. But of course that's not right: If we
fail to extend to the second or third hop, that might or might not
be the guard's fault.
Now we use the "pending status" feature of the GuardMonitor type so
that an early failure is attributed to the guard, but a later
failure is attributed as "Indeterminate". Only a complete circuit
is called a success. We use a new "GuardStatusHandle" type here so
that we can report the status early if there is a timeout.
(When we're building a path with a guard, we need to tell the guard
manager whether the path succeeded, and we need to wait to hear
whether the guard is usable.)
The advantage here is that we no longer have to use a futures-aware
Mutex, or a blocking send operation, and therefore can simplify a
bunch of the GuardMgr APIs to no longer be async. That'll avoid
having to propagate the asyncness up the stack.
The disadvantage is that unbounded channels are just that: nothing
in the channel prevents us from overfilling it. Fortunately, the
process that consumes from the channel shouldn't block much, and
the channel only gets filled when we're planning a circuit path.
I tried using -Z minimal-versions to downgrade all first-level
dependencies to their oldest permitted versions, and found that we
were apparently depending on newer features of all three crates.
I'm kind of surprised there were only three.
Also, refactor our message handling to be more like the tor_proto
reactors. The previous code had a bug where, once the stream of
events was exhausted, we wouldn't actually get any more
notifications.
There are some missing parts here (like persistence and tests)
and some incorrect parts (I am 90% sure that the "exploratory
circuit" flag is bogus). Also it is not integrated with the circuit
manager code.
The limitations with toml seemed to be reaching a head, and I wasn't
able to refactor the guardmgr code enough to actually have its state
be serializable as toml. Json's limitations are much narrower.
We'll need `id_pair_is_listed()` to track whether a sampled guard is
(or is not) listed in the consensus.
We'll need `missing_descriptor_for` to see whether we've downloaded
enough microdescs to use a consensus.
(Thank goodness for rust; we messed up the coherency in C here so
many times, but I'm pretty sure that this time around we can't have
gotten it wrong.)
Previously, we'd have to declare the field for a parameter in one
place, its default in a second, and its consensus key in a third.
That's error-prone and not so fun! This patch changes the
way we declare parameters so that we declare a structure once,
and macros expand it to all do the right thing.
This required a few new traits and implementations to ensure
uniformity across the types that can go in parameters: We need every
parameter type to implement TryFrom<i32> and to implement
SaturatingFromInt32.
Eventually we might want SaturatingFromInt32 to be a more generic
SaturatingFrom, but that's not for now.
Doing this makes the code faster, lets us throw away some code, and
makes it easier to add a "choose-N-disjoint relays" implementation.
See large comment about plusses and minuses of new code. (Note that
the old implementation wasn't constant-time either.)
On torspec!40, Mike says:
I don't think there is a practical difference here. As per
Section 2.4.5, if 60 seconds is not enough and causes the
liveness test to fail due to too many timeouts, we will double
the initial timeout.
This makes our behavior the same as C tor.
The C Tor implementation doesn't do this, and Mike says:
I think it is a reasonable enough assumption that if Tor has
restarted, this kind data is no longer fresh enough to be
accurate for this purpose. This is also only 20 circuits here,
and typical timeouts are now around 1-2 seconds or less.. So a
restarted client with a timeout that is too low for a new
internet connection will figure this out pretty quickly. I think
that is OK.
(from torspec!40)
Now that we have const generics, we can use them. We can also avoid
an extra clone in the implementation for [u8; N].
Nothing in our codebase requires that we use Reader or Writer on a
GenericArray holding anything other than u8, so I've switched back
to the more efficient implementation there.
I've added a fuzzer case for the new method, but apparently rustc nightly isn't working too
well with fuzzers for me; I'm going to try it tomorrow.