Since the only purpose of this function is to make sure that no
bootstrapping task is running, a simple futures:🔒:Mutex
should do the job just fine.
Closes#337.
This commit changes how the `TorClient` type works, enabling it to be
constructed synchronously without initiating the bootstrapping process.
Daemon tasks are still started on construction (although some of them
won't do anything if the client isn't bootstrapped).
The old bootstrap() methods are now reimplemented in terms of the new
create_unbootstrapped() and bootstrap_existing() methods.
This required refactoring how the `DirMgr` works to enable the same sort
of thing there.
closes#293
Refactor the Error type to remove the yucky internal hidden Truncated
variant. Instead, there's now an embedded tor_bytes::Error value.
If that tor_bytes::Error is Truncated, we bubble it up when we convert our
handshake result to the nested error struct.
Thus there is still (sadly) a variant of tor_socksproto::Error
that shouldn't be exposed to user code. But refactoring every
inner method under handshake.rs seemed like a bad idea: once we're using
Result<Result<..>>, the ? operator no longer helps us much.
This fixes a tiny race condition in the previous code, where we
checked whether an OptTimestamp is None a bit before we set it.
Since std::atomic gives us compare_exchange, we might as well use
it.
Instead of declaring a macro that takes vis as an argument, we now
conditionally declare a macro that applies an appropriate visibility.
There's a long comment explaining the rationale here, along with a
couple of other solutions that don't work.
This is closer to what we described in Errors.md.
Also, remove the (sometimes private) Result alias: it was only used in
one or two places, and never exposed in public.
This change lets us make TorError's members unconditionally hidden,
and makes our API a little more consistent (since basically nothing
else is a public field).
These tests turned up a need for using the #[track_caller]
annotation in order to get accurate locations, which is fortunately
stable since Rust 1.46.0.
At least by default, we should have Error be private, and not expose
it as part of our APIs.
To keep functionality in `arti`, I had to add an `ExitTimeout` error
kind.
For interface consistency, I also re-exported ErrorKind and HasError
from `arti_client`.
Fixes these messages:
warning: this URL is not a hyperlink
--> crates/arti/src/watch_cfg.rs:115:5
|
115 | /// https://github.com/notify-rs/notify/issues/165 and
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://github.com/notify-rs/notify/issues/165>`
|
= note: `#[warn(rustdoc::bare_urls)]` on by default
= note: bare URLs are not automatically turned into clickable links
warning: this URL is not a hyperlink
--> crates/arti/src/watch_cfg.rs:116:5
|
116 | /// https://github.com/notify-rs/notify/pull/166 .
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://github.com/notify-rs/notify/pull/166>`
|
= note: bare URLs are not automatically turned into clickable links
The motivation for doing this now is to remove the `#[from]` so we
would spot where operationsl circuit setup failures were handled.
(But it turns out that they are turned into internal errors!)
Perhaps this will want to become a different error type from circmgr
in due course, but for now we simply use a bespoke variant of
TorError.
It will want its own Kind. The TODO in the HasKind impl marks
this (amongst much else here).
This involves making a temporary ErrorKind::TODO. That will continue
to exist until all errors (at least, the ones that make it out to
here) can be properly categorised.
Introducing this will let us work from the top and bottom towards the
middle.
Provide an enum variant to contain the SpawnError and a From impl.
We use `#[from]` here because it doesn't really make sense to attach
any context, as it's not likely to be very relevant.
This needs two kinds. We have decided to treat a non-shutdown
SpawnError as "unexplained" rather than as an InternalError.
There are many crates whose
From<futures::task::SpawnError> for Error
erroneously treat it as an internal error. We will fix them in a moment.
Serialisation errors ought not to occur, since they would represent an
attempt to store malformed data, or something. (We always convert to
a string, so the JSON error never contains IO errors or the like.)
Deserialisation errors mean the persistent state is corrupt.
The type annotation may not be necessary for inference, but as a
comment it risks becoming false. So it should be uncommented, or
deleted.
Error types round here are not entirely trivial so uncomment it.
(More specifically, `notify` behaves differently on different
platforms. On some, it can watch specific directory objects on the
filesystem, and so it only notices when _those_ directories change.
If you change a symlink so that the canonical configuration file
location is now in some other directory, `notify` won't notice. But
on other platforms, notify just does "stat()" in a loop. On those,
it _will_ notice if the configuration file changes.)
A number of severe problems with the circuit reactor were fixed which
could cause reordering of cells (which causes relays to terminate the
circuit with a protocol violation, as they become unable to decrypt
them). These mostly revolve around improper usage of queues:
- The code assumed that a failure to place cells onto the channel would
persist for the duration of a reactor cycle run. However, under high
contention, this wouldn't always be the case.
- This leads to some cells getting enqueued while others go straight
through, before the enqueued cells.
- To fix this, we block sending cells out of the channel while there
are still some enqueued.
- The hop-specific queues queued after encryption, not before. This was
very brittle, and led to frequent mis-ordering.
- This was fixed by making them not do that.
This is arti!264 / 5bce9db562 without the
refactor part.
Since the user can put their logfiles and configuration files in the
same directory, writing to the log can trigger an event from
`notify`. If we log every non-interesting event from `notify`, then
we'll trigger the logs every time we log, and fill up the disk.
This commit removes the offending log and adds a comment about why.
If we someday decide we do need to log here, maybe we can rate-limit
the messages or something.
Due to limitations in notify and the OS APIs it uses, it isn't
actually so useful to watch a single file. Instead, we have to
watch the directories that contain the files, and filter out any
events that aren't about the specific files we care about.
I've put the logic here into a new type, but I've left the type
un-exported: its API is pretty ugly, inasmuch as the caller needs to
jump through hoops to only get the events that they want. That's
not too bad so long as the API is private, but we'd want better if
we were exposing this.
Prompted by clippy::needless_question_mark. Sometimes Ok(r?) is
needed to do automatic error conversion. I assume the lint checks for
that. Anyway, in these cases it's not needed.
clippy::needless_borrow quibbles here, IMO correctly. Its suggestion
didn't go far enough: output is a String and a &String can be passed
to write as-is for identical effect.
I'm slightly concerned about whether this is behavior people would
expect to have on-by-default, so let's make this off-by-default for
now.
Maybe the `application` and `system` sections should merge?
This is by no means our final API, but should represent an
improvement. Here instead of having to specify a list of files and
their is-this-optional status, along with a list of command-line
options, we have a single structure that encapsulates all of that
information.
Two advantages here:
- Callers no longer have to remember what the boolean means.
- We can "reload" more easily, by keeping the source object around.
This change also implements the correct behavior for our default
configuration file in `arti::main`: if the file is absent and the
user doesn't list a config file, that's no problem. But if the user
lists _that very same config file, we should insist that it be
present.
This implements the proposal from arti#298, making the
`BenchmarkResults` type be made out of a bunch of new `Statistic` types
(which summarize the mean, median, range, and standard deviation of an
arbitrary value) instead of overloading `TimingSummary` for this
purpose.
There are a couple of places where we forgot to truncate our
return-buffer to its actual size, and instead returned a big bunch
of zeros. Found while writing the tests in the next commit.
Someday, we'll have ReadBuf and won't have to worry about these
things.
tor-netdir needs to bump because tor-netdoc bumped, even though
there were no other changes in tor-netdir. Whoops.
tor-guardmgr needs to bump because it already published, with the
older tor-netdir.
This failure occurred because our tests use canned data to exercise
the directory state functionality, and the canned consensus has
suddenly become very expired.
There are better fixes possible, but this is a minimal one that
should get CI working on main again.
This makes our layout more similar to our other crates, and
successfully informs our grcov exclusion pattern that these tests
are indeed tests.
Doing this knocks down the reported coverage for the tor-rtcompat
crate, but that's okay: we hadn't earned it.
I hereby promise that this commit is only code-movement.
This took some refactoring, so that I wouldn't need to define 9
different versions of the function. It also required that we change
the behavior of test_with_all_runtimes slightly, so that it asserts
on _any_ failure rather than asserting on most but returning Err()
for others. That in turn required changes to a few of its callers.
There's probably a better way to do all of this macro business, but
this is the best I could find.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
Having separate types here doesn't justify the (very limited)
benefit of distinguishing between the case where we have created an
executor that we own and the case where we have a handle to an
already-running tokio executor.
Part of #301.
This is based on @janimo's approach in !74, but diverges in a few
important ways.
1. It assumes that something like !251 will merge, so that we can
have separate implementations for native_tls and rustls compiled
at the same time.
2. It assumes that we can implement this for the futures::io traits
only with no real penalty.
3. It uses the `x509-signature` crate to work around the pickiness of
the `webpki` crate. If webpki eventually solves their
[bug 219](https://github.com/briansmith/webpki/issues/219), we
can remove a lot of that workaround.
Closes#86.
The docs even say this is about stream.
As @nickm writes in
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/252#note_2771289
we generally call end-to-end connections that are tunneled over Tor
"Streams" to distinguish them from everything else in the Tor
protocols that could possibly be called a "Connection".
That seems to apply here too.
This is totally not just an exercise to get combined test coverage
for tor-circmgr over 90% because I needed something to do that
wouldn't distract anybody else. :)
Since it implements a "<=" type relationship, it should be called
"at_least_as_permissive_as()." Since it's a crate-private function,
the long name isn't too bad.
In the usual case, set_isolation_group is awkward.
This is perhaps slightly duplicative with TorClient::isolated_client().
If so then perhaps the *latter* should be abolished.
We're soon going to have our different Runtime types be built as
CompoundRuntime instances. We don't want to expose that detail,
though, so we'll use this macro to make them implement the right
traits.
This type can solve two problems at once.
First, it lets users replace parts of an existing runtime
implementation without replacing the whole thing. For example, you
can use it to override your TcpProvider implementation to solve
problems like #235.
Second, we can use it internally to tor-rtcompat to define Runtimes
piece-by-piece. Mostly we'll use this to separate our Tls
implementations from our implementations of the rest of the Runtime.
This commit combines status update information from tor-dirmgr and
tor-chanmgr in the arti-client crate, so that the user can get to
it; it represents a high-level view of the client's ability to reach
the network and route traffic.
I have omitted the tor-circmgr support for now; it's mostly not
needed.
At present it's not so useful, since there's no way for a client to
get a TorClient that _isn't_ completely bootstrapped, and therefore
there's no way to actually watch these events until they're no
longer interesting. That should change with arti#293.
This is part of #96.
The information is pretty basic here: we use "have we been able to
connect/TLS-handshake/Tor-handshake" as a proxy for "are we on the
internet? Are we on a reasonably unfiltered part of the internet?"
Eventually we'll want to make the information gathered and exported
more detailed: I've noted a few places in the code. For now,
however, this is about as good as C Tor does today, and it should be
a good starting point.
This uses a slightly different design from tor-dirmgr. Instead of
exporting an entire state structure via `postage::watch`, it exports
only the parts of that structure which the user is supposed to
read. I think that's more reasonable in this case because most of
the possible internal transitions in the tor-chanmgr state don't
cause a change in the exposed status.
The interface is similar to the one exposed by `arti-client`: it
internally uses postage::watch to give a series of events showing
when a bootstrap status is changing.
Thanks to the existing state/driver separation in the DirMgr design
we don't need much new logic: each download state needs to expose
(internally) how far along it is in its download, which the
bootstrap code passes to the DirMgr if it has changed.
I believe that in the long run, we'll probably want to expose more
(or different) information here, and we'll want to process it
differently. With that in mind, I've made the API for
`DirBootstrapStatus` deliberately narrow, so that we can change its
of its internal later on without breaking code that depends on it.
(The information exposed by this commit is not yet summarized in
`arti-client`.)
Part of #96.
We now conduct benchmark tests with multiple concurrent streams (by
default; this is configurable by passing `-p` to `arti-bench`).
Currently, these results just get "flattened" for the purposes of
statistical analysis (as in, results_raw contains the results of each
connection's timing summary, across all benchmark runs). This might be
something we wish to change in future.
The stats summary now also records "best" and "worst" values for each
metric, to give a rough idea of the range of values encountered.
Additionally, we now support writing the benchmark results out to a JSON
file. A future commit may integrate this with CI, so that we have
benchmark results for every commit as a build artefact.
(some documentation was also fixed)
part of arti#292
We now do multiple samples (configurable; default 3) per type of
`arti-bench` benchmark run, and take a mean and median average of all
data collected, in order to hopefully be a bit more resilient to random
outliers / variation.
This uses some `futures::stream::Stream` hacks, which might result in
more connections being made than required (and might impact the TTFB
metrics somewhat, at least for downloading).
Results now get collected into a `BenchmarkResults` struct per type of
benchmark, which will be in turn placed into a `BenchmarkSummary` in a
later commit; this will also add the ability to serialize the latter
struct out to disk, for future reference.
part of arti#292
The purpose of a this API is to tell the user how far along Arti is
in getting bootstrapped, and if it's stuck, what it's stuck on.
This API doesn't yet expose any useful information: by the time it's
observable to a client, it's always "100% bootstrapped." But I'm
putting it in a MR now so that we can review the basic idea, and to
avoid conflicts with later work on tickets like #293 and #278.
This is part of #96.
This is a fine example of why booleans are risky:
it's far to easy to pass "animate:bool" into "inanimate:bool" like
we did here.
This is a followup from our fix to #294.
Previously we were requiring authenticated sendme cells exactly when we
should be permitting the old format, and vice versa.
This bug was caused by using a boolean to represent one property, but
with giving that boolean two different senses without inverting at the
right time.
The next commit will prevent a recurrence.
Closes#294
Previously we stored only one guard sample, in a state file called
"default_guards". That's not future-proof, since we want to have
multiple samples in the future. (`guard-spec.txt` specifies
separate samples for highly restrictive filters, and for bridge
usage.)
This patch changes our behavior so that we can store multiple
samples in a new "guards" file.
I had thought about automatically migrating from the previous file
format and location, but I don't think that's necessary given our
current (lack of) stability guarantees.
Closes#176.
This test should only fail very rarely (around 1/2.4e8) when guards
are chosen from a list of 20 with uniform probability. But that
wasn't what we were doing on the mock test network: we were choosing
from a list of 10 viable guards, with nonuniform probability.
As a fix, we change the test network probabilities so that the
guards _are_ chosen with a uniform probability for this test, and we
use a modified version of the test network where there are indeed 20
Guard-flagged relays with the required DirCache=2 protocol.
Closes#276.
This commit addresses multiple problems highlighted by arti#182:
- `arti-client` had some types in its public API that weren't accessible
without importing another crate (`CfgPath`, `DataReader`,
`DataWriter`). This has been fixed.
- In addition, the doc comments for `DataReader` and `DataWriter` were
cleaned up to be of better quality, now that they're public.
- It was impossible to use `arti-client` without also importing
`tor-rtcompat`. This is now fixed by the addition of two convenience
methods: `TorClient::bootstrap_with_tokio` and
`TorClient::bootstrap_with_async_std`.
- Potentially controversially: `tor-rtcompat` now returns *concrete*
types from methods like `current_runtime`, instead of `impl Runtime`.
- This was needed in order to actually be able to name the `TorClient`
type that results from using these methods.
- This does mean we lose API flexibility, but on balance I think this
is a good thing, because the API we *do* have is actually usable...
Previously we could only configure one global tracing filter that
applied to stdout and journald. There was no support for log files,
either.
This patch fixes both issues, by substantially revising the
configuration format: There are now separate filters for each log
file, for journald, and for the console log. Because we want to
allow multiple logfiles, they have to go into an array in the
configuration.
The configuration logic has grown a bit complicated in its types,
since the tracing_subscriber crate would prefer to have the complete
structure of tracing Layers known statically. That's fine when you
know how many you have, and which kinds there will be, but for
the runtime-configuration case we need to mess around with
`Box<dyn Layer ...>`.
I also had to switch from tracing_subscriber's EnvFilter to its
Targets filter. It seems "EnvFilter" can only be applied as a Layer
in itself, and won't work as a Filter on an individual Layer.
Closes#166.
Closes#170.
Previously we kept this in an ambiguously named type,
`ClientTimeoutConfig`. But everything we do right now is client
related! So `StreamTimeoutConfig` is a better name.
Also, we'd previously neglected to expose the builder for this type
from `TorClientConfigBuilder`. Now we do.
Closes#281.
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
We are going to get rid of the Arc. Happily there is an id which is
always constructed uniquely and preserved by clone.
(auto-deref lets us make the function take &Self instead of &Arc)
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
When our current consensus is getting close to being invalid (but
it isn't invalid yet), we try to get a new one. So far, so good.
But we had a bug: when we went to get a new consensus, we'd see that we
had a perfectly fine not-yet-invalid consensus in our cache, reload it,
find that it was ready, and continue!
This patch fixes our behavior: If we have a usable consensus, then
when we reset the bootstrapping process, we ignore any cached consensus.
Fixes bug #274.
This is a method, so the resolution is automatic. It's not clear to
me why this was written out this way, given that extend_ntor is right
above.
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
This prevents a security-failure condition that could happen if our
directory caches don't give us these microdescriptors, but we
nevertheless decide that the directory is usable.
Closes#178
If we don't know a current microdescriptor for a guard, we can't use it
for multihop circuits, since we don't know its onion keys.
This is part of a fix for #178.
This closes arti#256. It makes our behavior match Tor's more closely,
though it has a simpler implementation than Tor. I think that the extra
complexity in Tor's logic is because we used to record timeouts in
the histogram as well as in the success/failure log.
The implementations of `Add` / `Sub` (et al.) on `std::time::Instant`
can panic if the underlying OS structure can't represent the result
(like arti#266). Use Instant::checked_add and print a warning instead,
to prevent panicking.
Also, we now add instead of subtracting; I suspect it's reasonable that
you might not be able to go backward past the first `Instant` created on
some platforms, but going *forward* should probably work?
IIUC, these anticipatd a need to store min_exit_circs_per_port in
CircMgr. But the current design, where it goes into preemptive.rs and
thence to usage, seems to work fine.
We _do_ reject bad hostnames: just not where I once thought we might.
We need to decide if the current behavior is what we want (and I think
it is, probably?).
Our behavior in handling not-yet-valid router descriptors doesn't match
the C Tor implementation, but it's not a big deal:
we don't currently use router descriptors at all.
If we fail to convert a curve25519 key to an ed25519 key for checking
the onion-key crosscert, don't call that an internal error: it means
that something is wrong with the provided ntor key.
Previously we'd always set it to true, allowing one CONNECTED per
half-closed stream even if the stream had already received a
CONNECTED cell.
This resolves an XXXX.
The new `arti-bench` crate does a simple end-to-end benchmark test
embedding Arti: it generates some random data (of configurable amount,
depending on command-line parameters), and then sends said data back and
forth via Arti (which should be configured to use a local Chutney
network).
Additionally, the benchmark can also be run via a local SOCKS5 server
(in order to benchmark the performance via a local Chutney node, for
comparison).
The `tests/chutney/arti-bench.sh` sets up and tears down Chutney as
required to make this work.
This is very much a first cut; there are many things that should
eventually get added, such as support for multiple connections, JSON
output capabilities, running multiple tests, ...
There's no known attack here, but it's best practice to always compare
digests using a constant-time comparison operator.
This resolves an XXXX comment.
This warning occurs if we ask for microdescriptors from our local
cache, and our cache gives us something we didn't ask for. It
shouldn't be possible, so let's warn when it occurs.
This patch resolves an XXXX.
It makes sense to put the method for human-readable strings onto the
type itself, so that we can format these whenever they occur.
I'm choosing the "human_str" method name here, since caret-generated
types already have a to_str. I was thinking about using Display,
but caret types already implement that.
I've also moved the message from "warn!" to "debug!", since these
aren't necessarily a problem condition.
This approach tries to preserve the current interface, but uses a
counter-based event backend to implement a coalescing stream of
events that can be represented as small integers. The advantage
here is that publishing events no longer needs to be a blocking
operation, since there is no queue to fill up.
(It's a protocol violation to get a SENDME when our send window is
already full.)
This patch makes SendWindow::put return a Result, so that it's
easier to do the right thing with it.
Closes#261.
arti!126 overhauled the `tor-proto` circuit reactor, but left out one
very important thing: actually decrementing the SENDME window for
streams (not circuits) when we send cells along them.
Since the circuit-level SENDME window would often prevent us from
running into a problem, this wasn't caught until my benchmarking efforts
noticed it (in the form of Tor nodes aborting the circuit for a protocol
violation).
fixes arti#260