From its old name, this error had implied that we were giving no
useful information when we were waiting on a pending cirucit request
that failed. In fact, this error would only happen if we dropped the
`mpsc::Sender` for a circuit attempt without reporting success or
failure.
These errors should almost never be seen by the user; we should instead
retry the circuit. But they _can_ be seen by the use if selecting a
guard takes too long, or too many attempts. (Therefore, they aren't true
"internal" errors.)
I suspect that we might not want to keep this TransientFailure kind, but
I'm not sure what else to do here for now.
The new `BootstrapBehavior` enum controls whether an unbootstrapped
`TorClient` will bootstrap itself automatically (`Ondemand`) when an
attempt is made to use it, or whether the user must perform
bootstrapping themselves (`Manual`).
The `lazy-init` example shows how you could write a simple
`get_tor_client()` function that used a global `OnceCell` to share
a Tor client across an entire application with this API.
closes arti#278
In one of the two places, nightly no longer warns. In the other
place, it's fine for nightly to warn: I just fixed the code to take
a slice instead.
Partial revert of 856aca8791.
Resolves part of #310.
This change is possible now that #293 is done.
As an immediate benefit, it allows us to start monitoring the
configuration files immediately, and not only after we're done
bootstrapping the client.
Closes#336.
Additionally, refactor the IoError out of tor_cell::Error:
nothing in TorCell created this; it was only used by tor_proto.
This required refactoring in tor_proto to use a new error type. Here I
decided to use a new CodecError for now, though we may refactor that
away soon too.
Adding this autoconversion is quite safe since every error generation
site is explicit and has its own context, and we don't really need to
add more.
This simplifies the code and will simplify future work.
This error type doesn't impement HasKind, since the kind will depend
on context.
However, the existing implementation was pretty messy and inconsistent:
Some errors had positions, some didn't.
Some took messages as str, some as String.
Some had internal errors that were somewhat orthogonal to their actual
types.
This commit refactors tor_netdoc::Error to use a ParseErrorKind, and
adds a set of convenience functions to add positions and
messages to the errors that need them.
We should not generally explicitly specify a kind for errors which
contain a more detailed error which itself has a kind. Stating the
kind literally is a latent bug, which becomes a real bug if the
contained type's kind changes or starts to vary.
(There may be exceptions to this principle but this isn't one of
them.)
Since the only purpose of this function is to make sure that no
bootstrapping task is running, a simple futures:🔒:Mutex
should do the job just fine.
Closes#337.
This commit changes how the `TorClient` type works, enabling it to be
constructed synchronously without initiating the bootstrapping process.
Daemon tasks are still started on construction (although some of them
won't do anything if the client isn't bootstrapped).
The old bootstrap() methods are now reimplemented in terms of the new
create_unbootstrapped() and bootstrap_existing() methods.
This required refactoring how the `DirMgr` works to enable the same sort
of thing there.
closes#293
Refactor the Error type to remove the yucky internal hidden Truncated
variant. Instead, there's now an embedded tor_bytes::Error value.
If that tor_bytes::Error is Truncated, we bubble it up when we convert our
handshake result to the nested error struct.
Thus there is still (sadly) a variant of tor_socksproto::Error
that shouldn't be exposed to user code. But refactoring every
inner method under handshake.rs seemed like a bad idea: once we're using
Result<Result<..>>, the ? operator no longer helps us much.
This fixes a tiny race condition in the previous code, where we
checked whether an OptTimestamp is None a bit before we set it.
Since std::atomic gives us compare_exchange, we might as well use
it.
Instead of declaring a macro that takes vis as an argument, we now
conditionally declare a macro that applies an appropriate visibility.
There's a long comment explaining the rationale here, along with a
couple of other solutions that don't work.
This is closer to what we described in Errors.md.
Also, remove the (sometimes private) Result alias: it was only used in
one or two places, and never exposed in public.
This change lets us make TorError's members unconditionally hidden,
and makes our API a little more consistent (since basically nothing
else is a public field).
These tests turned up a need for using the #[track_caller]
annotation in order to get accurate locations, which is fortunately
stable since Rust 1.46.0.
At least by default, we should have Error be private, and not expose
it as part of our APIs.
To keep functionality in `arti`, I had to add an `ExitTimeout` error
kind.
For interface consistency, I also re-exported ErrorKind and HasError
from `arti_client`.
Fixes these messages:
warning: this URL is not a hyperlink
--> crates/arti/src/watch_cfg.rs:115:5
|
115 | /// https://github.com/notify-rs/notify/issues/165 and
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://github.com/notify-rs/notify/issues/165>`
|
= note: `#[warn(rustdoc::bare_urls)]` on by default
= note: bare URLs are not automatically turned into clickable links
warning: this URL is not a hyperlink
--> crates/arti/src/watch_cfg.rs:116:5
|
116 | /// https://github.com/notify-rs/notify/pull/166 .
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use an automatic link instead: `<https://github.com/notify-rs/notify/pull/166>`
|
= note: bare URLs are not automatically turned into clickable links
The motivation for doing this now is to remove the `#[from]` so we
would spot where operationsl circuit setup failures were handled.
(But it turns out that they are turned into internal errors!)
Perhaps this will want to become a different error type from circmgr
in due course, but for now we simply use a bespoke variant of
TorError.
It will want its own Kind. The TODO in the HasKind impl marks
this (amongst much else here).
This involves making a temporary ErrorKind::TODO. That will continue
to exist until all errors (at least, the ones that make it out to
here) can be properly categorised.
Introducing this will let us work from the top and bottom towards the
middle.
Provide an enum variant to contain the SpawnError and a From impl.
We use `#[from]` here because it doesn't really make sense to attach
any context, as it's not likely to be very relevant.
This needs two kinds. We have decided to treat a non-shutdown
SpawnError as "unexplained" rather than as an InternalError.
There are many crates whose
From<futures::task::SpawnError> for Error
erroneously treat it as an internal error. We will fix them in a moment.
Serialisation errors ought not to occur, since they would represent an
attempt to store malformed data, or something. (We always convert to
a string, so the JSON error never contains IO errors or the like.)
Deserialisation errors mean the persistent state is corrupt.
The type annotation may not be necessary for inference, but as a
comment it risks becoming false. So it should be uncommented, or
deleted.
Error types round here are not entirely trivial so uncomment it.
(More specifically, `notify` behaves differently on different
platforms. On some, it can watch specific directory objects on the
filesystem, and so it only notices when _those_ directories change.
If you change a symlink so that the canonical configuration file
location is now in some other directory, `notify` won't notice. But
on other platforms, notify just does "stat()" in a loop. On those,
it _will_ notice if the configuration file changes.)
A number of severe problems with the circuit reactor were fixed which
could cause reordering of cells (which causes relays to terminate the
circuit with a protocol violation, as they become unable to decrypt
them). These mostly revolve around improper usage of queues:
- The code assumed that a failure to place cells onto the channel would
persist for the duration of a reactor cycle run. However, under high
contention, this wouldn't always be the case.
- This leads to some cells getting enqueued while others go straight
through, before the enqueued cells.
- To fix this, we block sending cells out of the channel while there
are still some enqueued.
- The hop-specific queues queued after encryption, not before. This was
very brittle, and led to frequent mis-ordering.
- This was fixed by making them not do that.
This is arti!264 / 5bce9db562 without the
refactor part.
Since the user can put their logfiles and configuration files in the
same directory, writing to the log can trigger an event from
`notify`. If we log every non-interesting event from `notify`, then
we'll trigger the logs every time we log, and fill up the disk.
This commit removes the offending log and adds a comment about why.
If we someday decide we do need to log here, maybe we can rate-limit
the messages or something.
Due to limitations in notify and the OS APIs it uses, it isn't
actually so useful to watch a single file. Instead, we have to
watch the directories that contain the files, and filter out any
events that aren't about the specific files we care about.
I've put the logic here into a new type, but I've left the type
un-exported: its API is pretty ugly, inasmuch as the caller needs to
jump through hoops to only get the events that they want. That's
not too bad so long as the API is private, but we'd want better if
we were exposing this.
Prompted by clippy::needless_question_mark. Sometimes Ok(r?) is
needed to do automatic error conversion. I assume the lint checks for
that. Anyway, in these cases it's not needed.
clippy::needless_borrow quibbles here, IMO correctly. Its suggestion
didn't go far enough: output is a String and a &String can be passed
to write as-is for identical effect.
I'm slightly concerned about whether this is behavior people would
expect to have on-by-default, so let's make this off-by-default for
now.
Maybe the `application` and `system` sections should merge?
This is by no means our final API, but should represent an
improvement. Here instead of having to specify a list of files and
their is-this-optional status, along with a list of command-line
options, we have a single structure that encapsulates all of that
information.
Two advantages here:
- Callers no longer have to remember what the boolean means.
- We can "reload" more easily, by keeping the source object around.
This change also implements the correct behavior for our default
configuration file in `arti::main`: if the file is absent and the
user doesn't list a config file, that's no problem. But if the user
lists _that very same config file, we should insist that it be
present.
This implements the proposal from arti#298, making the
`BenchmarkResults` type be made out of a bunch of new `Statistic` types
(which summarize the mean, median, range, and standard deviation of an
arbitrary value) instead of overloading `TimingSummary` for this
purpose.
There are a couple of places where we forgot to truncate our
return-buffer to its actual size, and instead returned a big bunch
of zeros. Found while writing the tests in the next commit.
Someday, we'll have ReadBuf and won't have to worry about these
things.
tor-netdir needs to bump because tor-netdoc bumped, even though
there were no other changes in tor-netdir. Whoops.
tor-guardmgr needs to bump because it already published, with the
older tor-netdir.
This failure occurred because our tests use canned data to exercise
the directory state functionality, and the canned consensus has
suddenly become very expired.
There are better fixes possible, but this is a minimal one that
should get CI working on main again.
This makes our layout more similar to our other crates, and
successfully informs our grcov exclusion pattern that these tests
are indeed tests.
Doing this knocks down the reported coverage for the tor-rtcompat
crate, but that's okay: we hadn't earned it.
I hereby promise that this commit is only code-movement.
This took some refactoring, so that I wouldn't need to define 9
different versions of the function. It also required that we change
the behavior of test_with_all_runtimes slightly, so that it asserts
on _any_ failure rather than asserting on most but returning Err()
for others. That in turn required changes to a few of its callers.
There's probably a better way to do all of this macro business, but
this is the best I could find.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
Having separate types here doesn't justify the (very limited)
benefit of distinguishing between the case where we have created an
executor that we own and the case where we have a handle to an
already-running tokio executor.
Part of #301.
This is based on @janimo's approach in !74, but diverges in a few
important ways.
1. It assumes that something like !251 will merge, so that we can
have separate implementations for native_tls and rustls compiled
at the same time.
2. It assumes that we can implement this for the futures::io traits
only with no real penalty.
3. It uses the `x509-signature` crate to work around the pickiness of
the `webpki` crate. If webpki eventually solves their
[bug 219](https://github.com/briansmith/webpki/issues/219), we
can remove a lot of that workaround.
Closes#86.
The docs even say this is about stream.
As @nickm writes in
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/252#note_2771289
we generally call end-to-end connections that are tunneled over Tor
"Streams" to distinguish them from everything else in the Tor
protocols that could possibly be called a "Connection".
That seems to apply here too.
This is totally not just an exercise to get combined test coverage
for tor-circmgr over 90% because I needed something to do that
wouldn't distract anybody else. :)
Since it implements a "<=" type relationship, it should be called
"at_least_as_permissive_as()." Since it's a crate-private function,
the long name isn't too bad.
In the usual case, set_isolation_group is awkward.
This is perhaps slightly duplicative with TorClient::isolated_client().
If so then perhaps the *latter* should be abolished.
We're soon going to have our different Runtime types be built as
CompoundRuntime instances. We don't want to expose that detail,
though, so we'll use this macro to make them implement the right
traits.
This type can solve two problems at once.
First, it lets users replace parts of an existing runtime
implementation without replacing the whole thing. For example, you
can use it to override your TcpProvider implementation to solve
problems like #235.
Second, we can use it internally to tor-rtcompat to define Runtimes
piece-by-piece. Mostly we'll use this to separate our Tls
implementations from our implementations of the rest of the Runtime.
This commit combines status update information from tor-dirmgr and
tor-chanmgr in the arti-client crate, so that the user can get to
it; it represents a high-level view of the client's ability to reach
the network and route traffic.
I have omitted the tor-circmgr support for now; it's mostly not
needed.
At present it's not so useful, since there's no way for a client to
get a TorClient that _isn't_ completely bootstrapped, and therefore
there's no way to actually watch these events until they're no
longer interesting. That should change with arti#293.
This is part of #96.
The information is pretty basic here: we use "have we been able to
connect/TLS-handshake/Tor-handshake" as a proxy for "are we on the
internet? Are we on a reasonably unfiltered part of the internet?"
Eventually we'll want to make the information gathered and exported
more detailed: I've noted a few places in the code. For now,
however, this is about as good as C Tor does today, and it should be
a good starting point.
This uses a slightly different design from tor-dirmgr. Instead of
exporting an entire state structure via `postage::watch`, it exports
only the parts of that structure which the user is supposed to
read. I think that's more reasonable in this case because most of
the possible internal transitions in the tor-chanmgr state don't
cause a change in the exposed status.
The interface is similar to the one exposed by `arti-client`: it
internally uses postage::watch to give a series of events showing
when a bootstrap status is changing.
Thanks to the existing state/driver separation in the DirMgr design
we don't need much new logic: each download state needs to expose
(internally) how far along it is in its download, which the
bootstrap code passes to the DirMgr if it has changed.
I believe that in the long run, we'll probably want to expose more
(or different) information here, and we'll want to process it
differently. With that in mind, I've made the API for
`DirBootstrapStatus` deliberately narrow, so that we can change its
of its internal later on without breaking code that depends on it.
(The information exposed by this commit is not yet summarized in
`arti-client`.)
Part of #96.
We now conduct benchmark tests with multiple concurrent streams (by
default; this is configurable by passing `-p` to `arti-bench`).
Currently, these results just get "flattened" for the purposes of
statistical analysis (as in, results_raw contains the results of each
connection's timing summary, across all benchmark runs). This might be
something we wish to change in future.
The stats summary now also records "best" and "worst" values for each
metric, to give a rough idea of the range of values encountered.
Additionally, we now support writing the benchmark results out to a JSON
file. A future commit may integrate this with CI, so that we have
benchmark results for every commit as a build artefact.
(some documentation was also fixed)
part of arti#292
We now do multiple samples (configurable; default 3) per type of
`arti-bench` benchmark run, and take a mean and median average of all
data collected, in order to hopefully be a bit more resilient to random
outliers / variation.
This uses some `futures::stream::Stream` hacks, which might result in
more connections being made than required (and might impact the TTFB
metrics somewhat, at least for downloading).
Results now get collected into a `BenchmarkResults` struct per type of
benchmark, which will be in turn placed into a `BenchmarkSummary` in a
later commit; this will also add the ability to serialize the latter
struct out to disk, for future reference.
part of arti#292
The purpose of a this API is to tell the user how far along Arti is
in getting bootstrapped, and if it's stuck, what it's stuck on.
This API doesn't yet expose any useful information: by the time it's
observable to a client, it's always "100% bootstrapped." But I'm
putting it in a MR now so that we can review the basic idea, and to
avoid conflicts with later work on tickets like #293 and #278.
This is part of #96.
This is a fine example of why booleans are risky:
it's far to easy to pass "animate:bool" into "inanimate:bool" like
we did here.
This is a followup from our fix to #294.
Previously we were requiring authenticated sendme cells exactly when we
should be permitting the old format, and vice versa.
This bug was caused by using a boolean to represent one property, but
with giving that boolean two different senses without inverting at the
right time.
The next commit will prevent a recurrence.
Closes#294
Previously we stored only one guard sample, in a state file called
"default_guards". That's not future-proof, since we want to have
multiple samples in the future. (`guard-spec.txt` specifies
separate samples for highly restrictive filters, and for bridge
usage.)
This patch changes our behavior so that we can store multiple
samples in a new "guards" file.
I had thought about automatically migrating from the previous file
format and location, but I don't think that's necessary given our
current (lack of) stability guarantees.
Closes#176.