This patch makes sure that for every* config type we have, the defaults
you get from a Builder match those you get from Serde, and that both
match the value that you get from arti_defaults.toml. Later down the
line I'll be adding some tests to keep these in sync.
* StorageConfig still has no defaults of its own, since we aren't so
sure we want other applications to use Arti's directories by default.
Since these shell-variables are hardwired to use org.torproject.Arti as
the program name, it isn't appropriate to call them "app-specific".
If we someday reinstate APP_FOO, it should be based on a user-provided
application name.
In order to handle explicitly specified path buffers directly, we now
let CfgPath be either a string (that gets expanded) or a PathBuf
(that doesn't).
This simplifies TorClientConfig::with_directories()
Now every section that the two configuration objects share has the
same type and name. This should help us in documenting our configuration
in a way that doesn't confuse people.
There is still lots of API work to go.
To do this at all neatly, I had to split out `tor-config` from
`arti-config` again, and putting the lower level stuff (paths,
builder errors) into tor-config. I also changed our use of
derive_builder to always use a common error type, to avoid
error type proliferation.
The `bad_extend_*` failures were caused by bad test code in
`bad_extend_test_impl` that used `futures::join!`; this meant that the
reactor could receive the `Extended2` cell before it actually got the
`ExtendNtor` request, which caused it to get (quite rightly) confused
and close the circuit. Spawning a background thread which has a short
delay before sending the `Extended2` cell seems to have alleviated this
problem.
`new_circ_create_failure` is similar; I think the reactor was getting
dropped before it had a chance to flush out its `CreateFast` cell
properly, because it had already gotten the result back (since the test
code sends it indiscriminately). This was "fixed" in much the same
manner as the other test: making it wait a bit before sending the result
cell back.
There seem to be other tests that use `futures::join!` (like
`begindir`?), and use similarly erroneous patterns; I haven't gotten any
to fail reliably enough to be able to debug them, though.
This field isn't used in modern Tor, and has never been used in
Arti. If tor!489 is merged, then it will no longer contain a useful
value in future consensuses. We shouldn't store it, or else
somebody else will get the smart idea of using it for something.
This commit breaks API compatibility for tor-netdoc with the
`build_docs` feature enabled. I haven't entered that into the
semver_status.md file, since we already have a pending tor-netdoc
API breaker in !129.
Nothing in arti currently uses this document type. Eventually it
will be useful for relays and for bridge clients.
I've left the "SHA1 digest of a router descriptor" type available
unconditinoally, however, since it does get used in a few places.
Part of #125.
We don't currently need a couple of the key manipulation features
that we have, since we aren't yet doing relays or onion service
clients.
Part of #125
Previously, the reactor would use an `UnboundedSender` to send things to
the `RawCellStream`, in order that the reactor wouldn't block if you
failed to read from the latter. This is bad, though, since it means
people can just run us out of memory by sending lots of things.
To fix this, we make the new `StreamReader` type (which does the reading
parts from `RawCellStream`) keep track of the stream's receive window
and issue SENDMEs once *it* has consumed enough data to require it, thus
meaning that we shouldn't get sent enough data to fill the channel
between reactor and `StreamReader` (and, if we do, that's someone trying
to flood us, and we abort the circuit).
As hinted to above, the `RawCellStream` was removed and its reading
functionalities replaced by `StreamReader`; its writing functionalities
are handled by `StreamTarget` anyway, so we just give out one of those
for the write end. This now means we don't need any mutexes!
note: this commit introduces a known issue, arti#230
Rather like e8e9699c3c ("Get rid of
tor-proto's ChannelImpl, and use the reactor more instead"), this
admittedly rather large commit refactors the way circuits in `tor-proto`
work, centralising all of the logic in one large nonblocking reactor
which other things send messages into and out of, instead of having a
bunch of `-Impl` types that are protected by mutexes.
Congestion control becomes a lot simpler with this refactor, since the
reactor can manage both stream- and circuit-level congestion control
unilaterally without having to share this information with consumers,
meaning we can get rid of some locks.
The way streams work also changes, in order to facilitate better
handling of backpressure / fairness between streams: each stream now has
a set of channels to send and receive messages over, instead of sending
relay cells directly onto the channel (now, the reactor pulls messages
off each stream in each map, and tries to avoid doing so if it won't be
able to forward them yet).
Additionally, a lot of "close this circuit / stream" messages aren't
required any more, since that state is simply indicated by one end of a
channel going away. This should make cleanup a lot less brittle.
Getting all of this to work involved writing a fair deal of intricate
nonblocking code in Reactor::run_once that tries very hard to be mindful
of making backpressure work correctly (and congestion control); the old
code could get away with having tasks .await on things, but the new
reactor can't really do this (as it'd lock the reactor up), so has to do
everything in a nonblocking manner.
It requires tracing-subscriber 0.2, which is a lower version than we
want, and which causes trouble with our minimal-versions CI test.
There is a pending issue to fix this; we can reinstate tracing-test
once it is merged: https://github.com/dbrgn/tracing-test/pull/11
This patch makes the rust-nightly CI task fail if it detects any
dbg!(), println!(), or eprintln!() calls in production code.
Because of clippy limitations, it may also gripe about calls to
these macros in our tests. The preferred workarounds are to either
instead. Both are acceptable.
We're doing this check in CI rather than unconditionally with clippy
directives, since we often want to have these calls in our code
temporarily while we're developing. Some day we might want this
test to go into a pre-push hook.
This patch also adds #![allow()] directives for println!() and
eprintln!() in the arti crate. Since that one isn't a library, it's
okay for it to speak to stdout/stderr.
Closes#218.
Previously it was either all-locked or all-not-locked. Now you can
simulate having the same shared storage opened by multiple managers,
only one of which has the lock.
@nickm pointed out that refactoring tor_proto::channel's Reactor to do
sending as well meant that it could only send or receive, but not both,
simultaneously, which was bad!
To fix this, rewrite Reactor::run_once to use a handcrafted future (with
futures::future::poll_fn) that can handle the logic required to push
items onto the sink asynchronously (i.e. checking that it can be written
to before trying to do that, and then flushing it).
This also means we don't use select_biased! any more, and just handroll
that logic ourselves; as a small bonus, we can now process all 3 kinds
of message in one run_once() call, instead of having to do only one of
them.
Instead of awkwardly sharing the internals of a `tor-proto` `Channel`
between the reactor task and any other tasks, move most of the internals
into the reactor and have other tasks communicate with the reactor via
message-passing to allocate circuits and send cells.
This makes a lot of things simple, and has convenient properties like
not needing to wrap the `Channel` in an `Arc` (though some places in the
code still do this for now).
A lot of test code required tweaking in order to deal with the refactor;
in fact, fixing the tests probably took longer than writing the mainline
code (!). Importantly, we now use `tokio`'s `tokio::test` annotation
instead of `async_test`, so that we can run things in the background
(which is required to have reactors running for the circuit tests).
This is an instance of #205, and also kind of #217.
We define "coming back online" as happening when a guard attempt
succeeds, if that attempt that was launched when we seemed to be
offline.
We define "seeming to be offline" as having all of our primary
guards marked unreachable, and having received no incoming network
traffic in a while.
Closes#216.
We need this for the circuit timeout estimator (#57). It needs to
know "how recently have we got some incoming traffic", so that it
can tell whether a circuit has truly timed out, or whether the
entire network is down.
I'm implementing this with coarsetime, since we need to update these
in response to every single incoming cell, and we need the timestamp
operation to be _fast_.
(This reinstates an earlier commit, f30b2280, which I reverted
because we didn't need it at the time.)
Closes#179.
This is based on @eta's patches for !118 and !119: Since we already
have an unbounded channel, we don't need to use an elaborate mess of
one-shot senders. We can just use the unbounded_send() method,
which also lets us enqueue a message without having to await.
Closes#219.
Basically the same thing as 371437d338
("Refactor tor_proto::channel::Reactor to use an UnboundedSender"), but
for tor_proto::circuit's Reactor instead.
(part of arti#217)
There wasn't any good reason for tor-proto's channel reactor to use a
shedload of oneshot channels instead of just an mpsc UnboundedSender,
and the whole `CtrlResult` thing made even less sense.
Straighten this code out by replacing all of that machinery with a
simple UnboundedSender, instead.
(part of arti#218)
Most of the structs in `arti-client` have example code now, to give a
clearer idea of how they're used.
Annoyingly, a lot of the types exposed in `arti-client` are actually
re-exports, which makes documentation a bit harder: example code that
references other parts of `arti-client` can't actually be run as a
doctest, since the crate it's in is a dependency of `arti-client`.
We might be able to fix this in future by doing the documentation in
`arti-client` itself, but rustdoc seems to have some weird behaviours
there that need to be investigated first (for example, it seems to merge
the re-export and original documentation, and also put the re-export
documentation on the `impl` block for some reason).
For now, though, this commit just writes the docs from the point of view
of an `arti-client` consumer, removing notes specific to the crate in
which they're defined. It's not ideal, but at least the end user
experience is decent.
Instead of putting a fully qualified name in the text, in most cases
we should just use the short name of the type or function we're
referring to.
In other words, instead of saying [`crate::module::Foo`], we should
typically say [`Foo`](crate::module::Foo).
This overhauls the top-level `arti-client` documentation significantly:
- the "Using arti-client" section walks the user through all of the
necessary steps to initiate a Torified TCP connection, and then
provides a code example
- this example is also available as `examples/readme.rs`; it's not run
as a doctest, since it involves connecting to Tor
- a "More advanced usage" subheading provides information about stream
isolation (and can potentially be used for other interesting
features once we get them).
- a new "Multiple runtime support" section was added to explain the
purpose and usage of the `tor-rtcompat` crate
- the section on design and privacy considerations was removed; this is
probably okay to keep in a README, but users of the crate aren't going
to be interested in this (at least I don't think)
(also, the doc comment for `arti_client::Error` was fixed to make actual
sense)
This test uses a consensus that I've copied from
tor-netdoc/testdata. I would include it directly, but I think that
will cause trouble when it comes time to run "cargo package".
The previous version of this test used the old, racy version of
wait_for (see #149). The new version is refactored so that
simulated time is only allowed to advance after each step is done,
so that we can actually be sure that each step in the process will
happen as it should.
In order to get the time-advances to proceed properly, and avoid
polluting state between tests, I've had to introduce some machinery
to encode the proper amount for time to advance. It isn't something
I'd want to use for a whole bunch of tests, but for just one set,
it's fine.
These tests now pass reliably for me.
I wonder if a discrete-event-simulation approach (hello, Shadow)
would let us write tests like these to our hearts' content?
This switches out `arti`'s argument-parsing library with `clap`, which
is a lot more featureful (and very widely used within the Rust
ecosystem). We also now use a lot of `clap`'s features to improve the
CLI experience:
- The CLI now expects a subcommand (currently, either "help", or "proxy"
for the existing SOCKS proxy behaviour). This should let us add
additional non-SOCKS-proxy features to arti in future.
- `clap` supports default values determined at runtime, so the way the
default config file is loaded was changed: now, we determine the
OS-specific path for said file before invoking `clap`, so the help
command can show it properly.
- The behaviour of `tor_config` was also changed; now, one simply
specifies a list of configuration files to load, together with
whether they're required.
- That function also way overused generics; this has been fixed.
- Instead of using the ARTI_LOG environment variable to configure
logging, one now uses the `-l, --log-level` CLI option.
(The intent is for this option to be more discoverable by users.)
- The `proxy` subcommand allows the user to override the SOCKS port used
on the CLI without editing the config file.
The Futureproof<T> type lets you serialize and deserialize types whose
representations might change (most useful for enums that might grow
additional variants). It uses #[serde(untagged)] to accomplish this.
This gets used in order to make the `disabled` field of `Guard` more
robust against future guard disablement reasons being added.
A test was also added to verify correct behaviour of the new type.
As per arti#175, we'd like to be able to handle newer Arti versions
storing additional state in the persisted state files, without dropping
this data on the floor when we write out changes to these files.
Use the #[serde(flatten)] mechanism to achieve this, by adding catch-all
HashMap<String, JsonValue> fields to all structs that are at risk of
this happening to them.
There seems to be some issue here with the new WaitFor code,
where using the same MockSleepProvider with both of these wait_for()
calls gives questionable behavior under some circumstances (like
when running under Tarpaulin with the wrong set of flags).
We must not apply our new path-bias behavior (where we blame a guard
if it gives us too many indeterminate circuit failures) if the path
was not chosen at random. If too many random paths fail, we know
that's suspicious, since the other relays are a random sample. But
if a bunch of user-provided paths fail, that could simply be because
the user's chosen exit is down.
We now track, for every guard: the total number of successful
circuits we've built through it, along with the total number of
"indeterminate" circuits.
Recall that a circuit's status is "indeterminate" if it has failed
for a reason that _might_ be the guard's fault, or might not be the
guard's fault. For example, if extending to the second hop of the
circuit fails, we have no way to know whether the guard deliberately
refused to connect there, or whether the second hop is just offline.
But we don't want to forgive all indeterminate circuit failures: if
we did, then a malicious guard could simply reject any second hops
that it didn't like, thereby filtering the client into a chosen
set of circuits.
As a stopgap solution, this patch now makes guards become
permanently disabled if the fraction of their circuit failures
becomes too high.
See also general-purpose path bias selection (arti#65), and Mike's
idea for changing the guard reachability definition (torspec#67).
This patch doesn't do either of those.
Closes#185.
Instead of racily advancing time forward, this commit attempts to rework
how WaitFor works, such that it makes advances when all sleeper futures
that have been created have been polled (by handing the MockSleepRuntime
a Waker with which to wake up the WaitFor).
The above described mechanics work well enough for the double timeout
test, but fail in the presence of code that spawns asynchronous /
background tasks that must make progress before time is advanced for the
test to work properly. In order to deal with these cases, a set of APIs
are introduced in order to block time from being advanced until some
code has run, and a carveout added in order to permit small advances in
time where required.
(In some cases, code needed to be hacked up a bit in order to be made
properly testable using these APIs; the `MockablePlan` trait included in
here is somewhat unfortunate.)
This should fix arti#149.
Now that we have two kinds of isolation tokens (those set on a
stream, and those set by the stream's associated TorClient), we
need a more sophisticated kind of isolation.
This fixes the bug introduced with the previous commit, where
per-stream tokens would override per-TorClient tokens.
The new `hyper` tor-client example demonstrates integrating arti with the
popular Rust `hyper` HTTP library by implementing a custom Hyper "connector"
(a type that can initiate connections to HTTP servers) that proxies said
connections via the Tor network.
This will let callers use the tokio traits on these types too, if
they call `split()` on the DataStream.
(Tokio also has a `tokio::io::split()` method, but it requires a
lock whereas `DataStream::split()` doesn't.)
futures::io::AsyncRead (and Write) isn't the same thing as tokio::io::AsyncRead,
which is a somewhat annoying misfeature of the Rust async ecosystem (!).
To mitigate this somewhat for people trying to use the `DataStream` struct with
tokio, implement the tokio versions of the above traits using `tokio-util`'s
compat layer, if a crate feature (`tokio`) is enabled.
The three arguments TorClient::bootstrap requires by way of configuration
have been factored into a new TorClientConfig object.
This object gains two associated functions: one which uses `tor_config`'s
`CfgPath` machinery to generate sane defaults for the state and cache
directories, and one that accepts said directories in order to create a
config object with those inserted.
(this commit was inspired by trying to use arti as a library and being somewhat
overwhelmed by the amount of config stuff there was to do :p)
Previously we'd say that we were "waiting for the other process to
bootstrap" even if it was already bootstrapped: and we wouldn't
actually declare success when it was done.
The `get_relay` function was confusing, since it would return None if
the relay was present, but wasn't actually a guard. We only used it
in one place, and in that one place we used it wrong, leading to a
panic bug.
Fixes#193.
For now, this avoids having to separately handle
AuthorityBuilderError, DirMgrConfigBuilderError, DownloadScheduleConfigBuilderError,
NetworkConfigBuilderError and FallbackDirBuilderError when anyhow is not
used.
Turn off a clippy warning.
The default soft limit is typically enough for process usage on most
Unixes, but OSX has a pretty low default (256), which you can run
into easily under heavy usage.
With this patch, we're going to aim for as much as 16384, if we're
allowed.
Fixes part of #188.
I don't love this approach, but those errors aren't distinguished by
ErrorKind, so we have to use libc or winapi, apparently. At least
nothing here is unsafe.
Addresses part of #188.
The previous code would report all failures to build a circuit as
failures of the guard. But of course that's not right: If we
fail to extend to the second or third hop, that might or might not
be the guard's fault.
Now we use the "pending status" feature of the GuardMonitor type so
that an early failure is attributed to the guard, but a later
failure is attributed as "Indeterminate". Only a complete circuit
is called a success. We use a new "GuardStatusHandle" type here so
that we can report the status early if there is a timeout.
(When we're building a path with a guard, we need to tell the guard
manager whether the path succeeded, and we need to wait to hear
whether the guard is usable.)
The advantage here is that we no longer have to use a futures-aware
Mutex, or a blocking send operation, and therefore can simplify a
bunch of the GuardMgr APIs to no longer be async. That'll avoid
having to propagate the asyncness up the stack.
The disadvantage is that unbounded channels are just that: nothing
in the channel prevents us from overfilling it. Fortunately, the
process that consumes from the channel shouldn't block much, and
the channel only gets filled when we're planning a circuit path.
I tried using -Z minimal-versions to downgrade all first-level
dependencies to their oldest permitted versions, and found that we
were apparently depending on newer features of all three crates.
I'm kind of surprised there were only three.
Also, refactor our message handling to be more like the tor_proto
reactors. The previous code had a bug where, once the stream of
events was exhausted, we wouldn't actually get any more
notifications.
There are some missing parts here (like persistence and tests)
and some incorrect parts (I am 90% sure that the "exploratory
circuit" flag is bogus). Also it is not integrated with the circuit
manager code.
The limitations with toml seemed to be reaching a head, and I wasn't
able to refactor the guardmgr code enough to actually have its state
be serializable as toml. Json's limitations are much narrower.