Previously DocSource would tell you whether the document was from
a local store or a cache server, but it wouldn't tell you _which_
server it came from.
This change required adding DocSource as an argument to
DirState::add_from_download.
This was anomalous, in that it contains &'static str, rather than a
proper nested error (eg a config::ConfigError, maybe).
But in fact it tursn out it is now not constructed. The last
construction site was removed a long time ago in
Use derive_builder for Authority and FallbackDir.
With this API we can now stop consensus download attempts early if
any consensus that the directory cache gave us would be necessarily
too far in the future or in the past.
This saves wasted bandwidth for clients with skewed clocks.
Closes#466.
If we're happy with a directory from 3 days ago, we should say
"if-modified-since 3 days ago".
This patch is larger than I'd like, since I had to add &DirMgrConfig
as an argument to the functions that make a consensus request.
Closes#467.
Since we want to be willing to use older consensuses, we don't
necessarily want to reset a download just because the consensus is
expired.
This new behavior isn't ideal either; I've added a TODO that relates
to #433.
Related of #412
This new section describes how much variance we accept when it comes
to expired and not-yet-valid directory documents. (Currently, the
only ones where this matters for are consensus documents and
authority certificates.) A document that is invalid by no more than
these tolerances is not _live_, but it can still be used.
These tolerances serve two purposes:
* First, they allow clients to run with a little more clock skew
than they would tolerate otherwise.
* Second, they allow clients to survive the situation where the
authorities are unable to reach a consensus for a day or two.
Compare with Tor's REASONABLY_LIVE_TIME and NETWORKSTATUS_ALLOW_SKEW
constants; also compare with proposal 212.
Closes#412.
- Some FIXMEs got removed or amended.
- AddMicrodescs now yields a mutable reference, so we can use .drain()
and reuse the allocation.
- Some panics were downgraded to debug_asserts.
- We don't want to inadvertently replace our netdir with one that's
actually older, so detect and error on this condition.
- Also, print a debug line when we get a new netdir without enough
guards.
- (An unrelated TODO was also added.)
- Taking a previous netdir directly and keeping it around before we need
it is a bit of a waste of memory, and also doesn't mesh well with how
SharedMutArc works.
- To remedy this, introduce a new trait `PreviousNetDir` and have the
state machines take that instead. (I was a bit tempted to just pass in
the SharedMutArc directly. Maybe I should've done that.)
- GetMicrodescsState now uses the NetDirChange API to propagate netdir
changes, instead of modifying the netdir directly.
- PendingNetDir was refactored in order to support this use case.
- As a result, the netdir-related methods in WriteNetDir can be removed,
leaving only the DirFilter for now.
- add_from_cache() no longer takes a store, because nothing uses it.
- (bodge: apply_netdir_changes() was put in a few places missed
previously)
- The new DirState::get_netdir_change() API lets the state machine
export a NetDirChange: a request to either replace the current netdir,
or add microdescs to it.
- bootstrap.rs now consumes this new API, even though nothing implements
it yet.
- This will let us implement GetMicrodescsState without having to
directly mutate the netdir. The calling code also handles checking the
netdir against the circmgr for sufficiency, and updating the consensus
metadata in the store, meaning the revised GetMicrodescsState will not
have to perform these tasks.
- The additional parameters passed to GetConsensusState are now passed
through all the states, and used as well.
- WriteNetDir doesn't have a now() or config() method any more, since
the states now get this from the runtime or the config parameters.
- This required modifying the tests to make a mocked runtime and custom
config directly, instead of using DirRcv for this purpose.
- Additionally, because we don't have to upgrade a weak reference for
DirState::dl_config(), that function no longer wraps its return value
in Result.
- (A bunch of the FIXMEs from the previous commit that introduced the
additional parameters have now been rectified as a result.)
- GetConsensusState::new now takes a set of parameters matching what it
actually needs, instead of just taking a writedir. (It still *does*
take a writedir, and indeed still uses it for basically everything,
but that will eventually go away.)
- Its call sites were updated.
- Some tests now need to take a runtime, and got indented a lot as a
result.
- Resetting was made non-functional, because we need to thread through
the parameters passed to GetConsensusState to all of the other
states, too. This will happen in a later commit.
- Given that this is effectively an implementation detail, it doesn't
really make sense to have it be in the crate root...
- (also, we're going to change it a bunch now)
- fetch_single now takes what it needs, instead of an Arc<DirMgr<R>>.
- This required refactoring the CANNED_RESPONSE mechanism, given the
test would otherwise fail due to not having a CircMgr to pass to
fetch_single.
- query_into_requests is now called make_requests_for_documents, and
does the &[DocId] -> DocQuery conversion internally instead.
- DirMgr::make_consensus_request and DirMgr::query_into_requests are now
gone. The tests use the new functions, as does fetch_multiple.
- There's no good reason these functions needed to be part of the
dirmgr, apart from needing a runtime and a store.
- However, we can just add those as arguments and copy them over. This
commit does that.
- Function renamed & docs tidied up a bit
- Function signature now takes what it needs (immutable &dyn Store
instead of mutex, slice instead of Vec) and nothing more
- DocQuery::load_documents_into was also renamed
DocQuery::load_from_store_into and given similar treatment
Annoyingly, Rust doesn't automatically generate this sort of `impl` for
you, and I'd like to reduce the usage of Mutex<DynStore> everywhere else
in favour of either &dyn Store or &mut dyn Store.
(This is for two reasons: firstly, we might have a Store implementation
that doesn't use a mutex as above, or similar refactors; secondly,
passing the raw trait object reference lets us encode mutability into
the function signature, which I believe is quite valuable.)
For reference, the git source for this crate (and the others in its
workspace) currently lives in my personal github account (ijackson).
If this fork turns out to be long-lived and gains features and/or
users, it would be good to move it to a gitlab somewhere.
I have granted Nick crate ownership on the crates.io system.
* Builders additionally derive: Debug, Serialize, Deserialize.
* Validated structs no longer derive: Serialize, Deserialize
and all related attributes deleted.
* As a consequence, all the `#[serde(deny_unknown_fields)]`
are gone. That means that right now unknown fields are totally
ignored. This is good for compatibility but poor for useability.
Doing something better here is arti#417, in progress.
* As a consequence, delete tor_dirmgr::retry::default_parallelism.
(The default value was already duplicated into a builder attr.)
And drop the ad-hoc orport() method. This brings FallbackDir's
orports field in line with our list builder API.
The general semver note in "configuation" seems to cover most of this.
This type was returned by the public DownloadSchedule::builder
function. But the only thing that seems to have noticed that the type
name itself wasn't exported, was rustdoc. Hmmm.
The new API is (roughly) as discussed in
https://gitlab.torproject.org/tpo/core/arti/-/issues/451
This is quite a large commit and it is not convenient to split it up.
It contains the following changes:
* Redo the list builder and accessor macros implemnetation,
including docs and tests.
* Change uses of define_list_config_builder. In each case:
- Move the docs about the default value to the containing field.
- Remove the other docs (which were just recapitulations, and
are now not needed since the ListBuilder is no longer public).
- Rewmove or replace `pub` in the define_list_builder_helper call,
so that the builder is no longer public.
- Change the main macro call site to use define_list_builder_helper.
- Add a call to define_list_builder_accessors.
* Make the module `list_builder` pub so that we have somewhere to
put the overview documentation.
* Consequential changes:
- Change `outer.inner().replace(X)` to `outer.set_inner(X)`
- Consequential changes to imports (`use` statements).
Previously this field was differently named to its serde and to its
accessors. We are about to introduce a macro_rules macro which will
provide list accessors and we don't want that macro to have a field
renaming feature.
So stop renaming the field.
The `[patch]` approach causes the tree not to build when used as a
dependency, unless the `[patch]` is replicated into the depending
project.
Instead, replace our `derive_builer =` dependencies with a reference
to a specific git commit:
perl -i~ -pe 'next unless m/^derive_builder/; s#"(0\.11\.2)"#{ version = "$1", git = "https://github.com/ijackson/rust-derive-builder", rev = "ba0c1a5311bd9f93ddf5f5b8ec2a5f6f03b22fbe" }#' crates/*/Cargo.toml
Note that the commitid has changed. This is because derive_builder is
in fact a workspace of 4 crates. 3 of them are of interest to arti
itself (the 4th exists only for testing). So the same "add git
revision" treatment had to be done to the `derive_builder` and
`derive_builder_macro` crates. Each dependency edge involves a new
commit in the derive_builder workspace, since we can't create a git
commit containing its own commitid. (We want to use commits, rather
than a branch, so that what we are depending on is actually properly
defined, and not subject to the whims of my personal github
namespace.)
There are no actual code changes in derive_builder.
This is actually a number of *attempts* not a number of *retries*.
The setter method was already called "attempts".
This chnages the deserialisation of the config.
Use sub_builder. We must do something special for defaults.
This involves moving the actual default values for retry_bootstrap and
retry_microdescs into config.rs, since they need to access the fields
of the un-built version of the structure. (An alternative would be to
generate "weak setters" which do not override previous settings, but
derive_builder does not offer to generate them and that seems
overkill.)
Instead, everyone should use DownloadScheduleBuilder.
The new() method would in any case be useless in a moment, since we're
going to embed DownloadScheduleBuilder in the NetworkConfig, not
DownloadSchedule.
The call sites in the tests are all about to change again.
The current behaviour is to treat 0 as indicating "use the default",
which is quite strange. We are going to get rid of that.
The new way will be to reject zero, during
DownloadScheduleBuilder::build, Add a test case for that.
And add an imprecation in define_list_config_builder's doc comment do
do so in future for other invocations of the macro.
Add add the missing full stops.
This is an automated change made with a perl one-liner and verified
with grep -L and grep -l.
Some warnings are introduced with this change; they will be removed
in subsequent commits.
See arti#208 for older discussion on this issue.
NetworkConfigBuilder needs to not contain any validated structs, so
that its serde does not expose the validated details.
AuthorityListBuilder is what ought to go here - and it contains
Vec<AuthorityBuilder>, not Vec<Authority>. As a consequence, many
places now deal with AuthorityBuilder, rather than Authority.
Now the network fallbacks configuration wants to Deserialize
a Vec<FallbackDirBuilder>, rather than validated Vec<FallbackDir>.
Methods on FallbackListBuilder are as per
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/462#note_2797697
mutatis mutandi for the fact that this struct has only fallbacks in it.
This is where the FallbackList type is. We are going to want to
provide a builder too, which ought to impl Default.
This means that the default value for the type must be next to the
type. In any case, it was anomalous that it wasn't.
This commit is pure code motion.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/253https://github.com/ijackson/rust-derive-builder/tree/field-builder
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
We will need the new version of not only `derive_builder_core` (the
main macro implementation) but also`derive_builder` for a new error
type.
Doing this will make us treat caches that send us these objects as
not-working, and close circuits to them instead of trying over and
over.
The case where we add a document from the cache requires special
handling: it isn't actually a error to find an expired document in
our cache (unless the passage of time itself is erroneous, which is
a debatable proposition at best).
Fixes#431.
Not all of these strictly need to be bumped to 0.2.0; many could go
to 0.1.1 instead. But since everything at the tor-rtcompat and
higher layers has had breaking API changes, it seems not so useful
to distinguish. (It seems unlikely that anybody at this stage is
depending on e.g. tor-protover but not arti-client.)
The previous algorithm had two flaws:
* It would wait even after the final attempt, when there were no
more retries to do.
* It would fail to wait between attempts if an error occurred.
This refactoring fixes both of these issues, and adds some comments.
The FirstHopId type now records an enum that stores whether the hop
is a guard or a fallback. This change addresses concerns about
remembering to check the type or source of an Id before passing it
down to the FallbackState or GuardSet.
Making this change required an API change, so that dirmgr can
report success/failure status without actually knowing whether it's
using a fallback or a guard.
We do this by creating a new FallbackSet type that includes status
information, and updating the GuardMgr APIs to record success and
failure about it when appropriate. We can use this to mark
FallbackDirs retriable (or not).
With this change, FallbackDir is now stored internally as a Guard in
the GuardMgr crate. That's fine: the FallbackDir type really only
matters for configuration.
The guard manager is responsible for handing out the first hops of
tor circuits, keeping track of their successes and failures, and
remembering their states. Given that, it makes sense to store this
information here. It is not yet used; I'll be fixing that in
upcoming commits.
Arguably, this information no longer belongs in the directory
manager: I've added a todo about moving it.
This commit will break compilation on its own in a couple of places;
subsequent commits will fix it up.
This is the logical place for it, I think: the GuardMgr's job is to
pick the first hop for a circuit depending on remembered status for
possible first hops. Making this change will let us streamline the
code that interacts with these objects.
The various background daemon tasks that `arti-client` used to spawn are
now handled inside their respective crates instead, with functions
provided to spawn them that return `TaskHandle`s.
This required introducing a new trait, `NetDirProvider`, which steals
some functionality from the `DirProvider` trait to enable `tor-circmgr`
to depend on it (`tor-circmgr` is a dependency of `tor-dirmgr`, so it
can't depend on `DirProvider` directly).
While we're at it, we also make some of the tasks wait for events from
the `NetDirProvider` instead of sleeping, slightly increasing
efficiency.
There are two reasons why the DynFilter newtype might be needed:
1. To impl Default. But we don't need it to impl Default since we can
have an accessor which does the defaulting.
2. To hide the API. But this is usrely an unstable API.
Just writing Arc<dyn> gets rid of a lot of unnecessary boilerplate and
conversion code.
Some error types indicate that the guard has failed as a dircache.
We should treat these errors as signs to close the circuit, and to
mark the guard as having failed.
It'll soon more convenient to pass in FallbackDirs as a slice of
references, rather than just a slice of FallbackDirs: I'm going to
be changing how we handle these in tor-dirmgr.
This has the different syntax for builder field attributes than what I
originally proposed in my MR, and which therefore is in the pinned
branch.
My upstream MR for the field attributes feature was morged:
https://github.com/colin-kiegel/rust-derive-builder/issues/239
We were using a hashtable to keep track of missing microdescriptor
digests. But this information is redundant with the NetDir state,
and there's now no longer any performance benefit to keeping a
separate copy.
Part of #386.
We never want a consensus document that's super-old, since we would
reject it immediately for being too old.
Also, never send an if-modified-since that's so old that we'd reject
the response.
Closes#403
This has the humantime_serde::option module, which we have upstreamed
and are about to switch to.
The remaining dependency with version = "1" is going to be removed
in a moment.
This hashtable starts out pretty large, but it can spend most of our
runtime (when we aren't downloading) being small. To avoid doing
too much work, I've made it so we only call shrink_to_fit twice per
consensus: once when we're no longer pending, and once when we're
complete.
Closes#388.
I used
git-grep -P '\#\[serde\((?!default|deny_unknown)'
to find places where I needed to add additional attributes on the
builder method fields.
This is currently a bit duplicative, but when #371 is completely done,
the validated (non-builder) configs won't need to be Deserialize any
more.
This is part of #371 and #372.
We are going to want to specify custom attributes on fields of the
builder struct. This feature was missing from derive_builder.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/237https://github.com/ijackson/rust-derive-builder/tree/builder-field-attrs
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
This lint is IMO inherently ill-conceived.
I have looked for the reasons why this might be thought to be a good
idea and there were basically two (and they are sort of contradictory):
I. "Calling ‘.clone()` on an Rc, Arc, or Weak can obscure the fact
that only the pointer is being cloned, not the underlying data."
This is the wording from
https://rust-lang.github.io/rust-clippy/v0.0.212/#clone_on_ref_ptr
It is a bit terse; we are left to infer why it is a bad idea to
obscure this fact. It seems to me that if it is bad to obscure some
fact, that must be because the fact is a hazard. But why would it be
a hazard to not copy the underlying data ?
In other languages, faliing to copy the underlying data is a serious
correctness hazard. There is a whose class of bugs where things were
not copied, and then mutated and/or reused in multiple places in ways
that were not what the programmer intended. In my experience, this is
a very common bug when writing Python and Javascript. I'm told it's
common in golang too.
But in Rust this bug is much much harder to write. The data inside an
Arc is immutable. To have this bug you'd have use interior mutability
- ie mess around with Mutex or RefCell. That provides a good barrier
to these kind of accidents.
II. "The reason for writing Rc::clone and Arc::clone [is] to make it
clear that only the pointer is being cloned, as opposed to the
underlying data. The former is always fast, while the latter can
be very expensive depending on what is being cloned."
This is the reasoning found here
https://github.com/rust-lang/rust-clippy/issues/2048
This is saying that *not* using Arc::clone is hazardous.
Specifically, that a deep clone is a performance hazard.
But for this argument, the lint is precisely backwards. It's linting
the "good" case and asking for it to be written in a more explicit
way; while the supposedly bad case can be written conveniently.
Also, many objects (in our codebase, and in all the libraries we use)
that are Clone are in fact simply handles. They contain Arc(s) (or
similar) and are cheap to clone. Indeed, that is the usual case.
It does not make sense to distinguish in the syntax we use to clone
such a handle, whether the handle is a transparent Arc, or an opaque
struct containing one or more other handles.
Forcing Arc::clone to be written as such makes for code churn when a
type is changed from Arc<Something> to Something: Clone, or vice
versa.
This patch removes the EventStream associated type and the Runtime
parameter. The Runtime parameter wasn't actually used for anything,
and the EventStream was easy enough to replace with a BoxStream in
this case.
Also replaced DirBootstrapEvents with a BoxStream to avoid tying
anything to our backend.
It had too many possible Kinds depending on what kind of string had
failed to parse.
I decided to use #[source] here instead of #[from], so that we
would have to explicitly convert these errors where they show up.
The new `BootstrapBehavior` enum controls whether an unbootstrapped
`TorClient` will bootstrap itself automatically (`Ondemand`) when an
attempt is made to use it, or whether the user must perform
bootstrapping themselves (`Manual`).
The `lazy-init` example shows how you could write a simple
`get_tor_client()` function that used a global `OnceCell` to share
a Tor client across an entire application with this API.
closes arti#278
This commit changes how the `TorClient` type works, enabling it to be
constructed synchronously without initiating the bootstrapping process.
Daemon tasks are still started on construction (although some of them
won't do anything if the client isn't bootstrapped).
The old bootstrap() methods are now reimplemented in terms of the new
create_unbootstrapped() and bootstrap_existing() methods.
This required refactoring how the `DirMgr` works to enable the same sort
of thing there.
closes#293
tor-netdir needs to bump because tor-netdoc bumped, even though
there were no other changes in tor-netdir. Whoops.
tor-guardmgr needs to bump because it already published, with the
older tor-netdir.
This failure occurred because our tests use canned data to exercise
the directory state functionality, and the canned consensus has
suddenly become very expired.
There are better fixes possible, but this is a minimal one that
should get CI working on main again.
This commit puts the native-tls crate behind a feature. The feature
is off-by-default in the tor-rtcompat crate, but can be enabled
either from arti or arti-client.
There is an included script that I used to test that tor-rtcompat
could build and run its tests with all subsets of its features.
Closes#300
The interface is similar to the one exposed by `arti-client`: it
internally uses postage::watch to give a series of events showing
when a bootstrap status is changing.
Thanks to the existing state/driver separation in the DirMgr design
we don't need much new logic: each download state needs to expose
(internally) how far along it is in its download, which the
bootstrap code passes to the DirMgr if it has changed.
I believe that in the long run, we'll probably want to expose more
(or different) information here, and we'll want to process it
differently. With that in mind, I've made the API for
`DirBootstrapStatus` deliberately narrow, so that we can change its
of its internal later on without breaking code that depends on it.
(The information exposed by this commit is not yet summarized in
`arti-client`.)
Part of #96.
I found these versions empirically, by using the following process:
First, I used `cargo tree --depth 1 --kind all` to get a list of
every immediate dependency we had.
Then, I used `cargo upgrade --workspace package@version` to change
each dependency to the earliest version with which (in theory) the
current version is semver-compatible. IOW, if the current version
was 3.2.3, I picked "3". If the current version was 0.12.8, I
picked "0.12".
Then, I used `cargo +nightly upgrade -Z minimal-versions` to
downgrade Cargo.lock to the minimal listed version for each
dependency. (I had to override a few packages; see .gitlab-ci.yml
for details).
Finally, I repeatedly increased the version of each of our
dependencies until our code compiled and the tests passed. Here's
what I found that we need:
anyhow >= 1.0.5: Earlier versions break our hyper example.
async-broadcast >= 0.3.2: Earlier versions fail our tests.
async-compression 0.3.5: Earlier versions handled futures and tokio
differently.
async-trait >= 0.1.2: Earlier versions are too buggy to compile our
code.
clap 2.33.0: For Arg::default_value_os().
coarsetime >= 0.1.20: exposed as_ticks() function.
curve25519-dalek >= 3.2: For is_identity().
generic-array 0.14.3: Earlier versions don't implement
From<&[T; 32]>
httparse >= 1.2: Earlier versions didn't implement Error.
itertools at 0.10.1: For at_most_once.
rusqlite >= 0.26.3: for backward compatibility with older rustc.
serde 1.0.103: Older versions break our code.
serde_json >= 1.0.50: Since we need its Value type to implement Eq.
shellexpand >= 2.1: To avoid a broken dirs crate version.
tokio >= 1.4: For Handle::block_on().
tracing >= 0.1.18: Previously, tracing_core and tracing had separate
LevelFilter types.
typenum >= 1.12: Compatibility with rust-crypto crates
x25519-dalek >= 1.2.0: For was_contributory().
Closes#275.
When our current consensus is getting close to being invalid (but
it isn't invalid yet), we try to get a new one. So far, so good.
But we had a bug: when we went to get a new consensus, we'd see that we
had a perfectly fine not-yet-invalid consensus in our cache, reload it,
find that it was ready, and continue!
This patch fixes our behavior: If we have a usable consensus, then
when we reset the bootstrapping process, we ignore any cached consensus.
Fixes bug #274.
This prevents a security-failure condition that could happen if our
directory caches don't give us these microdescriptors, but we
nevertheless decide that the directory is usable.
Closes#178
This warning occurs if we ask for microdescriptors from our local
cache, and our cache gives us something we didn't ask for. It
shouldn't be possible, so let's warn when it occurs.
This patch resolves an XXXX.
This approach tries to preserve the current interface, but uses a
counter-based event backend to implement a coalescing stream of
events that can be represented as small integers. The advantage
here is that publishing events no longer needs to be a blocking
operation, since there is no queue to fill up.
We can't change the authorities while in-flight: that would be pretty
miserable to implement.
Similarly we can't change the cache while in-flight.
Everything else should be fair game, though there are a couple of tricky
bits. I've tried to document those.
This patch doesn't actually make anything reconfigurable, but it
does create an API that will tell you "you can't change the value of
that!" If the API looks reasonable, I can start making it possible
to change the values of individual items.
We want to only use TODO in the codebase for non-blockers, and open
tickets for anything that is a bigger blocker than a TODO. These
XXXXs seem like definite non-blockers to me.
Part of arti#231.
To do this at all neatly, I had to split out `tor-config` from
`arti-config` again, and putting the lower level stuff (paths,
builder errors) into tor-config. I also changed our use of
derive_builder to always use a common error type, to avoid
error type proliferation.
It requires tracing-subscriber 0.2, which is a lower version than we
want, and which causes trouble with our minimal-versions CI test.
There is a pending issue to fix this; we can reinstate tracing-test
once it is merged: https://github.com/dbrgn/tracing-test/pull/11
This test uses a consensus that I've copied from
tor-netdoc/testdata. I would include it directly, but I think that
will cause trouble when it comes time to run "cargo package".
Previously we'd say that we were "waiting for the other process to
bootstrap" even if it was already bootstrapped: and we wouldn't
actually declare success when it was done.