Because we want to work more on ensuring that our semver stability
story is solid, we are _not_ bumping arti-client to 1.0.0 right now.
Here are the bumps we _are_ doing. Crates with "minor" bumps have
had API breaks; crates with "patch" bumps have had new APIs added.
Note that `tor-congestion` is not bumped here: it's a new crate, and
hasn't been published before.
```
tor-basic-utils minor
fs-mistrust minor
tor-config minor
tor-rtcompat minor
tor-rtmock minor
tor-llcrypto patch
tor-bytes patch
tor-linkspec minor
tor-cell minor
tor-proto minor
tor-netdoc patch
tor-netdir minor
tor-persist patch
tor-chanmgr minor
tor-guardmgr minor
tor-circmgr minor
tor-dirmgr minor
arti-client minor
arti-hyper minor
arti major
arti-bench minor
arti-testing minor
```
This patch shortens the duration of the `does_not_predict_old_ports`
test in the preemptive module. AppVeyor spawns its VMs/containers per
build, so the `Instant::now()` call returns a value smaller than `60 *
60 + 1` which causes the subtraction to overflow and thus panic.
Thanks to @trinity-1686a for the help here.
See: tpo/core/arti#563.
This gets rid of many Result(). Many parameters are renamed.
Test cases of the now-impossible branch are removed.
Deleting the match from padding_parameters will come in a moment.
I've split off that commit since it has much whitespace noise.
for now, change the error type to Void.
A "transient" error is one that does not indicate a true failure,
but rather an _expected_ need to retry. When we hit one of these,
we do not count it against the total number of permitted failures.
(We do impose a higher limit on "real failures plus transient
failures", though, to prevent infinite loops in the event of a
programming error.
Closes#517.
Channel padding depends on what the channel is being used for. We
therefore need to let the channel code know this information.
The implementation of the per-channel padding control logic will be in
the new note_usage function, which for now is simply a stub.
A future commit will introduce a `PaddingControlState` which lives in
the channel frontend; consult the doc comment for that type to see why
the plumbing through the channel manager terminates in the channel
frontend.
The chanmgr remembers the last dormancy state it was told.
We invent a chanmgr-specific Dormancy which the arti-client code knows
how to convert from the richer top-level dormant status. This avoids
having to have everyone know all the variants of the top-level state.
To call reconfigure_general, we must also obtain and plumb through a
netdir. Right now we must return an internal error if there is in
fact no netdir, because reconfigure_general does not yet cope with a
missing netdir.
Nothing actually *uses* the dormancy yet.
With this change, each individual identity type becomes optional.
The functions that expose them unconditionally are now in a "legacy"
trait that only some downstream types are expected to implement.
There are new convenience APIs in HasRelayIds:
* to return Option<&keytype>,
* to see if one identity-set contains another.
This commit will break several downstream crates! For the
reviewer's convenience, I will put the fixes for those crates into a
series of squash! commits on this one.
tor-netdir
----------
Revise tor-netdir to accept optional identities. This required some
caveats and workarounds about the cases where we have to deal with a
key type that the tor-netdir code does not currently recognize at
all. If we start to add more identity types in the future, we may
well want more internal indices in this code.
tor-proto
---------
In order to make tor-proto support optional identities, there were
fewer changes than I thought. Some "check" functions needed to start
looking at "all the ids we want" rather than at "the two known IDs";
they also needed to accommodate that case where we don't have an ID
that we demand.
This change will also help with bridges, since we want to be able to
connect to a bridge without knowing all of its IDs up front.
The protocol currently _requires_ the two current ID types in some
places. To deal with that, I added a new `MissingId` error.
I also removed a couple of unconditional identity accessors for
chanmgr; code should use `target().identity(...)` instead.
tor-chanmgr
-----------
This is an incomplete conversion: it does not at all handle channel
targets without Ed25519 identities yet. It still uses those
identities to index its internal map from identity to channel; but
it gives a new `MissingId` error type if it's given a channel target
that doesn't have one.
We'll want to revise the map type again down the road when we
implement bridges, but I'd rather not step on the channel-padding
work in progress right now.
tor-guardmgr
------------
This change is mostly a matter of constructing owned identity types
more sensibly, rather than unwrapping them directly.
There are some places marked with TODOs where we still depend on
particular identity types, because of how the directory protocol
works. This will need revisiting when we add bridge support here.
tor-circmgr
-----------
These changes are just relatively simple API changes in the tests.
We want the set of identities supported by a relay to be extensible
in the future with minimal fuss; we'd also like to make working
with these ID sets more convenient. To handle that, this commit
adds a new trait for "Something that has the same IDs as a relay"
and a new object for "an owned representation of a relay's IDs."
This commit introduces a similar trait for "Something with a list of
SocketAddr, like a relay has." There's no owned equivelent for
that, since Vec<SocketAddr> is already a thing.
Closes#428.
Do _not_ bump the dependency versions on crates that have had no
changes since arti 0.0.5, since those crates do not depend on the
new APIs.
```
cargo set-version -p tor-basic-utils --bump patch
cargo set-version -p tor-llcrypto --bump patch
git restore crates/tor-checkable
git restore crates/tor-consdiff
git restore crates/tor-rtmock
```
This performs the transitive closure of the last operation:
everything that depends on a crate with a breaking change gets the
version which it depends on bumped.
```
cargo set-version -p tor-proto --bump minor
cargo set-version -p tor-netdoc --bump minor
cargo set-version -p arti-hyper --bump minor
cargo set-version -p arti-bench --bump minor
cargo set-version -p arti-testing --bump minor
cargo set-version -p tor-config --bump minor
```
Found these by disabling the nightly dbg macro special case. Now, we
have a mechanism for globally adding suppressions to tests, we can use
that instead.
According to doc/Errors.md, and in keeping with current best
practices, we should not include display an error's `source()` as
part of that error's display method. Instead, we should let the
caller decide to call source() and display that error in turn.
Part of #323.
The "full" feature is a catch-all for all features, _except_:
* Those that select a particular implementation (like
tor-llcrypto/with-openssl) or build flag (like "static")
* Those that are experimental or unstable (like "experimental-api")
* Those that are testing-only.
Previously it was the job of a task in CircMgr to do this; but we're
going to want to give GuardMgr full access to the latest NetDir for
this, and for other code-simplification reasons.
With this change I'm deprecating a couple of functions in
tor-circmgr. It's no longer necessary for us to have an artificial
external way for you to feed new NetDirs to a circmgr. (I could
just remove them, but I want practice deprecating.)
This only affects uses of thread_rng(), and affects them all more or
less indiscriminately. One test does not work with
ARTI_TEST_PRNG=deterministic; the next commit will fix it.
This commit was made by reverting the previous commit, then
re-running the script I used to generate it. In theory there should
be no semantic changes: only changes due to improved formatting from
cargo edit.
I followed the following procedure to make these changes:
* I used maint/changed_crates to find out which crates had changed
since 0.3.0.
* I used grep and maint/list_crates to sort those crates in
topological (dependency) order.
* I looked through semver_status to find which crates were listed as
having semver-relevant changes (new APIs and breaking changes).
* I scanned through the git logs of the crates with no
semver-relevant changes listed to confirm that, indeed, they had
no changes. For those crates, I incremented their patch-level
version _without_ changing the version that other crates depend on.
* I scanned through the git logs of the crates with no
semver-relevant changes listed to confirm that, indeed, they had
no obvious breaking changes.
* I treated all crates that depend on `arti` and/or `arti-client` as
having breaking changes.
* I identified crates that depend on crates that have changed, even
if they have not changed themselves, and identified them as having
a non-breaking change.
* For all of the crates, I used `cargo set-version -p $CRATE --bump
$STATUS` (where `STATUS` is `patch` or `minor`) to update the
versions, and the depended-upon versions.
I have Plans for this macro. In particular:
* I have a wip branch which tests that the Builder can be
deserialised from an empty config (ie, that config reading
of a config with a blank section for this item works).
* I think we should autogenerate $Config::builder(),
and promote that, rather than $ConfigBuilder::default().
This macro could do that.
Discovered by a test case in my local tree. The test case was
macro-generated by an extension of impl_standard_builder (which
macro istself currently awaiting review, arti!499)
Have also sent an MR to update the upstream docs
https://github.com/jean-airoldie/humantime-serde/pull/8
For reference, the git source for this crate (and the others in its
workspace) currently lives in my personal github account (ijackson).
If this fork turns out to be long-lived and gains features and/or
users, it would be good to move it to a gitlab somewhere.
I have granted Nick crate ownership on the crates.io system.
* Builders additionally derive: Debug, Serialize, Deserialize.
* Validated structs no longer derive: Serialize, Deserialize
and all related attributes deleted.
* As a consequence, all the `#[serde(deny_unknown_fields)]`
are gone. That means that right now unknown fields are totally
ignored. This is good for compatibility but poor for useability.
Doing something better here is arti#417, in progress.
* As a consequence, delete tor_dirmgr::retry::default_parallelism.
(The default value was already duplicated into a builder attr.)
And drop the ad-hoc orport() method. This brings FallbackDir's
orports field in line with our list builder API.
The general semver note in "configuation" seems to cover most of this.
Although these do not appear in the config, it does have a builder.
It seems sensible to get rid of this ad-hoc list manipulation site,
and replace it with our standard list builder API.
define_list_builder_helper requires that the builder element type be
Deserialize. Currently GuardUsageRestriction is a transparent, public
enum, so we aren't really exposing anything.
We could introduce GuardUsageRestrictionBuilder now, but
since it's not in the config and thereofore only in the public API of
the lower crates, we can definitely put that off.
The new API is (roughly) as discussed in
https://gitlab.torproject.org/tpo/core/arti/-/issues/451
This is quite a large commit and it is not convenient to split it up.
It contains the following changes:
* Redo the list builder and accessor macros implemnetation,
including docs and tests.
* Change uses of define_list_config_builder. In each case:
- Move the docs about the default value to the containing field.
- Remove the other docs (which were just recapitulations, and
are now not needed since the ListBuilder is no longer public).
- Rewmove or replace `pub` in the define_list_builder_helper call,
so that the builder is no longer public.
- Change the main macro call site to use define_list_builder_helper.
- Add a call to define_list_builder_accessors.
* Make the module `list_builder` pub so that we have somewhere to
put the overview documentation.
* Consequential changes:
- Change `outer.inner().replace(X)` to `outer.set_inner(X)`
- Consequential changes to imports (`use` statements).
The `[patch]` approach causes the tree not to build when used as a
dependency, unless the `[patch]` is replicated into the depending
project.
Instead, replace our `derive_builer =` dependencies with a reference
to a specific git commit:
perl -i~ -pe 'next unless m/^derive_builder/; s#"(0\.11\.2)"#{ version = "$1", git = "https://github.com/ijackson/rust-derive-builder", rev = "ba0c1a5311bd9f93ddf5f5b8ec2a5f6f03b22fbe" }#' crates/*/Cargo.toml
Note that the commitid has changed. This is because derive_builder is
in fact a workspace of 4 crates. 3 of them are of interest to arti
itself (the 4th exists only for testing). So the same "add git
revision" treatment had to be done to the `derive_builder` and
`derive_builder_macro` crates. Each dependency edge involves a new
commit in the derive_builder workspace, since we can't create a git
commit containing its own commitid. (We want to use commits, rather
than a branch, so that what we are depending on is actually properly
defined, and not subject to the whims of my personal github
namespace.)
There are no actual code changes in derive_builder.
And add an imprecation in define_list_config_builder's doc comment do
do so in future for other invocations of the macro.
Add add the missing full stops.
This is an automated change made with a perl one-liner and verified
with grep -L and grep -l.
Some warnings are introduced with this change; they will be removed
in subsequent commits.
See arti#208 for older discussion on this issue.
This means that `NetworkConfig::initial_predicted_ports` is now like
the other list-like things, returning `&mut list_builder` with the same
`set()` and `append()` methods.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/253https://github.com/ijackson/rust-derive-builder/tree/field-builder
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
We will need the new version of not only `derive_builder_core` (the
main macro implementation) but also`derive_builder` for a new error
type.
Rather than running preemptive circuit construction every 10
seconds, we change it to back off when it is "failing". (We define
"failing" as creating no new circuits, and as giving at least one
error.)
This change means that we'll have one less reason to hammer the
network when our connectivity is failed for some reason.
Closes#437.
Part of #329.
This feature is similar to ChanProvenance from ChanMgr, except that
we don't yet need to report it outside the crate. I'm going to use
it to distinguish newly created circuits from existing circuits in
the preemptive circuit builder.
This lets us say that the UsageMismatch cases in some parts of the
code reflect a programming error (RetryTime::Never), whereas in
other case it reflects another circuit request getting to the
circuit first (RetryTime::Immediate).
Previously we did not distinguish errors that came from pending
circuits from errors that came from the circuits we were
building. We also reported errors as coming from "Left" or "Right",
instead of a more reasonable description.
We were treating restrict_mut() failures as internal errors, and
using internal errors to represent them. But in fact, these
failures are entirely possible based on timing. Here's how it
happens:
* Two different circuit requests arrive at the same time, and both
notice a pending circuit that they could use.
* The pending circuit completes; both pending requests are notified.
* The first request calls restrict_mut(), and restricts the request
in such a way that the second couldn't use it.
* The second request calls restrict_mut(), and gets a failure.
Because of this issue, we treat these errors as transient failures
and just wait for another circuit.
Closes#427.
(This is not a breaking API change, since `AbstractSpec` is a
crate-private trait.)
Not all of these strictly need to be bumped to 0.2.0; many could go
to 0.1.1 instead. But since everything at the tor-rtcompat and
higher layers has had breaking API changes, it seems not so useful
to distinguish. (It seems unlikely that anybody at this stage is
depending on e.g. tor-protover but not arti-client.)
The older default seems (experimentally) to be ridiculously high.
Generally, if we can't build a circuit within a handful attempts,
that circuit has already timed out... unless there is a fast-failure
condition, in which case we're just hammering the network (or our
view of it.)
Found with `arti-testing` for #329.
Previously, if we had launch_parallelism > 1, and we were willing to
retry building a circuit max_retries times, then we'd launch up to
max_retries * launch_parallelism circuits before giving up. Ouch!
With this patch, we try to keep the total number of circuits
planned and attempted to the actual max_retries limit.
Part of #329; found with arti-testing.
The FirstHopId type now records an enum that stores whether the hop
is a guard or a fallback. This change addresses concerns about
remembering to check the type or source of an Id before passing it
down to the FallbackState or GuardSet.
Making this change required an API change, so that dirmgr can
report success/failure status without actually knowing whether it's
using a fallback or a guard.
The code here uses a new iterator type, since I couldn't find one of
these on crates.io. I tried writing the code without it, but it was
harder to follow and test.
We do this by creating a new FallbackSet type that includes status
information, and updating the GuardMgr APIs to record success and
failure about it when appropriate. We can use this to mark
FallbackDirs retriable (or not).
With this change, FallbackDir is now stored internally as a Guard in
the GuardMgr crate. That's fine: the FallbackDir type really only
matters for configuration.
If we're building a path with the guard manager involved, we now ask
the guard manager to pick our first hop no matter what. We only
pick from the fallback list ourselves if we're using the API with no
guard manager.
This causes some follow-on changes where we have to remember an
OwnedChanTarget object in a TorPath we've built, and where we gain
the ability to say we're building a path "from nothing extra at
all." Those are all internal to the crate, though.
Closes#220, by making sure that we use our guards to get a fresh
netdir (if we can) before falling back to any fallbacks, even if our
consensus is old.
Compilation should be fixed in the next commit.
The guard manager is responsible for handing out the first hops of
tor circuits, keeping track of their successes and failures, and
remembering their states. Given that, it makes sense to store this
information here. It is not yet used; I'll be fixing that in
upcoming commits.
Arguably, this information no longer belongs in the directory
manager: I've added a todo about moving it.
This commit will break compilation on its own in a couple of places;
subsequent commits will fix it up.
This is the logical place for it, I think: the GuardMgr's job is to
pick the first hop for a circuit depending on remembered status for
possible first hops. Making this change will let us streamline the
code that interacts with these objects.
The various background daemon tasks that `arti-client` used to spawn are
now handled inside their respective crates instead, with functions
provided to spawn them that return `TaskHandle`s.
This required introducing a new trait, `NetDirProvider`, which steals
some functionality from the `DirProvider` trait to enable `tor-circmgr`
to depend on it (`tor-circmgr` is a dependency of `tor-dirmgr`, so it
can't depend on `DirProvider` directly).
While we're at it, we also make some of the tasks wait for events from
the `NetDirProvider` instead of sleeping, slightly increasing
efficiency.
Some error types indicate that the guard has failed as a dircache.
We should treat these errors as signs to close the circuit, and to
mark the guard as having failed.
We already have the ability to get peer information from ChanMgr
errors, and therefore from any RetryErrors that contain ChanMgr
errors.
This commit adds optional peer information to tor-proto errors, and
a function to expose whatever peer information is available.
It'll soon more convenient to pass in FallbackDirs as a slice of
references, rather than just a slice of FallbackDirs: I'm going to
be changing how we handle these in tor-dirmgr.
If all guards are down and they won't be retriable for a while, try
waiting that long to get whichever guard _is_ retriable.
Additionally, if we are making multiple circuit plans in parallel,
only report our planning as having failed if we failed at making
_all_ the plans. Previously we treated any failure as fatal for the
other plans, which could lead to trouble in the case when guards
were all down or pending.
Part of #407.
Instead of requiring a `Box<dyn Isolation>`, it now takes either a
`Box<dyn Isolation>`, or an arbitrary `T` that implements
`Isolation`.
This API still allows the user to pass in a `Box<dyn Isolation>` if
that's what they have, but it doesn't require them to Box the
isolation on their own.
Part of #414.
This has the different syntax for builder field attributes than what I
originally proposed in my MR, and which therefore is in the pinned
branch.
My upstream MR for the field attributes feature was morged:
https://github.com/colin-kiegel/rust-derive-builder/issues/239
This has the humantime_serde::option module, which we have upstreamed
and are about to switch to.
The remaining dependency with version = "1" is going to be removed
in a moment.
I used
git-grep -P '\#\[serde\((?!default|deny_unknown)'
to find places where I needed to add additional attributes on the
builder method fields.
This is currently a bit duplicative, but when #371 is completely done,
the validated (non-builder) configs won't need to be Deserialize any
more.
This is part of #371 and #372.
We are going to want to specify custom attributes on fields of the
builder struct. This feature was missing from derive_builder.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/237https://github.com/ijackson/rust-derive-builder/tree/builder-field-attrs
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
When I wrote this, I arranged to skip dumping the field `pending`.
This must have been because I thought that either
(a) PendingEntry couldn't `#[derive(Debug)]` (but it can)
and/or
(b) Some of the fields of PendingEntry ought not to be dumped because
they might contain (eg) packet data. But I think they don't: there's
just the spec, and the Result which is (basically) a Circ.
I tried preseving something closer to the original using educe, but
educe gets somehow tangled up with the generics, and the result fails
to compile. I haven't investigated this further.
Fixes#365
Inspection of the code and logs shows that:
* One of the plan futures' oneshots must be returning Cancelled
* This means that the corresponding sender must have been dropped
* The sender is owned by the task spawned by spawn_launch
Presumably that entire task gets dropped as part of executor shutdown,
or something.
The correct response in this situation is to declare that we are
shutting down, and stop trying to do stuff.
Unfortunately, despite trying quite hard by putting sleeps in various
strategic places, I have not been able to reproduce the problem. So I
can't be 100% sure that the new behaviour is correct.
But I am reasonably confident that this ought not to be able to occur
unless either 1. the task from spawn_launch is dropped, or 2. that
task somehow panics despite its attempts to trap panics and report
them as errors through the oneshot.
So this "burn it all down" action ought only to occur in actually
serious situations.
I observe that
3ff9b187ea
Handle panics from circuit construction.
changed the EK for PendingCanceled to EK::ReactorShuttingDown,
and there's From impl. I think, therefore, that it is right
to reuse this Error variant.
I don't quite understand why when take_action gets an actual error it
doesn't push it, but just logs it. But I am not changing that for
now.
Arguably the two instances of retry_error.push are a sign of an
inferior flow control pattern - maybe the loop body including the code
I am adding ought to be an IEFE returning
`Result<Option<circ>, crate::Error>`.
I wanted this while debugging something.
The ad-hoc impl Debug with f.debug_struct is getting repetitive
and I've already perpetrated one copy-paste mistake.
We should consider using something like the `educe` crate's Clone.
This lint is IMO inherently ill-conceived.
I have looked for the reasons why this might be thought to be a good
idea and there were basically two (and they are sort of contradictory):
I. "Calling ‘.clone()` on an Rc, Arc, or Weak can obscure the fact
that only the pointer is being cloned, not the underlying data."
This is the wording from
https://rust-lang.github.io/rust-clippy/v0.0.212/#clone_on_ref_ptr
It is a bit terse; we are left to infer why it is a bad idea to
obscure this fact. It seems to me that if it is bad to obscure some
fact, that must be because the fact is a hazard. But why would it be
a hazard to not copy the underlying data ?
In other languages, faliing to copy the underlying data is a serious
correctness hazard. There is a whose class of bugs where things were
not copied, and then mutated and/or reused in multiple places in ways
that were not what the programmer intended. In my experience, this is
a very common bug when writing Python and Javascript. I'm told it's
common in golang too.
But in Rust this bug is much much harder to write. The data inside an
Arc is immutable. To have this bug you'd have use interior mutability
- ie mess around with Mutex or RefCell. That provides a good barrier
to these kind of accidents.
II. "The reason for writing Rc::clone and Arc::clone [is] to make it
clear that only the pointer is being cloned, as opposed to the
underlying data. The former is always fast, while the latter can
be very expensive depending on what is being cloned."
This is the reasoning found here
https://github.com/rust-lang/rust-clippy/issues/2048
This is saying that *not* using Arc::clone is hazardous.
Specifically, that a deep clone is a performance hazard.
But for this argument, the lint is precisely backwards. It's linting
the "good" case and asking for it to be written in a more explicit
way; while the supposedly bad case can be written conveniently.
Also, many objects (in our codebase, and in all the libraries we use)
that are Clone are in fact simply handles. They contain Arc(s) (or
similar) and are cheap to clone. Indeed, that is the usual case.
It does not make sense to distinguish in the syntax we use to clone
such a handle, whether the handle is a transparent Arc, or an opaque
struct containing one or more other handles.
Forcing Arc::clone to be written as such makes for code churn when a
type is changed from Arc<Something> to Something: Clone, or vice
versa.
To implement this, we had to refactor the tor_circmgr api for
flushing state changes to disk, so that it checks if it has the lock,
and only then tries to store.
We handle them by reporting them to task that's waiting for the
circuit, then relaying the panic.
Doing so allows the waiting task to distinguish panics
(EK::Internal) from cases where the reactor dropped the task
entirely (EK::ReactorShuttingDown). And doing _that_ removes one
case of EK::Canceled, which helps us on our goals towards #348.
Closes#347.
From its old name, this error had implied that we were giving no
useful information when we were waiting on a pending cirucit request
that failed. In fact, this error would only happen if we dropped the
`mpsc::Sender` for a circuit attempt without reporting success or
failure.
These errors should almost never be seen by the user; we should instead
retry the circuit. But they _can_ be seen by the use if selecting a
guard takes too long, or too many attempts. (Therefore, they aren't true
"internal" errors.)
I suspect that we might not want to keep this TransientFailure kind, but
I'm not sure what else to do here for now.
In one of the two places, nightly no longer warns. In the other
place, it's fine for nightly to warn: I just fixed the code to take
a slice instead.
Partial revert of 856aca8791.
Resolves part of #310.