This includes:
* a CachedDir::join method.
* functions to read and write from provided filenames in a
CachedDir.
* a method to tell whether a fs-mistrust error is about bad file
permissions, or failure to inspect file permissions or some other
kind of IO problem.
For reference, the git source for this crate (and the others in its
workspace) currently lives in my personal github account (ijackson).
If this fork turns out to be long-lived and gains features and/or
users, it would be good to move it to a gitlab somewhere.
I have granted Nick crate ownership on the crates.io system.
Here we add a config option to disable safe logging, and ensure that
safe logging is disabled when we are formatting an error message on
exit (since we assume it's safe to write sensitive info to stderr.)
This is a rough first-cut of an API that I think might help us with
keeping limited categories of sensitive information out of our logs.
I'll refine it based on experiences with using it.
* Builders additionally derive: Debug, Serialize, Deserialize.
* Validated structs no longer derive: Serialize, Deserialize
and all related attributes deleted.
* As a consequence, all the `#[serde(deny_unknown_fields)]`
are gone. That means that right now unknown fields are totally
ignored. This is good for compatibility but poor for useability.
Doing something better here is arti#417, in progress.
* As a consequence, delete tor_dirmgr::retry::default_parallelism.
(The default value was already duplicated into a builder attr.)
And drop the ad-hoc orport() method. This brings FallbackDir's
orports field in line with our list builder API.
The general semver note in "configuation" seems to cover most of this.
This allows us to use this with an item builder type which doesn't
impl Default. (Obviously this only makes sense for items which aren't
actually builders.)
It is Quite Vexing that we have to use [ ] rather than the < > around
the generics, particularly given that we are also using [ ] to signal
"this is arrayish".
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Although these do not appear in the config, it does have a builder.
It seems sensible to get rid of this ad-hoc list manipulation site,
and replace it with our standard list builder API.
define_list_builder_helper requires that the builder element type be
Deserialize. Currently GuardUsageRestriction is a transparent, public
enum, so we aren't really exposing anything.
We could introduce GuardUsageRestrictionBuilder now, but
since it's not in the config and thereofore only in the public API of
the lower crates, we can definitely put that off.
This type was returned by the public DownloadSchedule::builder
function. But the only thing that seems to have noticed that the type
name itself wasn't exported, was rustdoc. Hmmm.
The new API is (roughly) as discussed in
https://gitlab.torproject.org/tpo/core/arti/-/issues/451
This is quite a large commit and it is not convenient to split it up.
It contains the following changes:
* Redo the list builder and accessor macros implemnetation,
including docs and tests.
* Change uses of define_list_config_builder. In each case:
- Move the docs about the default value to the containing field.
- Remove the other docs (which were just recapitulations, and
are now not needed since the ListBuilder is no longer public).
- Rewmove or replace `pub` in the define_list_builder_helper call,
so that the builder is no longer public.
- Change the main macro call site to use define_list_builder_helper.
- Add a call to define_list_builder_accessors.
* Make the module `list_builder` pub so that we have somewhere to
put the overview documentation.
* Consequential changes:
- Change `outer.inner().replace(X)` to `outer.set_inner(X)`
- Consequential changes to imports (`use` statements).
Previously this field was differently named to its serde and to its
accessors. We are about to introduce a macro_rules macro which will
provide list accessors and we don't want that macro to have a field
renaming feature.
So stop renaming the field.
Document that this can contain either a string for expansion, or a
literal PathBuf not for expansion.
Rename the `from_path` method to `new_literal`: a very important
difference is whether it gets expanded - less important than the Rust
type. Also, now it takes `Into<PathBuf>`, which avoids a needless
clone.
(We don't change the API in `arti-client` because
`&tempfile::Tempdir()` doesn't implement `Into<PathBuf>`, so
`arti-client` has to have some new `as_ref` calls.)
Provide accessors `as_unexpanded_str` and `as_literal_path`. The
deserialisation already makes this part of the stable API,l so not
pvoding accessors seems just obstructive. They are useful for tests,
too.
Add tests for the new entrypoints, and for deserialisation of both
variants from TOML (via config, or directly) and JSON.
We introduce LiteralPath struct, so that a literal path deserialises
from
some_path = { literal: "actual path string" }
This makes the deserialisation unambiguous.
arti uses this. Somehow this seems to be enabled by some other thing
in the crate graph, but I found that adding a similar dependency to
another crate resulted in a `config` which doesn't compile.
This helps make it possible to use `SecureDir` (name pending) even
when we want to disable permissions checks. Otherwise, optional
permission checking would require users of this crate to maintain
separate code paths for the "check" and "don't check" cases.
This required a bit of poking through the `users` crate, to mess
with the user and group dbs. The original goal was to "trust the
group with the same name as us", but it turned into a bit of a
production, since:
* We want to take our own name from $USER, assuming that matches
our uid. (Otherwise we want to ask getpwuid_r().)
* We only want to trust the group if we are actually a member of
that group.
* We want to cache this information.
* We want to test this code.
Previously we would temporarily put self.resolved into an invalid
state by adding a path component that might be a symlink. With this
change, we create a new temporary path object (using Cow to avoid
unnecessary allocations) and only conditionally replace
self.resolved.
The only way to get a SecureDir is by having checked a directory.
Once you have one, it encourages you to open and create files and
directories with the right permissions, and checks them for you.
This crate is meant to solve #315 by giving a way to make sure that
a file or directory is only accessible by trusted users. I've tried
to explain carefully (in comments and documentation) what this crate
is doing and why, under the assumption that it will someday be read
by another person like me who does _not_ live and breathe unix file
permissions. The crate is still missing some key features, noted in
the TODO section.
It differs from the first version of the crate by taking a more
principled approach to directory checking: it emulates the path
lookup process (reading symlinks and all) one path change at a time,
thus ensuring that we check every directory which could enable
an untrusted user to get to our target file, _or_ which could
enable them to get to any symlink that would get them to the target
file.
The API is also slightly different: It separates the `Mistrust`
object (where you configure what you do or do not trust) from the
`Verifier` (where you set up a check that you want to perform on a
single object). Verifiers are set up to be a bit ephemeral,
so that it is hard to accidentally declare that _every_ object
is meant to be readable when you only mean that _some_ objects
may be readable.
The `[patch]` approach causes the tree not to build when used as a
dependency, unless the `[patch]` is replicated into the depending
project.
Instead, replace our `derive_builer =` dependencies with a reference
to a specific git commit:
perl -i~ -pe 'next unless m/^derive_builder/; s#"(0\.11\.2)"#{ version = "$1", git = "https://github.com/ijackson/rust-derive-builder", rev = "ba0c1a5311bd9f93ddf5f5b8ec2a5f6f03b22fbe" }#' crates/*/Cargo.toml
Note that the commitid has changed. This is because derive_builder is
in fact a workspace of 4 crates. 3 of them are of interest to arti
itself (the 4th exists only for testing). So the same "add git
revision" treatment had to be done to the `derive_builder` and
`derive_builder_macro` crates. Each dependency edge involves a new
commit in the derive_builder workspace, since we can't create a git
commit containing its own commitid. (We want to use commits, rather
than a branch, so that what we are depending on is actually properly
defined, and not subject to the whims of my personal github
namespace.)
There are no actual code changes in derive_builder.
The arti crate itself is only used in benchmark and testing crate.
I think this sentence does not belong here.
Also extend retry-error description (from Architecture.md).
Now that we require Rust 1.56, we can upgrade to AES 0.8. This
forces us to have some slight API changes.
We require cipher 0.4.1, not cipher 0.4.0, since 0.4.0 has
compatibility issues with Rust 1.56.
This is actually a number of *attempts* not a number of *retries*.
The setter method was already called "attempts".
This chnages the deserialisation of the config.
Use sub_builder. We must do something special for defaults.
This involves moving the actual default values for retry_bootstrap and
retry_microdescs into config.rs, since they need to access the fields
of the un-built version of the structure. (An alternative would be to
generate "weak setters" which do not override previous settings, but
derive_builder does not offer to generate them and that seems
overkill.)
Instead, everyone should use DownloadScheduleBuilder.
The new() method would in any case be useless in a moment, since we're
going to embed DownloadScheduleBuilder in the NetworkConfig, not
DownloadSchedule.
The call sites in the tests are all about to change again.
The current behaviour is to treat 0 as indicating "use the default",
which is quite strange. We are going to get rid of that.
The new way will be to reject zero, during
DownloadScheduleBuilder::build, Add a test case for that.
Really, we probably don't want any of these not to be pub, but it
triggers "unreachable pub" in my test cases, and making it not pub by
mistake seems not very serious, and likely to be noticed.
Making the struct private in the test cases has the useful effect of
checking that all the methods are tested.
And add an imprecation in define_list_config_builder's doc comment do
do so in future for other invocations of the macro.
Add add the missing full stops.
This is an automated change made with a perl one-liner and verified
with grep -L and grep -l.
Some warnings are introduced with this change; they will be removed
in subsequent commits.
See arti#208 for older discussion on this issue.
This means that `NetworkConfig::initial_predicted_ports` is now like
the other list-like things, returning `&mut list_builder` with the same
`set()` and `append()` methods.
NetworkConfigBuilder needs to not contain any validated structs, so
that its serde does not expose the validated details.
AuthorityListBuilder is what ought to go here - and it contains
Vec<AuthorityBuilder>, not Vec<Authority>. As a consequence, many
places now deal with AuthorityBuilder, rather than Authority.
Since "logfiles: Introduce LogfileListConfigBuilder", this code is in
LogfileListConfigBuilder::build(), which is called by derive_builder's
generated LoggingConfig::build(), and which will add a file context
itself due to the `sub_builder` feature.
So this is otiose. And, we are about to replace this whole thing with
macro_rules-generated code (which won't do this).
If duration addition overflows, then continue with Never.
Caching the AbsRetryTime constructed with duration from supplied
function also reduces the overhead of earliest_absolute.
In
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/462#note_2797697
we decided not to do this.
However, having looked again at the way the FallbackList works, I
think there is a lot of value in making these two things (and anything
else like them[1]) as similar as possible.
[1] At least PreemptiveCircuitConfig.initial_predicted_ports and
NetworkConfig.authorities need the same treatment, and perhaps also
GuardUsage.restrictions (although there is no
GuardRestrictionBuilder).
In the irc discussion I imagined `LogfilesConfigBuilder` as opposed to
`LogfileConfigBuilder` (differing only in the `s`) which would be bad,
but we can use `List` instead.
We do *not* need to abstract away the validated version of the config.
Providing a type alias helps the derive_builder sub_builder DTRT
without needing special overrides.
I have split this commit so that we can drop it, if we conclude it's
not wanted.
Now the network fallbacks configuration wants to Deserialize
a Vec<FallbackDirBuilder>, rather than validated Vec<FallbackDir>.
Methods on FallbackListBuilder are as per
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/462#note_2797697
mutatis mutandi for the fact that this struct has only fallbacks in it.
This is where the FallbackList type is. We are going to want to
provide a builder too, which ought to impl Default.
This means that the default value for the type must be next to the
type. In any case, it was anomalous that it wasn't.
This commit is pure code motion.
This commitid is the current head of my MR branch
https://github.com/colin-kiegel/rust-derive-builder/pull/253https://github.com/ijackson/rust-derive-builder/tree/field-builder
Using the commitid prevents surprises if that branch is updated.
We will require this newer version of derive_builder. The version
will need to be bumped again later, assuming the upstream MR is merged
and upstream do a release containing the needed changes.
We will need the new version of not only `derive_builder_core` (the
main macro implementation) but also`derive_builder` for a new error
type.
Rename "deletable" to "obsolete".
Simplify function structure.
Report errors from `metadata()` and `modified()`.
Don't claim that we're going to delete something unless we are.
Comment about making CUTOFF configurable.
Rather than running preemptive circuit construction every 10
seconds, we change it to back off when it is "failing". (We define
"failing" as creating no new circuits, and as giving at least one
error.)
This change means that we'll have one less reason to hammer the
network when our connectivity is failed for some reason.
Closes#437.
Part of #329.
This feature is similar to ChanProvenance from ChanMgr, except that
we don't yet need to report it outside the crate. I'm going to use
it to distinguish newly created circuits from existing circuits in
the preemptive circuit builder.
Instead of just having a function that recalculates the latest clock
skew, instead recalculate the clock skew when it may have changed,
and notify other processes via a postage::watch.
Apparently on OSX you are not allowed to construct an Instant that is a
long time before the time when the test is running.
Also, fix the length of a year in this test.
This time, our estimator discards outliers, takes the mean of what's
left, and uses the standard deviation to try to figure out how
seriously to take our report of skew/not-skew.
These estimates are still not actually used.
Fortunately, we don't need a separate type here: authenticated
clock skew can only come attached to a `tor_proto::Error`.
We also remove skew from `tor_proto::Error::HandshakeCertsExpired`,
since it would now be redundant.
of a channel.
At first I wanted to have this information not be a part of channels
at all, but it is a fairly tiny amount of data, and the alternatives
are pretty crufty.
Upstream 0.8.2 has broken compilation with Rust 1.53; versions
0.8.{0,1} have been yanked.
Possibly by the time the next arti version comes out, they'll have
fixed this situation, or we'll have upgraded our MSRV.
Upstream issue at https://github.com/Nugine/rlimit/issues/42 .
Doing this will make us treat caches that send us these objects as
not-working, and close circuits to them instead of trying over and
over.
The case where we add a document from the cache requires special
handling: it isn't actually a error to find an expired document in
our cache (unless the passage of time itself is erroneous, which is
a debatable proposition at best).
Fixes#431.
The old version of this function was error-prone, and in fact had
errors: it was too easy to forget to add non-persistent fields, and
that's exactly what we forgot in a few cases
(`microdescriptor_missing`, `circ_history`, and
`suspicious_behavior_warned`).
The new version of this function consumes both of the incoming
Guards, and constructs every field explicitly so that we can't
forget to list any.
Closes#429.
Previously, we treated successfully building a circuit to a guard as
a "success", and any failure, including a directory cache failure,
as a failure. With this change, guards now have separate
success/failure and retry status for circuit usage and directory
usage.
This change is needed for guard-as-directory retry to have
reasonable behavior. Otherwise, when a guard succeeds at building a
circuit, that clears the directory-is-failing status and makes us
retry the guards to quickly.
Previously the code would do stuff like
```
schedule = RetrySchedule::new(INITIAL_DELAY);
```
which is needlessly verbose, since the schedule already keeps track
of its initial delay.
This lets us say that the UsageMismatch cases in some parts of the
code reflect a programming error (RetryTime::Never), whereas in
other case it reflects another circuit request getting to the
circuit first (RetryTime::Immediate).
For each case, describe its semantics (in addition to when you would
create it).
Explain the relationship between After and At.
Stop saying "Strategy": we renamed this type to "RetryTime".
Previously we did not distinguish errors that came from pending
circuits from errors that came from the circuits we were
building. We also reported errors as coming from "Left" or "Right",
instead of a more reasonable description.
We were treating restrict_mut() failures as internal errors, and
using internal errors to represent them. But in fact, these
failures are entirely possible based on timing. Here's how it
happens:
* Two different circuit requests arrive at the same time, and both
notice a pending circuit that they could use.
* The pending circuit completes; both pending requests are notified.
* The first request calls restrict_mut(), and restricts the request
in such a way that the second couldn't use it.
* The second request calls restrict_mut(), and gets a failure.
Because of this issue, we treat these errors as transient failures
and just wait for another circuit.
Closes#427.
(This is not a breaking API change, since `AbstractSpec` is a
crate-private trait.)
Not all of these strictly need to be bumped to 0.2.0; many could go
to 0.1.1 instead. But since everything at the tor-rtcompat and
higher layers has had breaking API changes, it seems not so useful
to distinguish. (It seems unlikely that anybody at this stage is
depending on e.g. tor-protover but not arti-client.)
Unlike the rest of the crates, these don't have a "tor-" or "arti-"
prefix, and are potentially used by code outside arti. With that in
mind, it's probably for the best not to bump them to 0.2.0 along
with the rest of our crates.
They have had no changes since 0.1.0 other than refactoring and
changing of clippy lints. Therefore, I'm not bumping the
dependencies from other crates onto these: it's fine whether our
other crates use caret/retry-error 0.1.0 or 0.1.1.
This feature allows us to detect different failing cases for
arti#329 that would otherwise be hard to induce. It works by
filtering consensus directory objects and/or microdescriptor objects
before introducing them to the directory manager.
Closes#397.
This commit uses the `visibility` and `visible` crates to
conditionally make certain structs and their fields public
(respectively). This is incredibly dangerous to use for anything
besides testing, and I've tried to write the documentation for the
feature accordingly.
The older default seems (experimentally) to be ridiculously high.
Generally, if we can't build a circuit within a handful attempts,
that circuit has already timed out... unless there is a fast-failure
condition, in which case we're just hammering the network (or our
view of it.)
Found with `arti-testing` for #329.
Previously, if we had launch_parallelism > 1, and we were willing to
retry building a circuit max_retries times, then we'd launch up to
max_retries * launch_parallelism circuits before giving up. Ouch!
With this patch, we try to keep the total number of circuits
planned and attempted to the actual max_retries limit.
Part of #329; found with arti-testing.
The previous algorithm had two flaws:
* It would wait even after the final attempt, when there were no
more retries to do.
* It would fail to wait between attempts if an error occurred.
This refactoring fixes both of these issues, and adds some comments.
The FirstHopId type now records an enum that stores whether the hop
is a guard or a fallback. This change addresses concerns about
remembering to check the type or source of an Id before passing it
down to the FallbackState or GuardSet.
Making this change required an API change, so that dirmgr can
report success/failure status without actually knowing whether it's
using a fallback or a guard.
The code here uses a new iterator type, since I couldn't find one of
these on crates.io. I tried writing the code without it, but it was
harder to follow and test.
We do this by creating a new FallbackSet type that includes status
information, and updating the GuardMgr APIs to record success and
failure about it when appropriate. We can use this to mark
FallbackDirs retriable (or not).
With this change, FallbackDir is now stored internally as a Guard in
the GuardMgr crate. That's fine: the FallbackDir type really only
matters for configuration.
If we're building a path with the guard manager involved, we now ask
the guard manager to pick our first hop no matter what. We only
pick from the fallback list ourselves if we're using the API with no
guard manager.
This causes some follow-on changes where we have to remember an
OwnedChanTarget object in a TorPath we've built, and where we gain
the ability to say we're building a path "from nothing extra at
all." Those are all internal to the crate, though.
Closes#220, by making sure that we use our guards to get a fresh
netdir (if we can) before falling back to any fallbacks, even if our
consensus is old.
Compilation should be fixed in the next commit.
We only do this when we fail to get a regular guard (e.g., because
they're all down), and when we have been asked for a guard for a
one-hop directory.
Most of the change in this commit is plumbing to make all of the
types match up.
As before, compilation may still be broken.
We need to extend our notion of "the origin of a guard" to include
"somewhere outside the guard list"; we need the ability to return a
FallbackDir as a Guard; and we need to remember a few more pieces of
information in each pending request.
As before, this commit may break compilation; it will be restored soon.
The guard manager is responsible for handing out the first hops of
tor circuits, keeping track of their successes and failures, and
remembering their states. Given that, it makes sense to store this
information here. It is not yet used; I'll be fixing that in
upcoming commits.
Arguably, this information no longer belongs in the directory
manager: I've added a todo about moving it.
This commit will break compilation on its own in a couple of places;
subsequent commits will fix it up.
This is the logical place for it, I think: the GuardMgr's job is to
pick the first hop for a circuit depending on remembered status for
possible first hops. Making this change will let us streamline the
code that interacts with these objects.
The various background daemon tasks that `arti-client` used to spawn are
now handled inside their respective crates instead, with functions
provided to spawn them that return `TaskHandle`s.
This required introducing a new trait, `NetDirProvider`, which steals
some functionality from the `DirProvider` trait to enable `tor-circmgr`
to depend on it (`tor-circmgr` is a dependency of `tor-dirmgr`, so it
can't depend on `DirProvider` directly).
While we're at it, we also make some of the tasks wait for events from
the `NetDirProvider` instead of sleeping, slightly increasing
efficiency.