Merge branch 'ipts' into 'main'

dev notes: Draft IPT algorithm: note re intro pt verification (followup)

See merge request tpo/core/arti!1438
Ian Jackson 2023-07-26 15:14:18 +00:00
commit 9966627479
1 changed file with 47 additions and 6 deletions


@ -33,6 +33,11 @@ Stream of introduction requests,
done by passing an mpsc sender into the IPT Manager's constructor,
which is simply cloned and given to each IPT Establisher.
(Each IPT Establisher is told by the IPT Manager
when a descriptor mentioning that IPT is about to be published,
so that the IPT Establisher can reject introduction attempts
using an unpublished IPT.)
I think there are too many possible IPTs
to maintain experience information about IPTs we used to use;
the list of experience information would grow to the size of the network.
@ -46,7 +51,7 @@ lead to distinguishability ?
* Attempt to establish and verify them, in parallel
* Wait again the time it took to establish and verify the first one
* Wait a short time
and then publish a short-lifetime descriptor listing the ones
set up so far (this gets us some working descriptors right away)
@ -54,7 +59,7 @@ lead to distinguishability ?
(This behaviour follows from the detailed algorithm below.)
## Verification and monitoring (optional)
## Verification and monitoring (optional, probably not in v1)
After ESTABLISH_INTRO,
we attempt (via a 2nd set of circuits)
@ -62,6 +67,9 @@ an INTRODUCE probe, to see if the IPT is working.
We do such probes periodically at random intervals.
NOTE: there is a behaviour/privacy risk here,
which should be properly considered before implementation.
## General operation, IPT selection
We maintain records of each still-possibly-relevant IPT.
@ -78,6 +86,9 @@ rapidly churning through IPT candidates.)
When we select a new IPT, we randomly choose a planned replacement time,
after which it becomes `Retiring`.
Additionally, any IPT becomes `Retiring`
after it has been used for a certain number of introductions
(cf. C Tor `#define INTRO_POINT_MIN_LIFETIME_INTRODUCTIONS 16384`.)
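The two retirement triggers described above (planned replacement time, and number of introductions) could be checked roughly as follows. This is a minimal sketch, not Arti's actual code: `IptRecord`, `should_retire`, and all field names are hypothetical, and the introduction limit is taken as a parameter because these notes do not fix its value.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-IPT bookkeeping; names are illustrative only.
struct IptRecord {
    /// When we selected this IPT.
    selected_at: Instant,
    /// Planned lifetime, chosen randomly once at selection time.
    planned_lifetime: Duration,
    /// Number of introductions handled through this IPT so far.
    introductions: u64,
}

/// Returns true if the IPT should move to `Retiring`.
///
/// `max_introductions` is an assumed tuning parameter
/// (C Tor's lower bound of 16384 is one candidate value).
fn should_retire(ipt: &IptRecord, now: Instant, max_introductions: u64) -> bool {
    // Saturates to zero if `now` is earlier than `selected_at`.
    let age = now.saturating_duration_since(ipt.selected_at);
    let too_old = age >= ipt.planned_lifetime;
    let too_used = ipt.introductions >= max_introductions;
    too_old || too_used
}
```

Either condition alone is enough to retire the IPT, so a heavily used IPT is rotated before its planned replacement time.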
## IPT states
@ -101,7 +112,9 @@ Each IPT can be in the following states:
(For example, the circuit to it has collapsed.)
We will continue to try to maintain our circuit to it.
But we won't publish it in any descriptor.
We will try to replace it with another IPT.
We will allow the re-establishment attempt to proceed,
but if it doesn't yield success within a reasonable time,
we will try to replace this IPT with another IPT.
* `Retiring`:
We have reached the IPT's planned replacement time.
@ -126,6 +139,11 @@ An IPT is removed from our records, and we give up on it,
when it is no longer `Good` or `Establishing`
and all descriptors that mentioned it have expired.
(Until all published descriptors mentioning an IPT expire,
we consider ourselves bound by those previously-published descriptors,
and try to maintain the IPT.
TODO: Allegedly this is unnecessary, but I don't see how it could be.)
When we lose our circuit to an IPT,
we look at the `ErrorKind` to try to determine
if the fault was local (and would therefore affect all IPTs):
@ -175,12 +193,13 @@ The possibilities are:
The idea of what to publish is calculated as follows:
* If we have at least N `Good` IPTs: `Certain`.
(We publish the "best" N IPTs for some definition of "best".
TODO: should we use the fault count? recency?)
* Unless we have at least one `Good` IPT: `Unknown`.
* Otherwise: if there are IPTs in `Establishing`,
and they have been in `Establishing` for less than
twice as long as the fastest-to-establish `Good` IPT:
and they have been in `Establishing` only a short time [1]:
`Unknown`; otherwise `Uncertain`.
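The decision rule in the list above can be sketched as a small pure function. This is an illustration under stated assumptions, not the implementation: `PublishIdea`, `publish_idea`, and the parameter names are hypothetical, the "short time" is passed in as `short_time` because the notes leave its value open, and "they have been in `Establishing` only a short time" is read here as "at least one establishing IPT is still within the short time".

```rust
use std::time::Duration;

/// The three possible ideas of what to publish.
#[derive(Debug, PartialEq, Eq)]
enum PublishIdea {
    Unknown,
    Uncertain,
    Certain,
}

/// `establishing_for` holds, for each IPT currently in `Establishing`,
/// how long it has been in that state.
fn publish_idea(
    n_wanted: usize,
    n_good: usize,
    establishing_for: &[Duration],
    short_time: Duration,
) -> PublishIdea {
    if n_good >= n_wanted {
        // We have a full set of `Good` IPTs.
        return PublishIdea::Certain;
    }
    if n_good == 0 {
        // Nothing usable to publish yet.
        return PublishIdea::Unknown;
    }
    // Assumption: it is worth waiting if any establishing IPT
    // has not yet exceeded the short time.
    let worth_waiting = establishing_for.iter().any(|&t| t < short_time);
    if worth_waiting {
        PublishIdea::Unknown
    } else {
        PublishIdea::Uncertain
    }
}
```

With no IPTs in `Establishing` and at least one `Good` IPT, this sketch returns `Uncertain`, since there is nothing left to wait for.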
The effect is that we delay publishing an initial descriptor
@ -195,10 +214,23 @@ is the minimum (30 minutes).
Otherwise, we double the lifetime each time,
unless any IPT in the previous descriptor was declared `Faulty`,
in which case we reset it back to the minimum.
TODO: Perhaps we should just pick fixed short and long lifetimes instead,
to limit distinguishability.
(Rationale: if IPTs are regularly misbehaving,
we should be cautious and limit our exposure to the damage.)
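The lifetime rule above amounts to exponential growth with a reset on faults. The following is a hedged sketch only: `next_lifetime` and `MAX_LIFETIME` are hypothetical, the cap is an assumption added so that doubling stays bounded (these notes do not specify a maximum), and the first descriptor is assumed to use the minimum.

```rust
use std::time::Duration;

/// Minimum descriptor lifetime (30 minutes, as stated in the notes).
const MIN_LIFETIME: Duration = Duration::from_secs(30 * 60);
/// Hypothetical upper bound; not taken from the notes.
const MAX_LIFETIME: Duration = Duration::from_secs(12 * 60 * 60);

/// Computes the lifetime for the next descriptor.
///
/// `previous` is the lifetime of the previous descriptor, if any;
/// `any_faulty` says whether any IPT listed in it was declared `Faulty`.
fn next_lifetime(previous: Option<Duration>, any_faulty: bool) -> Duration {
    match previous {
        // Assumed: the first descriptor starts at the minimum.
        None => MIN_LIFETIME,
        // A faulty IPT resets us to the minimum.
        Some(_) if any_faulty => MIN_LIFETIME,
        // Otherwise double, bounded by the assumed cap.
        Some(prev) => prev.saturating_mul(2).min(MAX_LIFETIME),
    }
}
```

If fixed short and long lifetimes were chosen instead, as the TODO suggests, this function would reduce to a two-way choice between two constants.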
[1] NOTE: We wait a "short time" between establishing our first IPT,
and publishing an incomplete (<N) descriptor -
this is a compromise between
availability (publishing as soon as we have any working IPT)
and
exposure and hsdir load
(which would suggest publishing only when our IPT set is stable).
One possible strategy is to wait as long again
as the time it took to establish our first IPT.
Another is to somehow use our circuit timing estimator.
## Descriptor publication
The descriptor output from the IPT maintenance algorithm is
@ -222,7 +254,8 @@ We record for each hsdir what we have published.
We attempt publication in the following cases:
* `Certain`, if: the IPT list has changed from what was published
* `Certain`, if: the IPT list has changed from what was published,
and we haven't published a `Certain` set recently
* `Uncertain`, if: nothing is published,
or what is published will expire soon,
or we haven't published since Arti was restarted
@ -235,11 +268,18 @@ if and what we want to publish.
## Tuning parameters
TODO: Review these tuning parameters both for value and origin.
Some of these may be in `param-spec.txt` section "8. V3 onion service parameters".
Some of them may be in C Tor.
* N, number of IPTs to try to maintain:
configurable, default is 3, max is 20.
(rend-spec-v3 2.5.4 NUM_INTRO_POINT)
* Maximum number of IPTs including replaced faulty ones (2N).
* IPT replacement time: 4..7 days (uniform random)
TODO: what is the right value here? (Should we do time-based rotation at all?)
* "Soon" for "if the published descriptor will expire soon":
10 minutes.
@ -252,6 +292,7 @@ if and what we want to publish.
## Load balancing (and maybe failover)
This is only a sketch.
TODO: Look at what Onion Balance does before implementing this.
If it's desired to allow multiple Arti processes to serve a single HS: