Merge branch 'ipts' into 'main'

dev notes: Draft IPT algorithm: note re intro pt verification (followup)

See merge request tpo/core/arti!1438
2023-07-26 15:14:18 +00:00 · 9966627479
parent 30e25af0a0 869df2817a
commit 9966627479
1 changed file with 47 additions and 6 deletions
--- a/doc/dev/notes/hssvc-ipt-algorithms.md
+++ b/doc/dev/notes/hssvc-ipt-algorithms.md
@@ -33,6 +33,11 @@ Stream of introduction requests,
 done by passing an mpsc sender into the IPT Manager's constructor,
 which is simply cloned and given to each IPT Establisher.

+(Each IPT Establisher is told by the IPT Manager
+when a descriptor mentioning that IPT is about to be published,
+so that the IPT Establisher can reject introduction attempts
+using an unpublished IPT.)
+
 I think there are too many possible IPTs
 to maintain experience information about IPTs we used to use;
 the list of experience information would grow to the size of the network.
@@ -46,7 +51,7 @@ lead to distinguishability ?

 * Attempt to establish and verify them, in parallel

- * Wait again the time it took to establish and verify the first one
+ * Wait a short time
   and then publish a short-lifetime descriptor listing the ones
   set up so far (this gets us some working descriptors right away)

@@ -54,7 +59,7 @@ lead to distinguishability ?

 (This behaviour follows from the detailed algorithm below.)

-## Verification and monitoring (optional)
+## Verification and monitoring (optional, probably not in v1)

 After ESTABLISH_INTRO,
 we attempt (via a 2nd set of circuits)
@@ -62,6 +67,9 @@ an INTRODUCE probe, to see if the IPT is working.

 We do such probes periodically at random intervals.

+NOTE: there is a behaviour/privacy risk here,
+which should be properly considered before implementation.
+
 ## General operation, IPT selection

 We maintain records of each still-possibly-relevant IPT.
@@ -78,6 +86,9 @@ rapidly churning through IPT candidates.)

 When we select a new IPT, we randomly choose a planned replacement time,
 after which it becomes `Retiring`.
+Additionally, any IPT becomes `Retiring`
+after it has been used for a certain number of introductions
+(c.f. C Tor `#define INTRO_POINT_MIN_LIFETIME_INTRODUCTIONS 16384`.)

 ## IPT states

@@ -101,7 +112,9 @@ Each IPT can be in the following states:
   (For example, the circuit to it has collapsed.)
   We will continue to try to maintain our circuit to it.
   But we won't publish it in any descriptor.
-   We will try to replace it with another IPT.
+   We will allow the re-establishment attempt to proceed,
+   but if it doesn't yield success within a reasonable time,
+   we will try to replace this IPT with another IPT.

 * `Retiring`:
   We have reached the IPT's planned replacement time.
@@ -126,6 +139,11 @@ An IPT is removed from our records, and we give up on it,
 when it is no longer `Good` or `Establishing`
 and all descriptors that mentioned it have expired.

+(Until all published descriptors mentioning an IPT expire,
+we consider ourselves bound by those previously-published descriptors,
+and try to maintain the IPT.
+TODO: Allegedly this is unnecessary, but I don't see how it could be.)
+
 When we lose our circuit to an IPT,
 we look at the `ErrorKind` to try to determine
 if the fault was local (and would therefore affect all IPTs):
@@ -175,12 +193,13 @@ The possibilities are:
 The idea of what to publish is calculated as follows:

 * If we have at least N `Good` IPTs: `Certain`.
+   (We publish the "best" N IPTs for some definition of "best".
+   TODO: should we use the fault count?  recency?)

 * Unless we have at least one `Good` IPT: `Unknown`.

 * Otherwise: if there are IPTs in `Establishing`,
-   and they have been in `Establishing` for less than
-   twice as long as the fastest-to-establish `Good` IPT:
+   and they have been in `Establishing` only a short time [1]:
   `Unknown`; otherwise `Uncertain`.

 The effect is that we delay publishing an initial descriptor
@@ -195,10 +214,23 @@ is the minimum (30 minutes).
 Otherwise, we double the lifetime each time,
 unless any IPT in the previous descriptor was declared `Faulty`,
 in which case we reset it back to the minimum.
+TODO: Perhaps we should just pick fixed short and long lifetimes instead,
+to limit distinguishability.

 (Rationale: if IPTs are regularly misbehaving,
 we should be cautious and limit our exposure to the damage.)

+[1] NOTE: We wait a "short time" between establishing our first IPT,
+and publishing an incomplete (<N) descriptor -
+this is a compromise between
+availability (publishing as soon as we have any working IPT)
+and
+exposure and hsdir load
+(which would suggest publishing only when our IPT set is stable).
+One possible strategy is to wait as long again
+as the time it took to establish our first IPT.
+Another is to somehow use our circuit timing estimator.
+
 ## Descriptor publication

 The descriptor output from the IPT maintenance algorithm is
@@ -222,7 +254,8 @@ We record for each hsdir what we have published.

 We attempt publication in the following cases:

- * `Certain`, if: the IPT list has changed from what was published
+ * `Certain`, if: the IPT list has changed from what was published,
+   and we haven't published a `Certain` set recently
 * `Uncertain`, if: nothing is published,
   or what is published will expire soon,
   or we haven't published since Arti was restarted
@@ -235,11 +268,18 @@ if and what we want to publish.

 ## Tuning parameters

+TODO: Review these tuning parameters both for value and origin.
+Some of these may be in `param-spec.txt` section "8. V3 onion service parameters"
+Some of them may be in C Tor.
+
 * N, number of IPTs to try to maintain:
   configurable, default is 3, max is 20.
   (rend-spec-v3 2.5.4 NUM_INTRO_POINT)

+ * Maximum number of IPTs including replaced faulty ones  (2N).
+
 * IPT replacement time: 4..7 days (uniform random)
+   TODO: what is the right value here?  (Should we do time-based rotation at all?)

 * "Soon" for "if the published descriptor will expire soon":
   10 minutes.
@@ -252,6 +292,7 @@ if and what we want to publish.
 ## Load balancing (and maybe failover)

 This is a sketch, only.
+TODO: Look at what Onion Balance does before implementing this.

 If it's desired to allow multiple Arti processes to serve a single HS:
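
Review note on the "idea of what to publish" rules and the descriptor lifetime rule touched by this diff: the following is a minimal Rust sketch of how those rules might be expressed. It is not code from Arti. All names (`IptState`, `PublishIdea`, `publish_idea`, `next_lifetime`, `short_time`, `max_lifetime`) are hypothetical. Two assumptions go beyond the notes: the `Establishing` clause is read as "at least one `Establishing` IPT has been in that state for less than the short time", and lifetime doubling is capped at a caller-supplied maximum, which the hunks shown here do not specify.

```rust
use std::time::Duration;

/// Hypothetical model of the IPT states named in the notes.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IptState {
    /// Still being set up; `elapsed` is the time spent in this state so far.
    Establishing { elapsed: Duration },
    Good,
    Faulty,
    Retiring,
}

/// The three possible "ideas of what to publish" from the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PublishIdea {
    Unknown,
    Uncertain,
    Certain,
}

/// Apply the three rules in the order the notes list them.
/// `n` is the target number of IPTs; `short_time` is the "short time" of note [1].
fn publish_idea(ipts: &[IptState], n: usize, short_time: Duration) -> PublishIdea {
    let good = ipts.iter().filter(|s| matches!(s, IptState::Good)).count();
    if good >= n {
        return PublishIdea::Certain;
    }
    if good == 0 {
        return PublishIdea::Unknown;
    }
    // ASSUMPTION: "any" Establishing IPT within the short time keeps us waiting.
    let recently_started = ipts.iter().any(|s| {
        matches!(s, IptState::Establishing { elapsed } if *elapsed < short_time)
    });
    if recently_started {
        PublishIdea::Unknown
    } else {
        PublishIdea::Uncertain
    }
}

/// Minimum descriptor lifetime given in the notes (30 minutes).
const MIN_LIFETIME: Duration = Duration::from_secs(30 * 60);

/// Double the previous lifetime, or reset to the minimum if any IPT in the
/// previous descriptor was declared `Faulty`. `previous == None` models
/// "no earlier descriptor" (an assumption). `max_lifetime` is a hypothetical cap.
fn next_lifetime(
    previous: Option<Duration>,
    any_faulty: bool,
    max_lifetime: Duration,
) -> Duration {
    match previous {
        Some(prev) if !any_faulty => prev.saturating_mul(2).min(max_lifetime),
        _ => MIN_LIFETIME,
    }
}
```

With one `Good` IPT and another that has only just entered `Establishing`, `publish_idea` returns `Unknown`, so nothing is published yet. Once that IPT has been `Establishing` for longer than `short_time`, the result becomes `Uncertain` and a short-lifetime descriptor can go out.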