37 Commits

Author SHA1 Message Date
Rusty Russell d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is already connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignore it.
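A minimal C sketch of the counter check follows. `struct peer`, `peer_reconnected()` and `handle_connect_subd()` are hypothetical names for illustration, not connectd's actual code; the sketch assumes each request from lightningd carries the counter of the connection it refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer state in connectd (illustrative only). */
struct peer {
	uint64_t counter;     /* new value for every connection from this peer */
	bool subd_connected;
};

/* The peer disconnected and reconnected: the new connection gets a
 * fresh counter, and no subd is attached to it yet. */
static void peer_reconnected(struct peer *peer, uint64_t *next_counter)
{
	peer->counter = (*next_counter)++;
	peer->subd_connected = false;
}

/* Handle lightningd's "connect a subd" request, which carries the
 * counter of the connection lightningd believes is current.
 * Returns true if the request was acted upon. */
static bool handle_connect_subd(struct peer *peer, uint64_t req_counter)
{
	/* Stale request about a previous connection: ignore it. */
	if (req_counter != peer->counter)
		return false;
	/* Without the counter check, a duplicate request would trip this. */
	assert(!peer->subd_connected);
	peer->subd_connected = true;
	return true;
}
```

Replaying the pattern above in this model: the request from step 1 carries the old counter, so it is ignored in step 3, and only the request sent in step 4 connects the subd.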

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9b6c97437e connectd: remove reconnection logic.
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer, we can now simply free the old
and add the new.

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 8678c5efb3 connectd: release peer as soon as lightningd tells us.
Now that we have separate peer draining logic, we can simply use it when
lightningd tells us to release the peer, without waiting.  (We could
simply free the peer, but that's a bit rude, as messages can get
lost.)

This removes various complex flags and logic we had before.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
2022-07-18 20:50:04 -05:00
Rusty Russell 9dc3880360 connectd: put peer into "draining" mode when we want to close it.
This removes it from the hashtable, and forces it to do nothing but
send out any remaining packets, then close.

It is, in effect, reduced to a stub, with no further interactions
with the rest of the system (all subds are freed already).

It also removes the need for an explicit "final_msg".
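The draining behaviour can be sketched as a small state machine. This assumes a hypothetical fixed-size outgoing queue and an `in_hashtable` flag standing in for removal from connectd's peer table; none of these names come from connectd itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE 8

/* Hypothetical, simplified peer with a fixed-size outgoing queue. */
struct peer {
	const char *outq[MAX_QUEUE];
	size_t outq_len;
	bool draining;
	bool in_hashtable;
	bool closed;
};

/* Queue a message for the peer; refused once the peer is draining. */
static bool peer_queue(struct peer *peer, const char *msg)
{
	if (peer->draining || peer->outq_len == MAX_QUEUE)
		return false;
	peer->outq[peer->outq_len++] = msg;
	return true;
}

/* Enter draining mode: nothing else can look the peer up any more. */
static void drain_peer(struct peer *peer)
{
	peer->draining = true;
	peer->in_hashtable = false;
}

/* Write the oldest queued message; a draining peer closes once its
 * queue is empty.  Returns the message written, or NULL if none. */
static const char *peer_write_one(struct peer *peer)
{
	const char *msg = NULL;

	if (peer->outq_len > 0) {
		msg = peer->outq[0];
		for (size_t i = 1; i < peer->outq_len; i++)
			peer->outq[i - 1] = peer->outq[i];
		peer->outq_len--;
	}
	if (peer->draining && peer->outq_len == 0)
		peer->closed = true;
	return msg;
}
```

In this model a final error is just queued before calling `drain_peer()`, which is why no separate "final_msg" field is needed.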

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 6fd8fa4d95 connectd: optimize requests for "recent" gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-15 21:18:29 +09:30
Rusty Russell 7dd8e27862 connectd: don't insist on ping replies when other traffic is flowing.
Got complaints about us hanging up on some nodes because they don't respond
to pings in a timely manner (e.g. ACINQ?), but that turned out to be something
else.

Nonetheless, we've had reports in the past of LND badly prioritizing gossip
traffic, and thus important messages can get queued behind gossip dumps!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: give busy peers more time to respond to pings.
2022-07-09 12:27:05 +09:30
Rusty Russell fd90e5746b connectd: don't keep around more than one old connection.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".

The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.
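A sketch of the "at most one old connection" rule, assuming a hypothetical per-peer `pending` slot. Plain `malloc()`/`free()` stand in for the project's own allocator, and `struct conn` is illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical connection waiting for lightningd's go-ahead. */
struct conn {
	int id;
};

/* Hypothetical per-peer state: at most one pending reconnection. */
struct peer {
	struct conn *pending;
};

/* A new reconnection arrives while the old connection is still being
 * closed: discard any older pending connection (and with it any queued
 * messages, such as a stale channel_reestablish). */
static void peer_set_pending(struct peer *peer, struct conn *newconn)
{
	free(peer->pending);
	peer->pending = newconn;
}

static struct conn *new_conn(int id)
{
	struct conn *c = malloc(sizeof(*c));

	if (c)
		c->id = id;
	return c;
}
```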

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:47:27 +09:30
Matt Whitlock 83c825945c connectd: avoid use-after-free upon multiple reconnections by a peer
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
2022-06-28 13:47:27 +09:30
Rusty Russell 8b62e2584f connectd: remove enable-autotor-v2-mode option
Changelog-Removed: lightningd: removed `enable-autotor-v2-mode` option (deprecated v0.10.1)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-18 10:15:36 +09:30
Rusty Russell 1c495ca5a8 connectd: fix accidental handling of old reconnections.
We had multiple reports of channels being unilaterally closed because
it seemed like the peer was sending old revocation numbers.

Turns out, it was actually old reestablish messages!  When we have a
reconnection, we would put the new connection aside, and tell lightningd
to close the current connection: when it did, we would restart
processing of the initial reconnection.

However, we could end up with *multiple* "reconnecting" connections,
while waiting for an existing connection to close.  Though the
connections were long gone, there could still be messages queued
(particularly the channel_reestablish message, which comes early on).

Eventually, a normal reconnection would cause us to process one of
these reconnecting connections, and channeld would see the (perhaps
very old!) messages, and get confused.

(I have a test which triggers this, but it also hangs the connect
 command, due to other issues we will fix in the next release...)

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-16 09:59:42 +09:30
Rusty Russell 37e8d2fb0f connectd: disable advertisement of WEBSOCKET addresses.
This seems to prevent broad propagation, due to LND not allowing it.  See
	https://github.com/lightningnetwork/lnd/issues/6432

We still announce it if you disable deprecated-apis, so tests still work,
and hopefully we can enable it in future.

Fixes: #5196
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues
2022-04-21 06:13:55 +09:30
Rusty Russell 9bddfc2048 connectd: take dev-suppress-gossip from gossipd.
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-31 19:38:05 +10:30
Rusty Russell ea7120a313 lightningd: add --dev-no-ping-timer to avoid ping response timeouts.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-31 13:40:27 +10:30
Rusty Russell 2424b7dea8 connectd: hold peer until we're interested.
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.

We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but it's a side-effect of openingd
not holding onto idle connections.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell 16e9ba0361 connectd: fix confusing names.
The message from lightningd simply acknowledges that we are allowed to
discard the peer (because no subdaemons are talking to it anymore).
This difference becomes more stark once connectd holds on to idle
peers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell fcd0b2eb42 connectd: prepare for multiple subd connections.
We still always have 1, but the infrastructure is now in place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell c075d78431 connectd: use listen_fd array directly, rather than returning binding arr.
We always added to both arrays, so we might as well just keep one.

We make mayfail an explicit flag, rather than relying on the presence
of errstr, which is never NULL now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-05 15:48:03 +10:30
Michael Schmoock 38e2abf68a peer_exchange: set, read and log remote_addr
Changelog-Added: Protocol: set remote_addr on init tlvs
2022-02-22 05:45:47 +10:30
Rusty Russell 3121cebf4c gossipd: don't hand out fds.
Gossipd now simply gets told by channeld when peers arrive or leave.
(It only needs to know for the seeker.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 3c5d27e3e9 subdaemons: remove gossipd fd from per-peer daemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 50eccb6a12 connectd: handle pings and pongs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `ping` now works with connected peers, even without a channel.
2022-02-08 11:15:52 +10:30
Rusty Russell bba468a51c connectd: temporarily have two fds to gossipd.
We want to stream gossip through this, but currently connectd treats the
fd as synchronous.  While we work on getting rid of that, it's easiest to
have two fds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell d29795a198 connectd: don't just close to peer, but use shutdown().
Previously we would sometimes lose packets due to this, but it
doesn't happen over localhost so our tests didn't notice.  However,
now that connectd is the sole thing talking to peers, we can do
a more elegant shutdown, which should fix closing.
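The difference between a bare `close()` and an orderly shutdown can be shown with plain POSIX sockets. This is a blocking sketch, not connectd's asynchronous implementation, and `send_final_and_close()` is a hypothetical helper. The idea: half-close with `shutdown(SHUT_WR)` so the peer sees our last bytes followed by EOF, then read until the peer closes before calling `close()`, since closing a TCP socket that still has unread incoming data can make the kernel send a RST, and queued data may then be lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send a final message, half-close, wait for the peer's EOF, close.
 * Returns true if the peer closed its side cleanly. */
static bool send_final_and_close(int fd, const void *msg, size_t len)
{
	char buf[256];
	ssize_t n;

	/* A short write is treated as failure to keep the sketch small. */
	if (write(fd, msg, len) != (ssize_t)len)
		goto fail;
	/* No more writes from us: the peer reads our data, then EOF. */
	if (shutdown(fd, SHUT_WR) != 0)
		goto fail;
	/* Drain (and discard) whatever the peer still sends until EOF. */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;
	close(fd);
	return n == 0;
fail:
	close(fd);
	return false;
}
```

A real daemon would do the final read asynchronously and with a timeout, so that a peer which never closes its side cannot hold the connection open forever.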

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Always flush sockets to increase the chance that final messages get to the peer (esp. error packets).
2022-01-20 15:24:06 +10:30
Rusty Russell a93c49ca65 connectd: implement @ correctly.
dev_blackhole_fd was a hack, and doesn't work well now that we are async
(it worked for sync comms in per-peer daemons, but now we could sneak
through a read before we get to the next write).

So, make explicit flags and use them.  This is much easier now that we
have all peer comms in one place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell 26b9384fd0 various: minor cleanups from Christian's review.
More significant things have been folded.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell 6d4c56e8b6 connectd: put more stuff into struct gossip_state.
We're the only ones who use it now, so put our fields inside it and
make it local.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell 029d65cf2e connectd: serve gossip_store file for the peer.
We actually intercept the gossip_timestamp_filter, so the gossip_store
mechanism inside the per-peer daemon never kicks off for normal connections.

The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but
that only affects one place.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell 6dae0118f9 connectd: clearly differentiate incoming and outgoing paths.
This should make it clearer where the problem seen in
https://github.com/ElementsProject/lightning/issues/4297 is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-03-25 11:31:58 +10:30
Saibato 2b0aba13a8 connectd/connectd: Display an error hint for V3 tor onions when connect fails.
@thestick613 noticed that since tor versions below 0.3.2.2-alpha
do not support V3 ed25519 address formats, the error message
from the cli was not very helpful.
So now we add a hint.
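A sketch of how such a hint might be attached, assuming the address is available as a string. `is_v3_onion()` and `connect_error()` are hypothetical helpers, and the hint wording is illustrative rather than the text lightningd actually prints. The check relies on the v3 onion format: 56 base32 characters (a-z, 2-7) followed by `.onion`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Does host look like a v3 onion: 56 base32 chars + ".onion"? */
static bool is_v3_onion(const char *host)
{
	const char *suffix = ".onion";

	if (strlen(host) != 56 + strlen(suffix)
	    || strcmp(host + 56, suffix) != 0)
		return false;
	for (size_t i = 0; i < 56; i++) {
		char c = host[i];
		if (!((c >= 'a' && c <= 'z') || (c >= '2' && c <= '7')))
			return false;
	}
	return true;
}

/* Format a connection error, adding a hint for failed v3 onions. */
static void connect_error(char *out, size_t outlen,
			  const char *host, const char *err)
{
	if (is_v3_onion(host))
		snprintf(out, outlen,
			 "%s: %s (hint: V3 onion addresses need "
			 "tor >= 0.3.2.2-alpha)", host, err);
	else
		snprintf(out, outlen, "%s: %s", host, err);
}
```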

Changelog-None:

Signed-off-by: Saibato <saibato.naga@pm.me>

connectd/connectd.h: Add helper function to update conn error list

Signed-off-by: Saibato <saibato.naga@pm.me>
2020-07-01 11:21:58 +02:00
Rusty Russell 2f1502abf4 cleanup: make 'u8 *features' and 'struct feature_set *fset' more explicit.
It's almost always "their_features" and "our_features" respectively, so
make those names clear.

Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-04-03 13:13:21 +10:30
Rusty Russell 15f54878e4 connectd: do feature bits check after init exchange.
This will help with the next patch, where we wean off using a global
for features: connectd.c has access to the feature bits.

Since connectd might now want to send a message, it needs the crypto_state
non-const, which makes this less trivial than it would otherwise be.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-04-03 13:13:21 +10:30
Rusty Russell bd55f6d940 common/features: only support a single feature bitset.
This is mainly an internal-only change, especially since we don't
offer any globalfeatures.

However, LND (as of next release) will offer global features, and also
expect option_static_remotekey to be a *global* feature.  So we send
our (merged) feature bitset as both global and local in init, and fold
those bitsets together when we get an init msg.
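Folding the global and local bitsets together is a byte-wise OR with one subtlety: BOLT feature fields are big-endian, with bit 0 being the least-significant bit of the last byte, so arrays of different lengths must be aligned at their ends. `featurebits_fold()` and `feature_is_set()` are hypothetical names for this sketch, not the project's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* OR two big-endian feature bitsets, aligned at their last byte.
 * `out` must hold max(alen, blen) bytes; returns the length written. */
static size_t featurebits_fold(uint8_t *out,
			       const uint8_t *a, size_t alen,
			       const uint8_t *b, size_t blen)
{
	size_t len = alen > blen ? alen : blen;

	memset(out, 0, len);
	for (size_t i = 0; i < alen; i++)
		out[len - alen + i] |= a[i];
	for (size_t i = 0; i < blen; i++)
		out[len - blen + i] |= b[i];
	return len;
}

/* Is feature bit `bit` set?  Bit 0 is the LSB of the last byte. */
static int feature_is_set(const uint8_t *f, size_t len, size_t bit)
{
	if (bit / 8 >= len)
		return 0;
	return (f[len - 1 - bit / 8] >> (bit % 8)) & 1;
}
```

For example, folding a 2-byte global field `{0x20, 0x00}` (bit 13) with a 1-byte local field `{0x02}` (bit 1) yields `{0x20, 0x02}`, with both bits set.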

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-10-11 02:52:04 +00:00
Rusty Russell a40f45af55 connectd: generate message for lightningd inside peer_connected().
We used to generate this in the caller, then save it in case we needed
to retry.  We're about to change the message we send to lightningd, so
we'll need to regenerate it every time; just hand all the extra args
into peer_connected() and we can generate the `connect_peer_connected`
msg there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell a2fa699e0e Use node_id everywhere for nodes.
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.

We didn't get much space saving in gossipd, even though we should save
24 bytes per node.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-09 12:37:16 -07:00
Rusty Russell 4de2b362f5 connectd: rename 'struct reaching' to 'struct connecting'.
It reads better, and it's accurate: it only exists while we're trying to
connect to a peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 04:14:28 +00:00
Rusty Russell a1bdaa8f99 connectd/peer_exchange_initmsg: handle peer comms ourselves.
connectd is the only user of the cryptomsg async APIs; better to
open-code it here.  We need to expose a little from cryptomsg(),
but we remove the 'struct peer' entirely from connectd.

One trick is that we still need to defer telling lightningd when a
peer reconnects (until it tells us the old one is disconnected).  So
now we generate the message for lightningd and send it once we're woken.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 04:14:28 +00:00
Rusty Russell 0d46a3d6b0 Put the 'd' back in the daemons.
@renepickhardt: why is it actually lightningd.c with a d but hsm.c without d ?

And delete unused gossipd/gossip.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-03 05:01:40 +00:00