Commit Graph

142 Commits

Author SHA1 Message Date
Rusty Russell cf80f0520a connectd: dev-report-fds to do file descriptor audit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
adi2011 709ff01fd2 connectd: make exception for peer storage msgs. 2023-02-08 08:37:59 -06:00
Rusty Russell d7bcac2ae7 lightningd: allow sendcustommsg even if plugins are still processing peer_connected.
This is needed for the next patch, which does this from the peer_connected hook!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `sendcustommsg` can now be called by a plugin from within the `peer_connected` hook.
2023-02-08 08:37:59 -06:00
Rusty Russell 456078150a lightningd: tell connectd we're shutting down.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
Rusty Russell 2209d0149f connectd: add new start_shutdown message.
We stop listening, and also refuse to send "connectd_peer_spoke" to create
new subdaemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
Rusty Russell 6347ee7308 lightningd: don't run more than one reconnect timer at once.
In various circumstances we can start a reconnection while one is
already going on.  These can stockpile if the node really is unreachable.

Reported-by: @whitslack
Fixes: #5654
Changelog-Fixed: lightningd: we no longer stack multiple reconnection attempts if connections fail.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-03 09:18:59 +10:30
Rusty Russell 22eac96750 connectd: don't ask DNS seeds for addresses on every reconnect.
We were stressing the servers if node cannot be found.  Only do lookup
on manual connect commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: lightningd: Only use DNS server address lookup on manual `connect` commands, not normal reconnection attempts.
2023-01-03 15:00:27 +10:30
Rusty Russell 2da5244e83 jsonrpc: make error codes an enum.
This allows GDB to print values, but also allows us to use them in
'case' statements.  This wasn't allowed before because they're not
constant terms.

This also made it clear there's a clash between two error codes,
so move one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: Error code from bcli plugin changed from 400 to 500.
2022-09-19 10:18:55 +09:30
Rusty Russell e8ef42b741 plugin: wire JSON id for commands which caused hooks to fire.
Most obvious one is the "connect" hook.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-16 12:31:45 +09:30
Rusty Russell 6c33f7db65 common: remove unused parameter "allow_deprecated" from parse_wireaddr_internal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-12 09:34:52 +09:30
Rusty Russell 22ff007d64 connectd: control connect backoff from lightningd.
We used to tell connectd to remember our connect delay, and hand it
back (increased if necessary).

Instead, simply record when we last tried to connect.  If it was less
than 10 minutes ago, double delay (up to 5 minutes max), otherwise
reset delay to 1 second.

This covers all scenarios: whether we reconnect then immediately
disconnect, or never successfully connect, it doesn't matter.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5453
2022-07-28 15:08:44 +09:30
niftynei 7bbfef5054 tests: flake fix; l1 was waiting too long to reconnect
We were waiting too long for the reconnect to happen (60s default),
which caused this test to timeout.

When testing, let's speed up the reconnect.

L2 tried to reconnect but didn't have connection information in its
gossip -- is there a way to ask/save connection data from a node you're
making a channel with that doesn't rely on their node_announcement?
2022-07-25 16:28:09 +09:30
Rusty Russell a08728497b lightningd: reintroduce "slow connect" logic.
Just keep a flag on the peer, and delay connection longer if that is set.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell a3c4908f4a lightningd: don't explicitly tell connectd to disconnect, have it do it on sending error/warning.
Connectd already does this when we *receive* an error or warning, but
now do it on send.  This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).

When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc.  That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell e15e55190b lightningd: provide peer address for reconnect if connect fails.
It usually works out due to other reconnections, but I noticed this
diagnosing another test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is alreacy connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignoere it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 41b379ed89 lightningd: hand fds to connectd, not receive them from connectd.
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active)

Now:
1. connectd says it's connected (peer_connected)
2. we start all appropriate subds and tell connectd to what channels/fds (peer_connect_subd).
3. if connectd says it sent something about a channel we didn't tell it about, we either tell
   it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).

This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions.  Many cleanups are
possible, since the new flow is so simple.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell eff53495db lightningd: make "is peer connected" a tristate.
First, connectd tells us the peer has connected, and we call the connected hook,
and if it says it's fine, we are actually connected and we fire off notifications.

Of course, we could be disconnected while in the connected hook, and that would
mean we tell people about a connection which is no longer current.

Make this clear with a tristate: if we're not marked disconnected by
the time the hooks finish, we're good.  It also gives us a cleaner
"connect" command return when we connected but disconnected before
processing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 40145e619b connectd: remove the redundant "already connected" logic.
It should now be reliable, so we don't need this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9b6c97437e connectd: remove reconnection logic.
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer, we can now simply free the old
and add the new.

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell d58e6fa20b lightningd: don't tell connectd to disconnect peer if it told us.
We allow connectd to tell us a peer has gone away, but now we need
to make sure we don't double-spiderman and tell it to disconnect peer.

This is particularly harmful on reconnect: it (will soon) tell us the
old connection is gone, ready to tell us the new peer has connected.
We would tell it to disconnect the peer, which throws away the new
connection!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell c64ce4bbf3 lightningd: clean up channels when connectd says peer is gone.
This is redundant now, since connectd only sends us this once we tell
it it's OK, but that's changing, so clean up now.  This means that
connectd will be able to make *unsolicited* closes, if it needs to.

We share logic with peer_please_disconnect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 912ac25270 lightningd: remove 'connected' flag from channel structure.
It's directly a product of "does it have a current owner subdaemon"
and "does that subdaemon talk to peers", so create a helper function
which just evaluates that instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 401f1debc5 common: clean up json routine locations.
We have them split over common/param.c, common/json.c,
common/json_helpers.c, common/json_tok.c and common/json_stream.c.

Change that to:
* common/json_parse (all the json_to_xxx routines)
* common/json_parse_simple (simplest the json parsing routines, for cli too)
* common/json_stream (all the json_add_xxx routines)
* common/json_param (all the param and param_xxx routines)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-15 12:24:00 -05:00
Rusty Russell ec76ba3895 lightningd/connect_control: remove param_tok from connect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-15 12:24:00 -05:00
Vincenzo Palazzo 7ff62b4a00 lightnind: remove`DEFAULT_PORT` global definition
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Vincenzo Palazzo cc7a405ca4 lightningd: use the standard port derivation in connect command
Complete implementation of BOLT1 port derivation proposal https://github.com/lightning/bolts/pull/968

Changelog-Added: rpc: use the standard port derivation in connect command when the port is not specified.

Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Rusty Russell 3f98cf3fce lightningd: track weird CI crash in test_important_plugin
Looks like we woke one of the startup io_loops early, and thus
we thought we'd finished connectd_activate and we hadn't.  This
caused us to use an uninitialized ld->announceable array, and
finally caused an assert fail in the main loop.

Make *every* loop assert that it was exited for the correct reason,
so if it happens again, we can maybe figure out what part of
the code to look at.

```
lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
lightningd: FATAL SIGNAL 6 (version 4df66fa)
...
------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.895509
==895509== Conditional jump or move depends on uninitialised value(s)
==895509==    at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
==895509==    by 0x22D531: tal_bytelen (tal.c:637)
==895509==    by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
==895509==    by 0x13AC6E: gossip_init (gossip_control.c:254)
==895509==    by 0x1497EC: main (lightningd.c:1090)
==895509== 
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Rusty Russell 8b62e2584f connectd: remove enable-autotor-v2-mode option
Changelog-Removed: lightningd: removed `enable-autotor-v2-mode` option (deprecated v0.10.1)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-18 10:15:36 +09:30
Rusty Russell 37e8d2fb0f connectd: disable advertizement of WEBSOCKET addresses.
This seems to prevent broad propagation, due to LND not allowing it.  See
	https://github.com/lightningnetwork/lnd/issues/6432

We still announce it if you disable deprecated-apis, so tests still work,
and hopefully we can enable it in future.

Fixes: #5196
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues
2022-04-21 06:13:55 +09:30
Rusty Russell 9bddfc2048 connectd: take dev-suppress-gossip from gossipd.
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-31 19:38:05 +10:30
Rusty Russell ea7120a313 lightningd: add --dev-no-ping-timer to avoid ping response timeouts.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-31 13:40:27 +10:30
Rusty Russell 4e8239fcfe lightningd: don't tell connectd to discard peer unless no subds left.
Otherwise it waits for subds to exit, but they don't.  Plus, the others
may still be talking!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell 21e1d68e3b lightningd: remove (most) functions to search channels by status.
This is generally verboten now, since there can be multiple.  There are a
few exceptions:

1. We sometimes want to know if there are *any* active channels.
2. Some dev commands still take peer id when they mean channel_id.
3. We still allow peer id when it's fully determined.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: `close` by peer id will fail if there is more than one live channel (use `channel_id` or `short_channel_id` as id arg).
2022-03-23 13:20:12 +10:30
Rusty Russell b3438e9bba lightningd: associate connect commands with peer, not channel.
Sure, we want to connect (usually) because of an active channel, but
it's not specific to the channel itself.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell 2424b7dea8 connectd: hold peer until we're interested.
Either because lightningd tells us it wants to talk, or because the peer
says something about a channel.

We also introduce a behavior change: we disconnect after a failed open.
We might want to modify this later, but we it's a side-effect of openingd
not holding onto idle connections.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell deecedb033 connectd: tell lightningd when disconnect is complete.
This avoids races in our tests where we assume it's sync (and is kind
of nicer).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell 6cc9f37cab connectd: handle connect vs closing race better.
We would return success from connect even though the peer was closing;
this is technically correct but fairly undesirable.  Better is to pass
every connect attempt to connectd, and have it block if the peer is
exiting (and retry), otherwise tell us it's already connected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell 16e9ba0361 connectd: fix confusing names.
The message from lightningd simply acknowleges that we are allowed to
discard the peer (because no subdaemons are talking to it anymore).
This difference becomes more stark once connectd holds on to idle
peers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Rusty Russell b99c04e605 lightningd: add explicit "connected" flag.
We currently intuit this by whether there's a subdaemon owning it.
But we're about to change the rules and allow connectd to hold idle
connections, so we need an explicit flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-23 13:20:12 +10:30
Michael Schmoock e92176248e chore: fix typo announcable -> announceable
"announcable" is a common misspelling of "announceable", see:

https://en.wiktionary.org/wiki/announcable
2022-03-11 16:42:45 +10:30
Rusty Russell b5a1715c2b connectd: also fail without a scary backtrace when listen fails.
For example, if you do:

```
./lightningd/lightningd --network=regtest --experimental-websocket-port=19846
```

Then you're trying to reuse the normal port as the websocket port, but this
only fails at *listen* time, when we activate connectd.  Catch this too.

Fixes incorrect fatal() message, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-05 15:48:03 +10:30
Rusty Russell f1ed373c97 connectd: be more graceful when we an address is in use.
Aditya had this issue due to a config line, and the result was hard
to diagnose even for me.

It's now:

```
$ ./lightningd/lightningd --network=regtest --addr=:18444
2022-02-26T05:01:28.705Z **BROKEN** connectd: Failed to bind socket for 0.0.0.0:18444: Address already in use
```

Whereas before it doesn't even give the address it's trying to bind:

```
rusty@rusty-XPS-13-9370:~/devel/cvs/lightning (master)$ ./lightningd/lightningd --network=regtest --addr=:18444
lightning_connectd: Failed to bind on 2 socket: Address already in use (version v0.10.2-331-g86b83e4)
0x558a8b8d9a12 send_backtrace
	common/daemon.c:33
0x558a8b8e91e1 status_failed
	common/status.c:221
0x558a8b8c8e4f make_listen_fd
	connectd/connectd.c:1090
0x558a8b8c8f55 handle_wireaddr_listen
	connectd/connectd.c:1129
0x558a8b8c993d setup_listeners
	connectd/connectd.c:1312
0x558a8b8ca344 connect_init
	connectd/connectd.c:1517
0x558a8b8cbb57 recv_req
	connectd/connectd.c:1896
0x558a8b8d9f9f handle_read
	common/daemon_conn.c:31
0x558a8b9247c1 next_plan
	ccan/ccan/io/io.c:59
0x558a8b9253c9 do_plan
	ccan/ccan/io/io.c:407
0x558a8b92540b io_ready
	ccan/ccan/io/io.c:417
0x558a8b9276fe io_loop
	ccan/ccan/io/poll.c:453
0x558a8b8cbf36 main
	connectd/connectd.c:2033
0x7fe4d02940b2 ???
	???:0
0x558a8b8c285d ???
	???:0
0xffffffffffffffff ???
	???:0
2022-02-26T05:02:27.547Z **BROKEN** connectd: Failed to bind on 2 socket: Address already in use (version v0.10.2-331-g86b83e4)
2022-02-26T05:02:27.547Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x558a8b8d9a68
2022-02-26T05:02:27.547Z **BROKEN** connectd: backtrace: common/status.c:221 (status_failed) 0x558a8b8e91e1
2022-02-26T05:02:27.547Z **BROKEN** connectd: backtrace: connectd/connectd.c:1090 (make_listen_fd) 0x558a8b8c8e4f
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: connectd/connectd.c:1129 (handle_wireaddr_listen) 0x558a8b8c8f55
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: connectd/connectd.c:1312 (setup_listeners) 0x558a8b8c993d
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: connectd/connectd.c:1517 (connect_init) 0x558a8b8ca344
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: connectd/connectd.c:1896 (recv_req) 0x558a8b8cbb57
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: common/daemon_conn.c:31 (handle_read) 0x558a8b8d9f9f
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x558a8b9247c1
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x558a8b9253c9
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x558a8b92540b
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x558a8b9276fe
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: connectd/connectd.c:2033 (main) 0x558a8b8cbf36
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0x7fe4d02940b2
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0x558a8b8c285d
2022-02-26T05:02:27.548Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff
2022-02-26T05:02:27.548Z **BROKEN** connectd: STATUS_FAIL_INTERNAL_ERROR: Failed to bind on 2 socket: Address already in use
lightningd: connectd failed (exit status 242), exiting.
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-05 15:48:03 +10:30
Rusty Russell ca08f27d54 connectd: remove second gossip fd.
Now we only send and receive gossip messages on this fd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 3c5d27e3e9 subdaemons: remove gossipd fd from per-peer daemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 1c71c9849b connectd: handle custom messages.
This is neater than what we had before, and slightly more general.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.
2022-02-08 11:15:52 +10:30
Rusty Russell 8782d39476 connectd: handle onion messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 50eccb6a12 connectd: handle pings and pongs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON_RPC: `ping` now works with connected peers, even without a channel.
2022-02-08 11:15:52 +10:30
Rusty Russell bba468a51c connectd: temporarily have two fds to gossipd.
We want to stream gossip through this, but currently connectd treats the
fd as synchronous.  While we work on getting rid of that, it's easiest to
have two fds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 39c93ee6e5 connectd: get addresses from lightningd, not gossipd.
It's weird to have connectd ask gossipd, when lightningd can just do it
and hand all the addresses together.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30