Commit Graph

205 Commits

Author SHA1 Message Date
Rusty Russell 798cf27cb4 connectd: give subds a chance to drain when lightningd says to disconnect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-23 15:48:50 +10:30
Dusty Daemon e1ac2410b0 connectd: Ignore sodium fd for Mac tests
On Mac most tests report BROKEN because sodium creating an untracked fd pointing to /dev/random. dev_report_fd’s finds it at tear down and reports a BROKEN message.

We allow a single “char special” fd without reporting it as broken improving QOL for Mac developers.

While we’re here we added the fd mode to the log to help with future rogue fd issues.

ChangeLog-None
2023-10-19 14:31:25 +10:30
Rusty Russell e11b35cb3a common/memleak: implement callback arg for dump_memleak.
This makes it easier to use outside simple subds, and now lightningd can
simply dump to log rather than returning JSON.

JSON formatting was a lot of work, and we only did it for lightningd, not for
subdaemons.  Easier to use the logs in all cases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-10-03 10:05:55 +02:00
Rusty Russell 0ff91e65dc connectd: remove #if DEVELOPER
We still refuse to run dev commands if lightningd sends it to us
despite us not being in developer mode, but that's mainly paranoia.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-09-21 20:08:24 +09:30
Rusty Russell a9f26b7d07 common/daemon.c: remove #ifdef DEVELOPER in favor of runtime flag.
Also requires us to expose memleak when !DEVELOPER, however we only
ever used the memleak tracking when the LIGHTNINGD_DEV_MEMLEAK
environment variable was set, so keep that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-09-21 20:08:24 +09:30
Rusty Russell 9bc1a020d0 connectd: don't try to keep going if gossipd dies.
We will access the freed connection to gossipd.  This is weird to track
down when the *actual* issue is that gossipd died!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-07-09 16:49:48 +09:30
Rusty Russell a6772e9dec common: add new internal type for websockets.
Now it's not a public type, we need a way to refer to it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell 3f35d48fe4 common: remove websocket type from wireaddr.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell e66cf46a71 connectd: don't advertise websocket addresses.
I never really liked this hack: websockets are useful, advertizing
them not so much.

Note that we never actually documented that we would advertize these!

Changelog-EXPERIMENTAL: Protocol: Removed support for advertizing websocket addresses in gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell ddb79162ab connectd: clean up add_gossip loops.
This contained cut & paste code, and it wasn't clear to me that
the first loop included DNS entries with IPv6 entries.

Instead, allow the iterator to take multiple types, and use
a switch statement so compile will break as new types are added.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-06-01 09:28:39 +09:30
Rusty Russell cf80f0520a connectd: dev-report-fds to do file descriptor audit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell 3e49cb01bd connectd: don't leak fds if we have both IPv4 and IPv6.
We accept that we will fail to listen if we bind both IPv6 and IPv4 to
the same socket on a dual-stack machine (e.g. normal Linux), but we weren't
closing the fd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell ed58c24bc7 connectd: log broken if TCP_CORK fails.
But not if we're a developer using dev_disconnect, which substitutes the fd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell 295557ac50 connectd: don't try to set TCP_CORK on websocket pipe.
Most of this is piping the flag through so we know it's a websocket!

Reported-by: @ShahanaFarooqui
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-04-10 09:41:56 +09:30
Rusty Russell b5c614069b connectd: fix crash on freed context for new connections.
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` which we reallocate, so it can use a stale pointer.

```
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
```

Fixes: #6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-03-06 17:14:22 -06:00
Rusty Russell 2209d0149f connectd: add new start_shutdown message.
We stop listening, and also refuse to send "connectd_peer_spoke" to create
new subdaemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
Rusty Russell 05ac74fc44 connectd: keep array of our listening sockets.
This allows us to free them if we want to stop listening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
niftynei 0b8ea2299a connectd: patch valgrind error w/ buffers for error msgs
The `tmpctx` is free'd before the error is read out/sent over the wire;
there's a call that will copy the array before sending it, let's use
that instead and take() the object?

------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2181501
==2181501== Syscall param write(buf) points to unaddressable byte(s)
==2181501==    at 0x49E4077: write (write.c:26)
==2181501==    by 0x1C79A3: do_write (io.c:189)
==2181501==    by 0x1C80AB: do_plan (io.c:394)
==2181501==    by 0x1C81BA: io_ready (io.c:423)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Address 0x4afb158 is 40 bytes inside a block of size 140 free'd
==2181501==    at 0x483F0C3: free (vg_replace_malloc.c:872)
==2181501==    by 0x1D103C: del_tree (tal.c:421)
==2181501==    by 0x1D130A: tal_free (tal.c:486)
==2181501==    by 0x1364B8: clean_tmpctx (utils.c:172)
==2181501==    by 0x1266DD: daemon_poll (daemon.c:87)
==2181501==    by 0x1CA334: io_loop (poll.c:420)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Block was alloc'd at
==2181501==    at 0x483C855: malloc (vg_replace_malloc.c:381)
==2181501==    by 0x1D0AC5: allocate (tal.c:250)
==2181501==    by 0x1D1086: tal_alloc_ (tal.c:428)
==2181501==    by 0x1D124F: tal_alloc_arr_ (tal.c:471)
==2181501==    by 0x126204: cryptomsg_encrypt_msg (cryptomsg.c:161)
==2181501==    by 0x11335F: peer_connected (connectd.c:318)
==2181501==    by 0x118A8A: peer_init_received (peer_exchange_initmsg.c:135)
==2181501==    by 0x1C751E: next_plan (io.c:59)
==2181501==    by 0x1C8126: do_plan (io.c:407)
==2181501==    by 0x1C8168: io_ready (io.c:417)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   write(buf)
   fun:write
   fun:do_write
   fun:do_plan
   fun:io_ready
   fun:io_loop
   fun:main
}
--------------------------------------------------------------------------------
2023-02-04 15:31:16 +10:30
Rusty Russell 81e57dce52 connectd: ensure htables are always tal objects.
We want to change the htable allocator to use tal, which will need
this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-12 11:44:10 +10:30
Rusty Russell 22eac96750 connectd: don't ask DNS seeds for addresses on every reconnect.
We were stressing the servers if node cannot be found.  Only do lookup
on manual connect commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: lightningd: Only use DNS server address lookup on manual `connect` commands, not normal reconnection attempts.
2023-01-03 15:00:27 +10:30
Rusty Russell 701dd3dcef memleak: remove exclusions from memleak_start()
Add memleak_ignore_children() so callers can do exclusions themselves.

Having two exclusions was always such a hack!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell 3380f559f9 memleak: simplify API.
Mainly renaming.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell 2da5244e83 jsonrpc: make error codes an enum.
This allows GDB to print values, but also allows us to use them in
'case' statements.  This wasn't allowed before because they're not
constant terms.

This also made it clear there's a clash between two error codes,
so move one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: Error code from bcli plugin changed from 400 to 500.
2022-09-19 10:18:55 +09:30
Michael Schmoock e0d6f3ceb1 connectd: DNS Bolt7 #911 no longer EXPERIMENTAL
Changelog-Changed: Bolt7 #911 DNS annoucenent support is no longer EXPERIMENTAL
2022-09-13 06:42:20 +09:30
Rusty Russell 22ff007d64 connectd: control connect backoff from lightningd.
We used to tell connectd to remember our connect delay, and hand it
back (increased if necessary).

Instead, simply record when we last tried to connect.  If it was less
than 10 minutes ago, double delay (up to 5 minutes max), otherwise
reset delay to 1 second.

This covers all scenarios: whether we reconnect then immediately
disconnect, or never successfully connect, it doesn't matter.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5453
2022-07-28 15:08:44 +09:30
Rusty Russell 9498e14530 connectd: two logging cleanups.
Don't log_io final messages twice (multiplex_final_message already does
this, so it's confusing to see us send e.g. WIRE_ERROR twice!).

And report that the peer has failed to connect out *before* telling
lightningd, otherwise we get a very confusing ordering, e.g.:

```
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Reconnecting in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Will try reconnect in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-connectd: Failed connected out:
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-25 15:16:58 -07:00
Rusty Russell a3c4908f4a lightningd: don't explicitly tell connectd to disconnect, have it do it on sending error/warning.
Connectd already does this when we *receive* an error or warning, but
now do it on send.  This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).

When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc.  That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 719d1384d1 connectd: give connections a chance to drain when lightningd says to disconnect, or peer disconnects.
We want to avoid lost messages in the common cases.

This generalizes our drain code, by giving the subds each 5 seconds to
close themselves, but continue to allow them to send us traffic (if
peer is still connected) and continue to send them traffic.

We continue to send traffic *out* to the peer (if it's still
connected), until all subds are gone.  We still have a 5 second timer
to close the connection to peer.

On reconnects, we don't do this "drain period" on reconnects: we kill
immediately.

We fix up one test which was looking for the "disconnect" message
explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is alreacy connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignoere it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 41b379ed89 lightningd: hand fds to connectd, not receive them from connectd.
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active)

Now:
1. connectd says it's connected (peer_connected)
2. we start all appropriate subds and tell connectd to what channels/fds (peer_connect_subd).
3. if connectd says it sent something about a channel we didn't tell it about, we either tell
   it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).

This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions.  Many cleanups are
possible, since the new flow is so simple.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 40145e619b connectd: remove the redundant "already connected" logic.
It should now be reliable, so we don't need this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9b6c97437e connectd: remove reconnection logic.
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer, we can now simply free the old
and add the new.

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 8678c5efb3 connectd: release peer soon as lightingd tells us.
Now we have separate peer draining logic, we can simply use it when
connectd tells us to release the peer, without waiting.  (We could
simply free the peer, but that's a bit rude, as messages can get
lost).

This removes various complex flags and logic we had before.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
2022-07-18 20:50:04 -05:00
Rusty Russell e856accb7d connectd: send cleanup messages however peer is freed.
This lets us tal_free() it wherever we want, rather than always
freeing via peer_discard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9dc3880360 connectd: put peer into "draining" mode when we want to close it.
This removes it from the hashtable, and forces it to do nothing but
send out any remaining packets, then close.

It is, in effect, reduced to a stub, with no further interactions
with the rest of the system (all subds are freed already).

Also removes the need for an explicit "final_msg" too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 7dd8e27862 connectd: don't insist on ping replies when other traffic is flowing.
Got complaints about us hanging up on some nodes because they don't respond
to pings in a timely manner (e.g. ACINQ?), but that turned out to be something
else.

Nonetheless, we've had reports in the past of LND badly prioritizing gossip
traffic, and thus important messages can get queued behind gossip dumps!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: give busy peers more time to respond to pings.
2022-07-09 12:27:05 +09:30
Rusty Russell fd90e5746b connectd: don't keep around more than one old connection.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".

The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:47:27 +09:30
Matt Whitlock 83c825945c connectd: avoid use-after-free upon multiple reconnections by a peer
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
2022-06-28 13:47:27 +09:30
Rusty Russell 4ee55acc71 connectd: don't start connecting in parallel in peer_conn_closed.
The crash below from @zerofeerouting left me confused.  The invalid
value in fmt_wireaddr_internal is a telltale sign of use-after-free.

This backtrace shows us destroying the conn *twice*: what's happening?

Well, tal carefully protects against destroying twice: it's not that
unusual to free something in a destructor which has already been freed.
So this indicates that there are *two* io_conn hanging off one
struct connecting, which isn't supposed to happen!  We deliberately
call try_connect_one_addr() initially, then inside the io_conn destructor.

But due to races in connectd vs lightningd connection state, we added
a fix which allows a connect command to sit around while the peer is
cleaning up (6cc9f37cab) and get fired
off when it's done.

But what if, in the chaos, we are already connecting again?  Now we'll
end up with *two* connections.

Fortunately, we have a `conn` pointer inside struct connecting, which
(with a bit of additional care) we can ensure is only non-NULL while
we're actually trying to connect.  This lets us check that before
firing off a new connection attempt in peer_conn_closed.

```
lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939)
0x5614a4915ae8 send_backtrace
	common/daemon.c:33
0x5614a4915b72 crashdump
	common/daemon.c:46
0x7ffa14fcd72f ???
	???:0
0x7ffa14dc87bb ???
	???:0
0x7ffa14db3534 ???
	???:0
0x5614a491fc71 fmt_wireaddr_internal
	common/wireaddr.c:255
0x5614a491fc7a fmt_wireaddr_internal_
	common/wireaddr.c:257
0x5614a491ea6b type_to_string_
	common/type_to_string.c:32
0x5614a490beaa destroy_io_conn
	connectd/connectd.c:754
0x5614a494a2f1 destroy_conn
	ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
	ccan/ccan/io/poll.c:252
0x5614a4953804 notify
	ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
	ccan/ccan/tal/tal.c:402
0x5614a4953928 del_tree
	ccan/ccan/tal/tal.c:412
0x5614a4953e07 tal_free
	ccan/ccan/tal/tal.c:486
0x5614a4908b7a try_connect_one_addr
	connectd/connectd.c:870
0x5614a490bef1 destroy_io_conn
	connectd/connectd.c:759
0x5614a494a2f1 destroy_conn
	ccan/ccan/io/poll.c:246
0x5614a494a313 destroy_conn_close_fd
	ccan/ccan/io/poll.c:252
0x5614a4953804 notify
	ccan/ccan/tal/tal.c:240
0x5614a49538d6 del_tree
	ccan/ccan/tal/tal.c:402
0x5614a4953e07 tal_free
	ccan/ccan/tal/tal.c:486
0x5614a4948f08 io_close
	ccan/ccan/io/io.c:450
0x5614a4948f59 do_plan
	ccan/ccan/io/io.c:401
0x5614a4948fe1 io_ready
	ccan/ccan/io/io.c:417
0x5614a494a8e6 io_loop
	ccan/ccan/io/poll.c:453
0x5614a490c12f main
	connectd/connectd.c:2164
0x7ffa14db509a ???
	???:0
0x5614a4904e99 ???
	???:0
0xffffffffffffffff ???
	???:0
```

Fixes: #5339
Changelog-Fixed: connectd: occasional crash when we reconnect to a peer quickly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:46:59 +09:30
Vincenzo Palazzo 7ff62b4a00 lightnind: remove`DEFAULT_PORT` global definition
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-06-28 06:09:01 +09:30
Rusty Russell a1b8b40d13 connectd: fix debug message on bind fail.
It doesn't get the right errno, and it says "create" not "bind".

```
2022-05-20T03:04:46.498Z DEBUG   connectd: Failed to create 2 socket: Success
2022-05-20T03:04:46.500Z DEBUG   connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds
2022-05-20T03:04:46.501Z DEBUG   connectd: connectd_init_done
2022-05-20T03:04:46.503Z **BROKEN** connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-27 17:21:35 +09:30
Michael Schmoock 033ac323d1 connectd: prefer IPv6 when available
Changelog-Changed: connectd: prefer IPv6 connections when available.
2022-06-17 20:30:16 +09:30
Rusty Russell 7c8dc62035 channeld: take over gossip_rcvd_filter.c and is_msg_gossip_broadcast.
channeld is the only user of these functions, since it now streams
all gossip itself.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-17 14:14:02 +09:30
Rusty Russell 8b62e2584f connectd: remove enable-autotor-v2-mode option
Changelog-Removed: lightningd: removed `enable-autotor-v2-mode` option (deprecated v0.10.1)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-18 10:15:36 +09:30
Rusty Russell 4343f720be connectd: remove assert which can trigger.
I have a test which reproduces this, too, and it's been seen in the
wild.  It seems we can add a subd as we're closing, which causes
this assert to trigger.

Fixes: #5254
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-16 09:59:42 +09:30
Rusty Russell 1c495ca5a8 connectd: fix accidental handling of old reconnections.
We had multiple reports of channels being unilaterally closed because
it seemed like the peer was sending old revocation numbers.

Turns out, it was actually old reestablish messages!  When we have a
reconnection, we would put the new connection aside, and tell lightningd
to close the current connection: when it did, we would restart
processing of the initial reconnection.

However, we could end up with *multiple* "reconnecting" connections,
while waiting for an existing connection to close.  Though the
connections were long gone, there could still be messages queued
(particularly the channel_reestablish message, which comes early on).

Eventually, a normal reconnection would cause us to process one of
these reconnecting connections, and channeld would see the (perhaps
very old!) messages, and get confused.

(I have a test which triggers this, but it also hangs the connect
 command, due to other issues we will fix in the next release...)

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-16 09:59:42 +09:30
Rusty Russell 37e8d2fb0f connectd: disable advertizement of WEBSOCKET addresses.
This seems to prevent broad propagation, due to LND not allowing it.  See
	https://github.com/lightningnetwork/lnd/issues/6432

We still announce it if you disable deprecated-apis, so tests still work,
and hopefully we can enable it in future.

Fixes: #5196
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues
2022-04-21 06:13:55 +09:30
Rusty Russell c3a7499573 connectd: avoid use-after-free on reconnect with remote_addr.
I was seeing a strange crash:
	Connectd gave bad CONNECT_PEER_CONNECTED message

The message is indeed mangled, around the remote_addr!
A quick review of the code revealed that we were not making a copy
when it was a reconnect, and so the remote_addr pointer was pointing
to memory which was freed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-20 06:44:58 +09:30
Rusty Russell 2526e804f7 doc: big BOLT update to incorporate warnings language.
We do this (send warnings) in almost all cases anyway, so mainly this
is a textual update, but there are some changes:

1. Send ERROR not WARNING if they send a malformed commitment secret.
2. Send WARNING not ERROR if they get the shutdown_scriptpubkey wrong (vs upfront)
3. Send WARNING not ERROR if they send a bad shutdown_scriptpubkey (e.g. p2pkh in future)
4. Rename some vars 'err' to 'warn' to make it clear we send a warning.

This means test_option_upfront_shutdown_script can be made reliable, too,
and it now warns and doesn't automatically close channel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-02 09:40:18 +10:30
Rusty Russell 9bddfc2048 connectd: take dev-suppress-gossip from gossipd.
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-03-31 19:38:05 +10:30