Commit Graph

377 Commits

Author SHA1 Message Date
Rusty Russell b5c614069b connectd: fix crash on freed context for new connections.
ccan/io stores the context pointer for io_new_conn, but we were using
`daemon->listeners` which we reallocate, so it can use a stale pointer.

```
0x3e1700 call_error
	ccan/ccan/tal/tal.c:93
0x3e1700 check_bounds
	ccan/ccan/tal/tal.c:165
0x3e1700 to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x3e1211 to_tal_hdr_or_null
	ccan/ccan/tal/tal.c:186
0x3e1211 tal_alloc_
	ccan/ccan/tal/tal.c:426
0x3db8f4 io_new_conn_
	ccan/ccan/io/io.c:91
0x3dd2e1 accept_conn
	ccan/ccan/io/poll.c:277
0x3dd2e1 io_loop
	ccan/ccan/io/poll.c:444
0x3419fa main
	connectd/connectd.c:2081
```

Fixes: #6060
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-03-06 17:14:22 -06:00
adi2011 709ff01fd2 connectd: make exception for peer storage msgs. 2023-02-08 08:37:59 -06:00
adi2011 5f481aaf96 wire: Add patch file for peer storage bkp
Add msg type peer_storage and your_peer_storage
2023-02-08 08:37:59 -06:00
Rusty Russell 2209d0149f connectd: add new start_shutdown message.
We stop listening, and also refuse to send "connectd_peer_spoke" to create
new subdaemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
Rusty Russell 05ac74fc44 connectd: keep array of our listening sockets.
This allows us to free them if we want to stop listening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-02-05 20:40:47 +01:00
niftynei 0b8ea2299a connectd: patch valgrind error w/ buffers for error msgs
The `tmpctx` is free'd before the error is read out/sent over the wire;
there's a call that will copy the array before sending it, let's use
that instead and take() the object?

------------------------------- Valgrind errors --------------------------------
Valgrind error file: valgrind-errors.2181501
==2181501== Syscall param write(buf) points to unaddressable byte(s)
==2181501==    at 0x49E4077: write (write.c:26)
==2181501==    by 0x1C79A3: do_write (io.c:189)
==2181501==    by 0x1C80AB: do_plan (io.c:394)
==2181501==    by 0x1C81BA: io_ready (io.c:423)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Address 0x4afb158 is 40 bytes inside a block of size 140 free'd
==2181501==    at 0x483F0C3: free (vg_replace_malloc.c:872)
==2181501==    by 0x1D103C: del_tree (tal.c:421)
==2181501==    by 0x1D130A: tal_free (tal.c:486)
==2181501==    by 0x1364B8: clean_tmpctx (utils.c:172)
==2181501==    by 0x1266DD: daemon_poll (daemon.c:87)
==2181501==    by 0x1CA334: io_loop (poll.c:420)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==  Block was alloc'd at
==2181501==    at 0x483C855: malloc (vg_replace_malloc.c:381)
==2181501==    by 0x1D0AC5: allocate (tal.c:250)
==2181501==    by 0x1D1086: tal_alloc_ (tal.c:428)
==2181501==    by 0x1D124F: tal_alloc_arr_ (tal.c:471)
==2181501==    by 0x126204: cryptomsg_encrypt_msg (cryptomsg.c:161)
==2181501==    by 0x11335F: peer_connected (connectd.c:318)
==2181501==    by 0x118A8A: peer_init_received (peer_exchange_initmsg.c:135)
==2181501==    by 0x1C751E: next_plan (io.c:59)
==2181501==    by 0x1C8126: do_plan (io.c:407)
==2181501==    by 0x1C8168: io_ready (io.c:417)
==2181501==    by 0x1CA45B: io_loop (poll.c:453)
==2181501==    by 0x118593: main (connectd.c:2053)
==2181501==
{
   <insert_a_suppression_name_here>
   Memcheck:Param
   write(buf)
   fun:write
   fun:do_write
   fun:do_plan
   fun:io_ready
   fun:io_loop
   fun:main
}
--------------------------------------------------------------------------------
2023-02-04 15:31:16 +10:30
Michael Schmoock ad249607d6 dual-fund: update extracted CSVs to latest bolt draft
Changelog-None
2023-02-04 15:31:16 +10:30
Rusty Russell 153b7bf192 common/gossip_store: move subdaemon-only routines to connectd.
connectd is the only one who uses these routines now.  The
rest can be linked into a plugin.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-30 15:15:41 -06:00
Rusty Russell 6a95d3a25e common: expose node_id_hash functions.
They're used in several places, and we're about to add more.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-21 08:05:31 -06:00
Rusty Russell 5dfcd15782 all: no longer need to call htable_clear to free htable contents.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-12 11:44:10 +10:30
Rusty Russell f07e37018d setup: make all htables use tal.
This makes them easier to clean up.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-12 11:44:10 +10:30
Rusty Russell 81e57dce52 connectd: ensure htables are always tal objects.
We want to change the htable allocator to use tal, which will need
this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2023-01-12 11:44:10 +10:30
Rusty Russell 22eac96750 connectd: don't ask DNS seeds for addresses on every reconnect.
We were stressing the servers if node cannot be found.  Only do lookup
on manual connect commands.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Protocol: lightningd: Only use DNS server address lookup on manual `connect` commands, not normal reconnection attempts.
2023-01-03 15:00:27 +10:30
Rusty Russell 15d0a8bec8 connectd: don't spam logs when we're under load.
This happens a lot with my node with rc2, so drop it to debug.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-11-30 19:31:38 +01:00
Rusty Russell 5becfa6ee1 onion_message: don't use general secret, use per-message secret.
We had a scheme where lightningd itself would put a per-node secret in
the blinded path, then we'd tell the caller when it was used.  Then it
simply checks the alias to determine if the correct path was used.

But this doesn't work when we start to offer multiple blinded paths.
So go for a far simpler scheme, where the secret is generated (and
stored) by the caller, and hand it back to them.

We keep the split "with secret" or "without secret" API, since I'm
sure callers who don't care about the secret won't check that it
doesn't exist!  And without that, someone can use a blinded path for a
different message and get a response which may reveal the node.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-11-09 15:08:03 +01:00
Rusty Russell 8720bbedae common/onion: split into decode and encode routines.
Some places (e.g. the pay plugin) only need to construct onions,
not decode them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-11-09 15:08:03 +01:00
Rusty Russell 159fc7d1a2 common/onion_message_parse: generic routine for parsing onion messages.
Instead of open coding in connectd/onion_message, we move it to common
with a nice API.

This lets us process the BOLT test vectors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-10-26 11:29:06 +10:30
Rusty Russell 5cf86a1a2e common: update to latest onion message spec.
Mainly, field name changes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: Support for forwarding blinded payments (as per latest draft)
2022-10-26 11:29:06 +10:30
Rusty Russell 53e40c4380 common/blindedpath: generalize routines.
We're going to share them for onion messages as well as for blinded
payments.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-10-26 11:29:06 +10:30
Rusty Russell 41ef85318d onionmessages: remove obsolete onion message parsing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-29 16:10:57 +09:30
Rusty Russell 701dd3dcef memleak: remove exclusions from memleak_start()
Add memleak_ignore_children() so callers can do exclusions themselves.

Having two exclusions was always such a hack!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell 3380f559f9 memleak: simplify API.
Mainly renaming.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-09-19 11:34:42 +09:30
Rusty Russell 2da5244e83 jsonrpc: make error codes an enum.
This allows GDB to print values, but also allows us to use them in
'case' statements.  This wasn't allowed before because they're not
constant terms.

This also made it clear there's a clash between two error codes,
so move one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: JSON-RPC: Error code from bcli plugin changed from 400 to 500.
2022-09-19 10:18:55 +09:30
Michael Schmoock e0d6f3ceb1 connectd: DNS Bolt7 #911 no longer EXPERIMENTAL
Changelog-Changed: Bolt7 #911 DNS annoucenent support is no longer EXPERIMENTAL
2022-09-13 06:42:20 +09:30
Rusty Russell 1b30ea4b82 doc: update BOLTs to bc86304b4b0af5fd5ce9d24f74e2ebbceb7e2730
This contains the zeroconf stuff, with funding_locked renamed to
channel_ready.  I change that everywhere, and try to fix up the
comments.

Also the `alias` field is called `short_channel_id`.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: `funding_locked` is now called `channel_ready` as per latest BOLTs.
2022-09-12 09:34:52 +09:30
Rusty Russell 22ff007d64 connectd: control connect backoff from lightningd.
We used to tell connectd to remember our connect delay, and hand it
back (increased if necessary).

Instead, simply record when we last tried to connect.  If it was less
than 10 minutes ago, double delay (up to 5 minutes max), otherwise
reset delay to 1 second.

This covers all scenarios: whether we reconnect then immediately
disconnect, or never successfully connect, it doesn't matter.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5453
2022-07-28 15:08:44 +09:30
Rusty Russell 9498e14530 connectd: two logging cleanups.
Don't log_io final messages twice (multiplex_final_message already does
this, so it's confusing to see us send e.g. WIRE_ERROR twice!).

And report that the peer has failed to connect out *before* telling
lightningd, otherwise we get a very confusing ordering, e.g.:

```
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Reconnecting in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-lightningd: Will try reconnect in 4 seconds
2022-07-23T05:17:36.096Z DEBUG   027d0de66d08f956a8d606c0d1c34e59bda38c05a3b1cc738fdd6378716c644997-connectd: Failed connected out:
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-25 15:16:58 -07:00
Rusty Russell a3c4908f4a lightningd: don't explicitly tell connectd to disconnect, have it do it on sending error/warning.
Connectd already does this when we *receive* an error or warning, but
now do it on send.  This causes some slight behavior change: we don't
disconnect when we close a channel, for example (our behaviour here
has been inconsistent across versions, depending on the code).

When connectd is told to disconnect, it now does so immediately, and
doesn't wait for subds to drain etc.  That simplifies the manual
disconnect case, which now cleans up as it would from any other
disconnection when connectd says it's disconnected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell c415c80d48 connectd: spelling and typo fixes.
From @niftynei.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 719d1384d1 connectd: give connections a chance to drain when lightningd says to disconnect, or peer disconnects.
We want to avoid lost messages in the common cases.

This generalizes our drain code, by giving the subds each 5 seconds to
close themselves, but continue to allow them to send us traffic (if
peer is still connected) and continue to send them traffic.

We continue to send traffic *out* to the peer (if it's still
connected), until all subds are gone.  We still have a 5 second timer
to close the connection to peer.

On reconnects, we don't do this "drain period" on reconnects: we kill
immediately.

We fix up one test which was looking for the "disconnect" message
explicitly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 6a2817101d connectd: don't move parent while we're being freed.
A subtle case I hadn't come across before: if a child tal_resizes()
its parent while the parent is being deleted, tal gets confused.

The subd destructor does this using tal_arr_remove() on peer->subds,
which is currently being freed:

```
==61056== Invalid read of size 8
==61056==    at 0x185632: del_tree (tal.c:417)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x185957: tal_free (tal.c:486)
==61056==    by 0x1183BC: peer_discard (connectd.c:1861)
==61056==    by 0x11869E: recv_req (connectd.c:1942)
==61056==    by 0x12774B: handle_read (daemon_conn.c:35)
==61056==    by 0x173453: next_plan (io.c:59)
==61056==    by 0x17405B: do_plan (io.c:407)
==61056==    by 0x17409D: io_ready (io.c:417)
==61056==    by 0x176390: io_loop (poll.c:453)
==61056==    by 0x118A68: main (connectd.c:2082)
==61056==  Address 0x4bd8850 is 16 bytes inside a block of size 48 free'd
==61056==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==61056==    by 0x1860E6: tal_resize_ (tal.c:699)
==61056==    by 0x1373DD: tal_arr_remove_ (utils.c:184)
==61056==    by 0x11D508: destroy_subd (multiplex.c:930)
==61056==    by 0x1850A4: notify (tal.c:240)
==61056==    by 0x1855BB: del_tree (tal.c:402)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x18560D: del_tree (tal.c:412)
==61056==    by 0x185957: tal_free (tal.c:486)
==61056==    by 0x1183BC: peer_discard (connectd.c:1861)
==61056==    by 0x11869E: recv_req (connectd.c:1942)
==61056==    by 0x12774B: handle_read (daemon_conn.c:35)
```

So simply make the subds children of `peer` not the `peer->subds`
array.  The only effect is that drain_peer() can't simply free the
subds array but must free the subds one at a time.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell d31420211a connectd: add counters to each peer connection.
This allows us to detect when lightningd hasn't seen our latest
disconnect/reconnect; in particular, we would hit the following pattern:

1. lightningd says to connect a subd.
2. connectd disconnects and reconnects.
3. connectd reads message, connects subd.
4. lightningd reads disconnect and reconnect, sends msg to connect to subd again.
5. connectd asserts because subd is alreacy connected.

This way connectd can tell if lightningd is talking about the previous
connection, and ignoere it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 41b379ed89 lightningd: hand fds to connectd, not receive them from connectd.
Before this patch:
1. connectd says it's connected (peer_connected)
2. we tell connectd we want to talk about each channel (peer_make_active)
3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active)
4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active)

Now:
1. connectd says it's connected (peer_connected)
2. we start all appropriate subds and tell connectd to what channels/fds (peer_connect_subd).
3. if connectd says it sent something about a channel we didn't tell it about, we either tell
   it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd).

This is the minimal-size patch, which is why we create socket pairs in
so many places to use the existing functions.  Many cleanups are
possible, since the new flow is so simple.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell ab0e5d30ee connectd: don't io_halfclose()
We don't io_halfclose() the other side, we io_sock_shutdown(), which can leave both
sides unset:

```
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: FATAL SIGNAL 6 (version 57e1af2)
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x563b9b603af7
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: common/daemon.c:46 (crashdump) 0x563b9b603b4b
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 ((null)) 0x7fe6e8d4f08f
lightningd-2: 2022-06-07T11:00:05.053Z **BROKEN** connectd: backtrace: ../sysdeps/unix/sysv/linux/raise.c:51 (__GI_raise) 0x7fe6e8d4f00b
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79 (__GI_abort) 0x7fe6e8d2e858
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92 (__assert_fail_base) 0x7fe6e8d2e728
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101 (__GI___assert_fail) 0x7fe6e8d3ffd5
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:65 (next_plan) 0x563b9b64fd7e
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x563b9b6508f0
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/io.c:423 (io_ready) 0x563b9b650984
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x563b9b652c25
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: connectd/connectd.c:2037 (main) 0x563b9b5f5793
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7fe6e8d30082
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0x563b9b5ebf6d
lightningd-2: 2022-06-07T11:00:05.054Z **BROKEN** connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 40145e619b connectd: remove the redundant "already connected" logic.
It should now be reliable, so we don't need this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9b6c97437e connectd: remove reconnection logic.
We don't have to put aside a peer which is reconnecting and wait for
lightningd to remove the old peer, we can now simply free the old
and add the new.

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 7b0c11efb4 connectd: don't let peer close take forever.
Sending any pending messages to peer before hanging up is a courtesy:
give it 5 seconds before simply closing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 8678c5efb3 connectd: release peer soon as lightingd tells us.
Now we have separate peer draining logic, we can simply use it when
connectd tells us to release the peer, without waiting.  (We could
simply free the peer, but that's a bit rude, as messages can get
lost).

This removes various complex flags and logic we had before.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.
2022-07-18 20:50:04 -05:00
Rusty Russell e856accb7d connectd: send cleanup messages however peer is freed.
This lets us tal_free() it wherever we want, rather than always
freeing via peer_discard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 9dc3880360 connectd: put peer into "draining" mode when we want to close it.
This removes it from the hashtable, and forces it to do nothing but
send out any remaining packets, then close.

It is, in effect, reduced to a stub, with no further interactions
with the rest of the system (all subds are freed already).

Also removes the need for an explicit "final_msg" too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 37ff013c2c connectd: fix subd tal parents.
This came out in a later patch: freeing the peer->subds doesn't actually
free the subds, because they're reparented onto subd->conn, which is
a child of peer itself.


This breaks because when the peer is finally freed, destroy_subd is
called, and expects to find itself in peer->subds (but we made that
NULL when we manually freed it!).

Fix this, and make it obvious that we tal_steal it.

```
ightning_connectd: FATAL SIGNAL 11 (version v0.11.0.1-25-gbf025aa-modded)
0x55de2a1b8b94 send_backtrace
	common/daemon.c:33
0x55de2a1b8c3e crashdump
	common/daemon.c:46
0x7fe2be2fc08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x55de2a1af41e destroy_subd
	connectd/multiplex.c:1119
0x55de2a217686 notify
	ccan/ccan/tal/tal.c:240
0x55de2a217b9d del_tree
	ccan/ccan/tal/tal.c:402
0x55de2a217bef del_tree
	ccan/ccan/tal/tal.c:412
0x55de2a217bef del_tree
	ccan/ccan/tal/tal.c:412
0x55de2a217f39 tal_free
	ccan/ccan/tal/tal.c:486
0x55de2a1aa116 peer_discard
	connectd/connectd.c:1834
0x55de2a1aa38d recv_req
	connectd/connectd.c:1903
0x55de2a1b9121 handle_read
	common/daemon_conn.c:31
0x55de2a205a35 next_plan
	ccan/ccan/io/io.c:59
0x55de2a20663d do_plan
	ccan/ccan/io/io.c:407
0x55de2a20667f io_ready
	ccan/ccan/io/io.c:417
0x55de2a208972 io_loop
	ccan/ccan/io/poll.c:453
0x55de2a1aa736 main
	connectd/connectd.c:2042
0x7fe2be2dd082 __libc_start_main
	../csu/libc-start.c:308
0x55de2a1a085d ???
	???:0
0xffffffffffffffff ???
	???:0
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-18 20:50:04 -05:00
Rusty Russell 6fd8fa4d95 connectd: optimize requests for "recent" gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-15 21:18:29 +09:30
Rusty Russell 92fe871467 connectd: optimize case where peer doesn't want gossip.
LND and us send 0xFFFFFFFF to turn off gossip.  LDK and Eclair don't
seem to turn off gossip at all, but that's OK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-15 21:18:29 +09:30
Rusty Russell 06e1e119aa pytest: fix test_gossip_no_empty_announcements flake.
This is a side-effect of fixing aging: sometimes, we age our
rcvd_filter cache too fast, and thus re-xmit.  This breaks
our test, since it used dev-disconnect on the channel_announce,
but that closes to l3, not l1!

```
>       assert l1.rpc.listchannels()['channels'] == []
E       AssertionError: assert [{'active': T...ags': 1, ...}] == []
E         Left contains 2 more items, first extra item: {'active': True, 'amount_msat': 100000000msat, 'base_fee_millisatoshi': 1, 'channel_flags': 0, ...}
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #5403
2022-07-12 21:41:19 +09:30
Rusty Russell 7dd8e27862 connectd: don't insist on ping replies when other traffic is flowing.
Got complaints about us hanging up on some nodes because they don't respond
to pings in a timely manner (e.g. ACINQ?), but that turned out to be something
else.

Nonetheless, we've had reports in the past of LND badly prioritizing gossip
traffic, and thus important messages can get queued behind gossip dumps!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: connectd: give busy peers more time to respond to pings.
2022-07-09 12:27:05 +09:30
Rusty Russell 32af92145b update-mocks: handle missing deprecated_apis.
This expands update-mocks to be able to handle (simple!) missing
symbols which are not functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-07-09 09:59:52 +09:30
Alex Myers cbafc0fa33 gossip_store: add flag for spam gossip, update to v10
This will be used to decouple internal use of gossip from what is
passed to gossip peers. Updates GOSSIP_STORE_VERION to 10.

Changelog-Changed: gossip_store updated to version 10.
2022-07-06 14:31:19 +09:30
Rusty Russell 9ab7c8aed3 connected/test: fix memleak in test.
```
VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all connectd/test/run-netaddress > /dev/null
==2483395== 16 bytes in 1 blocks are still reachable in loss record 1 of 15
==2483395==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2483395==    by 0x10D59A: autodata_register_ (autodata.c:20)
==2483395==    by 0x10EB26: register_autotype_type_to_string (type_to_string.h:77)
==2483395==    by 0x10EB6B: register_one_type_to_string0 (type_to_string.c:8)
==2483395==    by 0x188C0C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/connectd/test/run-netaddress)
==2483395==    by 0x4A3A00F: (below main) (libc-start.c:264)
==2483395==
==2483395== 40 bytes in 1 blocks are still reachable in loss record 2 of 15
==2483395==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
...
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-29 21:07:42 +09:30
Rusty Russell fd90e5746b connectd: don't keep around more than one old connection.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".

The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-28 13:47:27 +09:30
Matt Whitlock 83c825945c connectd: avoid use-after-free upon multiple reconnections by a peer
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.

See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
2022-06-28 13:47:27 +09:30