rgb-cln


Author	SHA1	Message	Date
Rusty Russell	d31420211a	connectd: add counters to each peer connection. This allows us to detect when lightningd hasn't seen our latest disconnect/reconnect; in particular, we would hit the following pattern: 1. lightningd says to connect a subd. 2. connectd disconnects and reconnects. 3. connectd reads message, connects subd. 4. lightningd reads disconnect and reconnect, sends msg to connect to subd again. 5. connectd asserts because subd is already connected. This way connectd can tell if lightningd is talking about the previous connection, and ignore it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	41b379ed89	lightningd: hand fds to connectd, not receive them from connectd. Before this patch: 1. connectd says it's connected (peer_connected) 2. we tell connectd we want to talk about each channel (peer_make_active) 3. connectd gives us an fd for each channel, and we connect it to a subd (peer_active) 4. OR, connectd says it sent something about a channel we didn't tell it about, with an fd (peer_active) Now: 1. connectd says it's connected (peer_connected) 2. we start all appropriate subds and tell connectd to what channels/fds (peer_connect_subd). 3. if connectd says it sent something about a channel we didn't tell it about, we either tell it to hang up (peer_final_msg), or connect a new opening daemon (peer_connect_subd). This is the minimal-size patch, which is why we create socket pairs in so many places to use the existing functions. Many cleanups are possible, since the new flow is so simple. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	ab0e5d30ee	connectd: don't io_halfclose() We don't io_halfclose() the other side, we io_sock_shutdown(), which can leave both sides unset: ``` lightningd-2: 2022-06-07T11:00:05.053Z BROKEN connectd: FATAL SIGNAL 6 (version 57e1af2) lightningd-2: 2022-06-07T11:00:05.053Z BROKEN connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x563b9b603af7 lightningd-2: 2022-06-07T11:00:05.053Z BROKEN connectd: backtrace: common/daemon.c:46 (crashdump) 0x563b9b603b4b lightningd-2: 2022-06-07T11:00:05.053Z BROKEN connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 ((null)) 0x7fe6e8d4f08f lightningd-2: 2022-06-07T11:00:05.053Z BROKEN connectd: backtrace: ../sysdeps/unix/sysv/linux/raise.c:51 (__GI_raise) 0x7fe6e8d4f00b lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79 (__GI_abort) 0x7fe6e8d2e858 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92 (__assert_fail_base) 0x7fe6e8d2e728 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101 (__GI___assert_fail) 0x7fe6e8d3ffd5 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:65 (next_plan) 0x563b9b64fd7e lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x563b9b6508f0 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:423 (io_ready) 0x563b9b650984 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x563b9b652c25 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: connectd/connectd.c:2037 (main) 0x563b9b5f5793 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: ../csu/libc-start.c:308 (__libc_start_main) 0x7fe6e8d30082 lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: 
backtrace: (null):0 ((null)) 0x563b9b5ebf6d lightningd-2: 2022-06-07T11:00:05.054Z BROKEN connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	40145e619b	connectd: remove the redundant "already connected" logic. It should now be reliable, so we don't need this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	9b6c97437e	connectd: remove reconnection logic. We don't have to put aside a peer which is reconnecting and wait for lightningd to remove the old peer, we can now simply free the old and add the new. Fixes: #5240 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	7b0c11efb4	connectd: don't let peer close take forever. Sending any pending messages to peer before hanging up is a courtesy: give it 5 seconds before simply closing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	8678c5efb3	connectd: release peer as soon as lightningd tells us. Now we have separate peer draining logic, we can simply use it when lightningd tells us to release the peer, without waiting. (We could simply free the peer, but that's a bit rude, as messages can get lost). This removes various complex flags and logic we had before. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: `connectd`: various crashes and issues fixed by simplification and rewrite.	2022-07-18 20:50:04 -05:00
Rusty Russell	e856accb7d	connectd: send cleanup messages however peer is freed. This lets us tal_free() it wherever we want, rather than always freeing via peer_discard. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	9dc3880360	connectd: put peer into "draining" mode when we want to close it. This removes it from the hashtable, and forces it to do nothing but send out any remaining packets, then close. It is, in effect, reduced to a stub, with no further interactions with the rest of the system (all subds are freed already). This also removes the need for an explicit "final_msg". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
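A rough sketch of a draining peer, combining this entry with the 5-second courtesy limit from 7b0c11efb4. The struct, the enum and `drain_step()` are hypothetical; removal from the peer hashtable and the actual socket writes are not modeled, and a queued message is simply counted down.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Courtesy period from commit 7b0c11efb4: give the peer 5 seconds. */
#define DRAIN_TIMEOUT_SECS 5.0

enum peer_state { PEER_ACTIVE, PEER_DRAINING, PEER_CLOSED };

/* Hypothetical stub of a peer: once draining, only the outgoing queue
 * matters (no subds, no lookups from the rest of the daemon). */
struct draining_peer {
    enum peer_state state;
    size_t queued_msgs;   /* outgoing messages still waiting to be written */
    time_t drain_started; /* when draining began */
};

void start_draining(struct draining_peer *p, time_t now)
{
    assert(p->state == PEER_ACTIVE);
    p->state = PEER_DRAINING;
    p->drain_started = now;
}

/* One event-loop step: close if the queue is empty or the courtesy period
 * has expired, otherwise write (here: just count down) one queued message. */
void drain_step(struct draining_peer *p, time_t now)
{
    if (p->state != PEER_DRAINING)
        return;
    if (p->queued_msgs == 0
        || difftime(now, p->drain_started) >= DRAIN_TIMEOUT_SECS) {
        p->state = PEER_CLOSED;
        return;
    }
    p->queued_msgs--;
}
```

A peer that reads slowly is closed once the deadline passes even if messages remain queued, which is the "don't let peer close take forever" behavior.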
Rusty Russell	37ff013c2c	connectd: fix subd tal parents. This came out in a later patch: freeing the peer->subds doesn't actually free the subds, because they're reparented onto subd->conn, which is a child of peer itself. This breaks because when the peer is finally freed, destroy_subd is called, and expects to find itself in peer->subds (but we made that NULL when we manually freed it!). Fix this, and make it obvious that we tal_steal it. ``` lightning_connectd: FATAL SIGNAL 11 (version v0.11.0.1-25-gbf025aa-modded) 0x55de2a1b8b94 send_backtrace common/daemon.c:33 0x55de2a1b8c3e crashdump common/daemon.c:46 0x7fe2be2fc08f ??? /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0 0x55de2a1af41e destroy_subd connectd/multiplex.c:1119 0x55de2a217686 notify ccan/ccan/tal/tal.c:240 0x55de2a217b9d del_tree ccan/ccan/tal/tal.c:402 0x55de2a217bef del_tree ccan/ccan/tal/tal.c:412 0x55de2a217bef del_tree ccan/ccan/tal/tal.c:412 0x55de2a217f39 tal_free ccan/ccan/tal/tal.c:486 0x55de2a1aa116 peer_discard connectd/connectd.c:1834 0x55de2a1aa38d recv_req connectd/connectd.c:1903 0x55de2a1b9121 handle_read common/daemon_conn.c:31 0x55de2a205a35 next_plan ccan/ccan/io/io.c:59 0x55de2a20663d do_plan ccan/ccan/io/io.c:407 0x55de2a20667f io_ready ccan/ccan/io/io.c:417 0x55de2a208972 io_loop ccan/ccan/io/poll.c:453 0x55de2a1aa736 main connectd/connectd.c:2042 0x7fe2be2dd082 __libc_start_main ../csu/libc-start.c:308 0x55de2a1a085d ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-18 20:50:04 -05:00
Rusty Russell	6fd8fa4d95	connectd: optimize requests for "recent" gossip. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-15 21:18:29 +09:30
Rusty Russell	92fe871467	connectd: optimize case where peer doesn't want gossip. LND and us send 0xFFFFFFFF to turn off gossip. LDK and Eclair don't seem to turn off gossip at all, but that's OK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-15 21:18:29 +09:30
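In BOLT #7, `gossip_timestamp_filter` carries `first_timestamp` and `timestamp_range`, and the receiver should only relay gossip whose timestamp is at least `first_timestamp` and below `first_timestamp + timestamp_range`. Assuming the 0xFFFFFFFF mentioned here is the `first_timestamp` value, essentially nothing can pass (only a message stamped exactly 0xFFFFFFFF, in the year 2106), so such a peer can be skipped without scanning the gossip store. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does a gossip message with msg_timestamp pass the peer's filter? */
bool timestamp_in_filter(uint32_t first_timestamp, uint32_t timestamp_range,
                         uint32_t msg_timestamp)
{
    /* Sum in 64 bits so first_timestamp + timestamp_range cannot wrap. */
    uint64_t end = (uint64_t)first_timestamp + timestamp_range;
    return msg_timestamp >= first_timestamp && (uint64_t)msg_timestamp < end;
}

/* Fast path: a first_timestamp of 0xFFFFFFFF means "send me no gossip". */
bool peer_wants_no_gossip(uint32_t first_timestamp)
{
    return first_timestamp == UINT32_MAX;
}
```

Doing the end-of-range sum in 32 bits would wrap for large ranges and wrongly reject in-range messages.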
Rusty Russell	06e1e119aa	pytest: fix test_gossip_no_empty_announcements flake. This is a side-effect of fixing aging: sometimes, we age our rcvd_filter cache too fast, and thus re-xmit. This breaks our test, since it used dev-disconnect on the channel_announce, but that closes to l3, not l1! ``` > assert l1.rpc.listchannels()['channels'] == [] E AssertionError: assert [{'active': T...ags': 1, ...}] == [] E Left contains 2 more items, first extra item: {'active': True, 'amount_msat': 100000000msat, 'base_fee_millisatoshi': 1, 'channel_flags': 0, ...} ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Fixes: #5403	2022-07-12 21:41:19 +09:30
Rusty Russell	7dd8e27862	connectd: don't insist on ping replies when other traffic is flowing. Got complaints about us hanging up on some nodes because they don't respond to pings in a timely manner (e.g. ACINQ?), but that turned out to be something else. Nonetheless, we've had reports in the past of LND badly prioritizing gossip traffic, and thus important messages can get queued behind gossip dumps! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: connectd: give busy peers more time to respond to pings.	2022-07-09 12:27:05 +09:30
Rusty Russell	32af92145b	update-mocks: handle missing deprecated_apis. This expands update-mocks to be able to handle (simple!) missing symbols which are not functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-07-09 09:59:52 +09:30
Alex Myers	cbafc0fa33	gossip_store: add flag for spam gossip, update to v10. This will be used to decouple internal use of gossip from what is passed to gossip peers. Updates GOSSIP_STORE_VERSION to 10. Changelog-Changed: gossip_store updated to version 10.	2022-07-06 14:31:19 +09:30
Rusty Russell	9ab7c8aed3	connectd/test: fix memleak in test. ``` VALGRIND=1 valgrind -q --error-exitcode=7 --track-origins=yes --leak-check=full --show-reachable=yes --errors-for-leak-kinds=all connectd/test/run-netaddress > /dev/null ==2483395== 16 bytes in 1 blocks are still reachable in loss record 1 of 15 ==2483395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==2483395== by 0x10D59A: autodata_register_ (autodata.c:20) ==2483395== by 0x10EB26: register_autotype_type_to_string (type_to_string.h:77) ==2483395== by 0x10EB6B: register_one_type_to_string0 (type_to_string.c:8) ==2483395== by 0x188C0C: __libc_csu_init (in /home/rusty/devel/cvs/lightning/connectd/test/run-netaddress) ==2483395== by 0x4A3A00F: (below main) (libc-start.c:264) ==2483395== ==2483395== 40 bytes in 1 blocks are still reachable in loss record 2 of 15 ==2483395== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ... ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-29 21:07:42 +09:30
Rusty Russell	fd90e5746b	connectd: don't keep around more than one old connection. This was fixed in `1c495ca5a8` ("connectd: fix accidental handling of old reconnections.") and then reverted by the rework in "connectd: avoid use-after-free upon multiple reconnections by a peer". The latter made the race much less likely, since we cleaned up the reconnecting struct once the connection was hung up by the remote node, but it's still theoretically possible. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:47:27 +09:30
Matt Whitlock	83c825945c	connectd: avoid use-after-free upon multiple reconnections by a peer `peer_reconnected` was freeing a `struct peer_reconnected` instance while a pointer to that instance was registered to be passed as an argument to the `retry_peer_connected` callback function. This caused a use-after-free crash when `retry_peer_connected` attempted to reparent the instance to the temporary context. Instead, never have `peer_reconnected` free a `struct peer_reconnected` instance, and only ever allow such an instance to be freed after the `retry_peer_connected` callback has finished with it. To ensure that the instance is freed even if the connection is closed before the callback can be invoked, parent the instance to the connection rather than to the daemon. Absent the need to free `struct peer_reconnected` instances outside of the `retry_peer_connected` callback, there is no use for the `reconnected` hashtable, so remove it as well. See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255 Fixes: #5282 Fixes: #5284 Changelog-Fixed: connectd no longer crashes when peers reconnect.	2022-06-28 13:47:27 +09:30
Rusty Russell	4ee55acc71	connectd: don't start connecting in parallel in peer_conn_closed. The crash below from @zerofeerouting left me confused. The invalid value in fmt_wireaddr_internal is a telltale sign of use-after-free. This backtrace shows us destroying the conn twice: what's happening? Well, tal carefully protects against destroying twice: it's not that unusual to free something in a destructor which has already been freed. So this indicates that there are two io_conn hanging off one struct connecting, which isn't supposed to happen! We deliberately call try_connect_one_addr() initially, then inside the io_conn destructor. But due to races in connectd vs lightningd connection state, we added a fix which allows a connect command to sit around while the peer is cleaning up (`6cc9f37cab`) and get fired off when it's done. But what if, in the chaos, we are already connecting again? Now we'll end up with two connections. Fortunately, we have a `conn` pointer inside struct connecting, which (with a bit of additional care) we can ensure is only non-NULL while we're actually trying to connect. This lets us check that before firing off a new connection attempt in peer_conn_closed. ``` lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939) 0x5614a4915ae8 send_backtrace common/daemon.c:33 0x5614a4915b72 crashdump common/daemon.c:46 0x7ffa14fcd72f ??? ???:0 0x7ffa14dc87bb ??? ???:0 0x7ffa14db3534 ??? 
???:0 0x5614a491fc71 fmt_wireaddr_internal common/wireaddr.c:255 0x5614a491fc7a fmt_wireaddr_internal_ common/wireaddr.c:257 0x5614a491ea6b type_to_string_ common/type_to_string.c:32 0x5614a490beaa destroy_io_conn connectd/connectd.c:754 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953928 del_tree ccan/ccan/tal/tal.c:412 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4908b7a try_connect_one_addr connectd/connectd.c:870 0x5614a490bef1 destroy_io_conn connectd/connectd.c:759 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4948f08 io_close ccan/ccan/io/io.c:450 0x5614a4948f59 do_plan ccan/ccan/io/io.c:401 0x5614a4948fe1 io_ready ccan/ccan/io/io.c:417 0x5614a494a8e6 io_loop ccan/ccan/io/poll.c:453 0x5614a490c12f main connectd/connectd.c:2164 0x7ffa14db509a ??? ???:0 0x5614a4904e99 ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Fixes: #5339 Changelog-Fixed: connectd: occasional crash when we reconnect to a peer quickly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-28 13:46:59 +09:30
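The guard described in this entry, checking `conn` before firing a deferred connect from the peer-closed path, looks roughly like this. `struct connecting` is reduced to two hypothetical fields, `struct io_conn` is left opaque, and `start_deferred_connect()` is an illustrative name rather than connectd's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for ccan/io's connection type. */
struct io_conn;

/* Hypothetical reduction of struct connecting: `conn` is non-NULL only
 * while a connection attempt is actually in flight. */
struct connecting {
    struct io_conn *conn;
    bool deferred_connect; /* a connect command waited for peer cleanup */
};

/* Called when the old peer connection has closed.  Returns true only if
 * the deferred connect should start a fresh attempt now. */
bool start_deferred_connect(struct connecting *c)
{
    assert(c != NULL);
    if (!c->deferred_connect)
        return false;
    c->deferred_connect = false;
    /* Already connecting: a second attempt would hang a second io_conn
     * off the same struct connecting. */
    return c->conn == NULL;
}
```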
Vincenzo Palazzo	7ff62b4a00	lightningd: remove `DEFAULT_PORT` global definition Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>	2022-06-28 06:09:01 +09:30
Rusty Russell	a1b8b40d13	connectd: fix debug message on bind fail. It doesn't get the right errno, and it says "create" not "bind". ``` 2022-05-20T03:04:46.498Z DEBUG connectd: Failed to create 2 socket: Success 2022-05-20T03:04:46.500Z DEBUG connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds 2022-05-20T03:04:46.501Z DEBUG connectd: connectd_init_done 2022-05-20T03:04:46.503Z BROKEN connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use ``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-27 17:21:35 +09:30
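The first log line ("Failed to create 2 socket: Success") most likely shows errno being read after some other call had reset it. A hedged sketch of the fix: capture errno immediately after `bind()` fails and format the message from that saved value. `bind_or_errno()` and `format_bind_failure()` are hypothetical helpers; only `bind()`, `snprintf()` and `strerror()` are standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 0 on success, otherwise the errno set by bind() itself,
 * captured before any logging or cleanup call can overwrite it. */
int bind_or_errno(int fd, const struct sockaddr *sa, socklen_t salen)
{
    if (bind(fd, sa, salen) == 0)
        return 0;
    return errno;
}

/* Build the log line from the saved errno, naming the address and the
 * operation that actually failed ("bind", not "create"). */
int format_bind_failure(char *buf, size_t len, const char *addr_desc,
                        int saved_errno)
{
    assert(saved_errno != 0);
    return snprintf(buf, len, "Failed to bind socket for %s: %s",
                    addr_desc, strerror(saved_errno));
}
```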
Michael Schmoock	a2b75b66ba	connectd: use dev_allow_localhost for remote_addr testing Before this fix, a DEVELOPER=1 node would announce non-public addresses on mainnet if it detected them. Since there are some nodes on the internet that falsely report local addresses, we move this 'testing feature' to 'dev-allow-localhost' nodes. Changelog-None	2022-06-17 20:30:16 +09:30
Michael Schmoock	033ac323d1	connectd: prefer IPv6 when available Changelog-Changed: connectd: prefer IPv6 connections when available.	2022-06-17 20:30:16 +09:30
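Reordering the candidate addresses so that IPv6 entries come first is one plausible way to express this preference. The sketch assumes connection attempts are made in array order; `enum addr_type`, `struct addr` and `prefer_ipv6()` are hypothetical and much simpler than Core Lightning's wireaddr types.

```c
#include <assert.h>
#include <stddef.h>

enum addr_type { ADDR_IPV4, ADDR_IPV6, ADDR_TOR };

/* Hypothetical address entry; `id` stands in for the rest of the address. */
struct addr {
    enum addr_type type;
    int id;
};

/* Stable partition: move IPv6 addresses to the front, keeping the relative
 * order within each group. */
void prefer_ipv6(struct addr *addrs, size_t n)
{
    size_t next = 0; /* slot for the next IPv6 address */
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addrs[i].type != ADDR_IPV6)
            continue;
        struct addr v6 = addrs[i];
        /* Shift the non-IPv6 run [next, i) up by one slot. */
        for (size_t j = i; j > next; j--)
            addrs[j] = addrs[j - 1];
        addrs[next++] = v6;
    }
}
```

Keeping the partition stable means any existing ordering among the IPv4 or Tor addresses is left untouched.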
Rusty Russell	0c9017fb76	connectd: shrink max filter size. 10,000 per peer was too much. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-17 14:14:02 +09:30
Rusty Russell	d922abeaba	connectd: optimize gossip_rcvd_filter. Instead of doing an allocation per entry, put the entry in directly. This means only 30-bit resolution on 32-bit machines, but if a bit of gossip gets accidentally suppressed that's OK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-17 14:14:02 +09:30
Rusty Russell	87a471af98	connectd: use is_msg_gossip_broadcast into gossip_rcvd_filter.c It was doing its own equivalent check anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-17 14:14:02 +09:30
Rusty Russell	7c8dc62035	channeld: take over gossip_rcvd_filter.c and is_msg_gossip_broadcast. channeld is the only user of these functions, since it now streams all gossip itself. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-06-17 14:14:02 +09:30
Rusty Russell	ecdfbbf359	connectd: restore gossip filter aging. When we moved gossip filtering to connectd, this aging got lost. Without this, we hit the 10,000 entry limit before expiring the full gossip anti-echo cache. This is under 1M in allocations per peer, but in DEVELOPER mode each allocation adds 3 notifiers (32 bytes each) and a backtrace child (40 + 40 + 256 bytes), making it almost 10MB per peer, plus allocation overhead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: connectd: large memory usage with many peers fixed.	2022-06-17 14:14:02 +09:30
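The aging restored here can be pictured as a two-generation anti-echo cache: hashes of gossip received from a peer go into the current generation, a periodic timer discards the older generation, and lookups consult both. This is a hypothetical sketch, not connectd's gossip_rcvd_filter.c; the capacity, hash width and function names are assumptions. Storing the 32-bit hashes inline in the table echoes the "no allocation per entry" change in d922abeaba.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity per generation; must be a power of two. */
#define FILTER_SLOTS 1024

/* Open-addressing set of 32-bit gossip hashes stored inline; 0 = empty. */
struct gen_set {
    uint32_t slot[FILTER_SLOTS];
    size_t count;
};

/* Two generations: an entry survives one or two aging periods. */
struct rcvd_filter {
    struct gen_set cur, old;
};

static uint32_t nonzero(uint32_t h)
{
    return h ? h : 1; /* a collision here merely suppresses some gossip */
}

static bool set_has(const struct gen_set *s, uint32_t h)
{
    for (size_t i = 0; i < FILTER_SLOTS; i++) {
        uint32_t v = s->slot[(h + i) & (FILTER_SLOTS - 1)];
        if (v == 0)
            return false;
        if (v == h)
            return true;
    }
    return false;
}

static void set_add(struct gen_set *s, uint32_t h)
{
    /* Stay at most half full so probing always finds a free slot; when
     * full, new entries are simply not recorded. */
    if (s->count >= FILTER_SLOTS / 2)
        return;
    for (size_t i = 0;; i++) {
        uint32_t *v = &s->slot[(h + i) & (FILTER_SLOTS - 1)];
        if (*v == h)
            return;
        if (*v == 0) {
            *v = h;
            s->count++;
            return;
        }
    }
}

void filter_add(struct rcvd_filter *f, uint32_t hash)
{
    set_add(&f->cur, nonzero(hash));
}

bool filter_seen(const struct rcvd_filter *f, uint32_t hash)
{
    uint32_t h = nonzero(hash);
    return set_has(&f->cur, h) || set_has(&f->old, h);
}

/* Called periodically: forget the older generation. */
void filter_age(struct rcvd_filter *f)
{
    assert(f != NULL);
    f->old = f->cur;
    memset(&f->cur, 0, sizeof(f->cur));
}
```

Without the periodic `filter_age()` call, the current generation only ever grows until it hits its cap, which is the failure mode this entry describes.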
Jon Griffiths	572942c783	psbt: use DER encoded + sighash byte for PSBT_IN_PARTIAL_SIG items. Per BIP-0174, the signature map is from pubkey to "The signature as would be pushed to the stack from a scriptSig or witness". Fixes: #5298 Changelog-Fixed: PSBT: Fix signature encoding to comply with BIP-0174. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>	2022-06-09 18:28:35 +02:00
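Concretely, the PSBT format (BIP 174) stores under `PSBT_IN_PARTIAL_SIG` the signature exactly as it would be pushed to the stack: the DER-encoded ECDSA signature followed by a one-byte sighash type (`SIGHASH_ALL` is 0x01), so at most 72 + 1 bytes. `partial_sig_value()` is a hypothetical helper for illustration, not a libwally or Core Lightning function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIGHASH_ALL 0x01

/* Build a PSBT_IN_PARTIAL_SIG value: DER signature, then the sighash byte.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t partial_sig_value(uint8_t *out, size_t outlen,
                         const uint8_t *der, size_t der_len, uint8_t sighash)
{
    assert(der != NULL && der_len > 0);
    if (outlen < der_len + 1)
        return 0;
    memcpy(out, der, der_len);
    out[der_len] = sighash;
    return der_len + 1;
}
```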
Rusty Russell	abd01a1701	Makefile: update to include fix for remote_addr generation. Now it's formatted properly, we don't need the patch. But we need to explicitly marshal/unmarshal into a byte stream, which involves some code rearrangement. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-05-19 09:47:32 +09:30
Rusty Russell	8b62e2584f	connectd: remove enable-autotor-v2-mode option Changelog-Removed: lightningd: removed `enable-autotor-v2-mode` option (deprecated v0.10.1) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-05-18 10:15:36 +09:30
Rusty Russell	4343f720be	connectd: remove assert which can trigger. I have a test which reproduces this, too, and it's been seen in the wild. It seems we can add a subd as we're closing, which causes this assert to trigger. Fixes: #5254 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-05-16 09:59:42 +09:30
Rusty Russell	1c495ca5a8	connectd: fix accidental handling of old reconnections. We had multiple reports of channels being unilaterally closed because it seemed like the peer was sending old revocation numbers. Turns out, it was actually old reestablish messages! When we have a reconnection, we would put the new connection aside, and tell lightningd to close the current connection: when it did, we would restart processing of the initial reconnection. However, we could end up with multiple "reconnecting" connections, while waiting for an existing connection to close. Though the connections were long gone, there could still be messages queued (particularly the channel_reestablish message, which comes early on). Eventually, a normal reconnection would cause us to process one of these reconnecting connections, and channeld would see the (perhaps very old!) messages, and get confused. (I have a test which triggers this, but it also hangs the connect command, due to other issues we will fix in the next release...) Fixes: #5240 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-05-16 09:59:42 +09:30
Rusty Russell	37e8d2fb0f	connectd: disable advertisement of WEBSOCKET addresses. This seems to prevent broad propagation, due to LND not allowing it. See https://github.com/lightningnetwork/lnd/issues/6432 We still announce it if you disable deprecated-apis, so tests still work, and hopefully we can enable it in future. Fixes: #5196 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues	2022-04-21 06:13:55 +09:30
Rusty Russell	393e8e5e6a	connectd: remove a noisy debug msg, fix name typo. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-04-21 06:13:55 +09:30
Rusty Russell	a5d027cefc	connectd: send our own gossip, even if peer hasn't sent timestamp_filter. We seem to have made node_announcement propagation worse, not better. Explorers don't see my nodes' updates. At least some LND nodes never send us timestamp_filter, so we never actually stream any gossip to them. We should send gossip about ourselves, even if they haven't set a filter (yet). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Added: Protocol: we more aggressively send our own gossip, to improve propagation chances.	2022-04-21 06:13:55 +09:30
Rusty Russell	9b944dbed4	common/gossip_store: add flag to only fetch "push"-marked messages. These are the ones which are for our own channels (and our own node_announcement). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-04-21 06:13:55 +09:30
Rusty Russell	c3a7499573	connectd: avoid use-after-free on reconnect with remote_addr. I was seeing a strange crash: Connectd gave bad CONNECT_PEER_CONNECTED message The message is indeed mangled, around the remote_addr! A quick review of the code revealed that we were not making a copy when it was a reconnect, and so the remote_addr pointer was pointing to memory which was freed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-04-20 06:44:58 +09:30
Rusty Russell	836c1b805b	doc: update c-lightning to Core Lightning almost everywhere. Mostly comments and docs: some places are actually paths, which I have avoided changing. We may migrate them slowly, particularly when they're user-visible. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-04-07 06:53:26 +09:30
Rusty Russell	2526e804f7	doc: big BOLT update to incorporate warnings language. We do this (send warnings) in almost all cases anyway, so mainly this is a textual update, but there are some changes: 1. Send ERROR not WARNING if they send a malformed commitment secret. 2. Send WARNING not ERROR if they get the shutdown_scriptpubkey wrong (vs upfront) 3. Send WARNING not ERROR if they send a bad shutdown_scriptpubkey (e.g. p2pkh in future) 4. Rename some vars 'err' to 'warn' to make it clear we send a warning. This means test_option_upfront_shutdown_script can be made reliable, too, and it now warns and doesn't automatically close channel. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-04-02 09:40:18 +10:30
Rusty Russell	9bddfc2048	connectd: take dev-suppress-gossip from gossipd. Gossipd didn't actually suppress all gossip, resulting in a flake! Doing it in connectd now makes much more sense. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-31 19:38:05 +10:30
Rusty Russell	ea7120a313	lightningd: add --dev-no-ping-timer to avoid ping response timeouts. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-31 13:40:27 +10:30
Rusty Russell	20392ae526	connectd: restore obs2 onion support. I removed these prematurely: we haven't had a release since introducing them! This consists of reverting `d15d629b8b` "plugins/fetchinvoice: remove obsolete string-based API." and "onion_messages: remove obs2 support." Some minor changes due to updated fromwire_tlv API since they were removed, but not much. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: REVERT: Removed backwards compat with onion messages from v0.10.1.	2022-03-29 10:55:12 +10:30
Rusty Russell	a770f51d0e	tools/generate_wire.py: make functions allocate the TLV. Requiring the caller to allocate them is ugly, and differs from other types. This means we need a context arg if we don't have one already. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-25 13:55:44 +10:30
Rusty Russell	fa0c29f959	tools/generate_wire.py: tlvs should start with tlv_ No more "towire_offer", but "towire_tlv_offer". This means we double-up on the unfortunately-named `tlv_payload` inside the onion, but we should rename that in the spec when we remove old payloads. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-25 13:55:44 +10:30
Rusty Russell	7829f2eb06	onion_messages: remove obs2 support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-EXPERIMENTAL: Removed backwards compat with onion messages from v0.10.1.	2022-03-25 13:55:44 +10:30
Rusty Russell	32cd7ae398	connectd: key multiple subds by channel_id, use for lookup. We still don't have multiple subds per peer, but now we could! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	395051cdf8	connectd: track the channel_id of each stream to/from peer. This means doing some wire interpretation, and handling the transient case where we switch from temporary to permanent channel_id, but it's not that bad (and required for accurate demux when multiple channels are involved for a single peer). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	fe9f391a93	connectd: tell lightningd the channel_id when we give it the active peer. Now we always have it (either extracted from an unsolicited message, or told to us by lightningd when it tells us it wants to talk), we can always send it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	2bc58e2327	lightningd: always tell connectd the channel id. This means lightningd needs to create the temporary one and tell it to openingd/dualopend, rather than the other way around. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	2424b7dea8	connectd: hold peer until we're interested. Either because lightningd tells us it wants to talk, or because the peer says something about a channel. We also introduce a behavior change: we disconnect after a failed open. We might want to modify this later, but it's a side-effect of openingd not holding onto idle connections. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	deecedb033	connectd: tell lightningd when disconnect is complete. This avoids races in our tests where we assume it's sync (and is kind of nicer). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	6cc9f37cab	connectd: handle connect vs closing race better. We would return success from connect even though the peer was closing; this is technically correct but fairly undesirable. Better is to pass every connect attempt to connectd, and have it block if the peer is exiting (and retry), otherwise tell us it's already connected. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	16e9ba0361	connectd: fix confusing names. The message from lightningd simply acknowledges that we are allowed to discard the peer (because no subdaemons are talking to it anymore). This difference becomes more stark once connectd holds on to idle peers. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	fcd0b2eb42	connectd: prepare for multiple subd connections. We still always have 1, but the infrastructure is now in place. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	005d69c463	connectd: clean up decrypted packet memory handling. Use tmpctx, rather than freeing manually everywhere (proof: next patch added a branch and forgot to free it!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	9bbb32433e	connectd: make sure we do IO logging on final_msg output. This happens when we send a warning or lightningd tells us to send a final message then close. Normally io logging is done by the subdaemon that creates it, but this is a special case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-23 13:20:12 +10:30
Rusty Russell	953f238bd2	connectd: use closefrom for faster forking, and ignore children. Zombie sighting from jb55. Fixes: #5092 Changelog-EXPERIMENTAL: Fixed `experimental-websocket-port` not to leave zombie processes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-21 21:07:26 +10:30
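A sketch of the two ideas in the title, under stated assumptions: a portable fallback that closes every descriptor from `lowfd` upward (the per-descriptor loop that `closefrom()` lets a forked child avoid), and ignoring `SIGCHLD` so exited children are reaped automatically instead of lingering as zombies. The commit does not show how connectd ignores children; setting `SIGCHLD` to `SIG_IGN` is one standard POSIX way, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Portable fallback for closefrom(lowfd): close every descriptor >= lowfd
 * one at a time.  Where closefrom() exists (the BSDs, glibc 2.34 or later)
 * it can avoid issuing one close() per possible descriptor. */
void close_fds_from(int lowfd)
{
    assert(lowfd >= 0);
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0)
        max = 1024; /* assumption: fallback when the limit is unknown */
    for (long fd = lowfd; fd < max; fd++)
        close((int)fd);
}

/* With SIGCHLD ignored, terminated children do not become zombies, so the
 * parent never needs to wait() for them. */
int ignore_children(void)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

The fallback loop's cost scales with the descriptor limit rather than the number of open descriptors, which is why it is slow on systems with a high `_SC_OPEN_MAX`.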
William Casarin	f72a08c802	websocketd: fix random failures by blocking stdin reads Example request that is dying: NEW REQUEST! lightning_websocketd:main [1955685] <-- bad request from safari read 507 write_all 1 -> websocket_to_lightningd -> read_payload_header read 2 read_all 1 read -11 <--- This tried to read a part of the header, is this -EAGAIN? read_all 0 should we be blocking on these reads? dies Fixes #5089 Changelog-Fixed: `experimental-websocket` intermittent read errors fixed Signed-off-by: William Casarin <jb55@jb55.com>	2022-03-14 12:26:46 -05:00
Michael Schmoock	ef84d6eec5	chore: remove EXPERIMENTAL for rfc #917 remote_addr	2022-03-11 16:42:45 +10:30
Michael Schmoock	57fb34ed06	test: connectd netaddress Increases test coverage by adding a testcase for connectd/netaddress.c Changelog-None	2022-03-11 16:42:45 +10:30
Michael Schmoock	b930b8c548	wireaddr: adds wireaddr_eq_without_port and wireaddr_cmp_type Adds wireaddr_eq_without_port so it can be used later. Moves wireaddr_cmp_type from connectd.c to this file, so it can be reused later.	2022-03-11 16:42:45 +10:30
Michael Schmoock	f1981461ef	connectd: ignore private remote_addr on non-DEVELOPER builds When compiled without DEVELOPER this will now filter out `remote_addr` that come from localhost. The testcase checks for DEVELOPER to test for correct function of `remote_addr`. Also, I renamed "test_connect" to "test_connect_basic" so it can be started without all the other tests in that file that start with "test_connect..."	2022-03-11 16:42:45 +10:30
Michael Schmoock	e92176248e	chore: fix typo announcable -> announceable "announcable" is a common misspelling of "announceable", see: https://en.wiktionary.org/wiki/announcable	2022-03-11 16:42:45 +10:30
Rusty Russell	b5a1715c2b	connectd: also fail without a scary backtrace when listen fails. For example, if you do: ``` ./lightningd/lightningd --network=regtest --experimental-websocket-port=19846 ``` Then you're trying to reuse the normal port as the websocket port, but this only fails at listen time, when we activate connectd. Catch this too. Fixes incorrect fatal() message, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Rusty Russell	885a6f50ae	connectd: make sure we announce websocket addr which succeeded. By accessing `addr` after the loop, it's possible that it's one which failed, in complex scenarios. Also gives us a chance to warn if they specify a websocket but don't actually end up advertizing it (you must advertize a normal addr as well). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Rusty Russell	c075d78431	connectd: use listen_fd array directly, rather than returning binding arr. We always added to both arrays, might as well just keep one. We make mayfail an explicit flag, rather than relying on the presence of errstr, which is never NULL now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Rusty Russell	a62f5e5d82	connectd: hoist find_local_address so we can give more graceful Tor errors. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Rusty Russell	200a8a985b	connectd: add is_websocket and wireaddr to struct listen_fd. This lets us give a better error message if listen fails, and also moves the callback closer to where it's needed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Rusty Russell	f1ed373c97	connectd: be more graceful when an address is in use. Aditya had this issue due to a config line, and the result was hard to diagnose even for me. It's now: ``` $ ./lightningd/lightningd --network=regtest --addr=:18444 2022-02-26T05:01:28.705Z BROKEN connectd: Failed to bind socket for 0.0.0.0:18444: Address already in use ``` Whereas before, it didn't even give the address it was trying to bind: ``` rusty@rusty-XPS-13-9370:~/devel/cvs/lightning (master)$ ./lightningd/lightningd --network=regtest --addr=:18444 lightning_connectd: Failed to bind on 2 socket: Address already in use (version v0.10.2-331-g86b83e4) 0x558a8b8d9a12 send_backtrace common/daemon.c:33 0x558a8b8e91e1 status_failed common/status.c:221 0x558a8b8c8e4f make_listen_fd connectd/connectd.c:1090 0x558a8b8c8f55 handle_wireaddr_listen connectd/connectd.c:1129 0x558a8b8c993d setup_listeners connectd/connectd.c:1312 0x558a8b8ca344 connect_init connectd/connectd.c:1517 0x558a8b8cbb57 recv_req connectd/connectd.c:1896 0x558a8b8d9f9f handle_read common/daemon_conn.c:31 0x558a8b9247c1 next_plan ccan/ccan/io/io.c:59 0x558a8b9253c9 do_plan ccan/ccan/io/io.c:407 0x558a8b92540b io_ready ccan/ccan/io/io.c:417 0x558a8b9276fe io_loop ccan/ccan/io/poll.c:453 0x558a8b8cbf36 main connectd/connectd.c:2033 0x7fe4d02940b2 ??? ???:0 0x558a8b8c285d ??? ???:0 0xffffffffffffffff ??? 
???:0 2022-02-26T05:02:27.547Z BROKEN connectd: Failed to bind on 2 socket: Address already in use (version v0.10.2-331-g86b83e4) 2022-02-26T05:02:27.547Z BROKEN connectd: backtrace: common/daemon.c:38 (send_backtrace) 0x558a8b8d9a68 2022-02-26T05:02:27.547Z BROKEN connectd: backtrace: common/status.c:221 (status_failed) 0x558a8b8e91e1 2022-02-26T05:02:27.547Z BROKEN connectd: backtrace: connectd/connectd.c:1090 (make_listen_fd) 0x558a8b8c8e4f 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: connectd/connectd.c:1129 (handle_wireaddr_listen) 0x558a8b8c8f55 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: connectd/connectd.c:1312 (setup_listeners) 0x558a8b8c993d 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: connectd/connectd.c:1517 (connect_init) 0x558a8b8ca344 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: connectd/connectd.c:1896 (recv_req) 0x558a8b8cbb57 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: common/daemon_conn.c:31 (handle_read) 0x558a8b8d9f9f 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x558a8b9247c1 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x558a8b9253c9 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x558a8b92540b 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x558a8b9276fe 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: connectd/connectd.c:2033 (main) 0x558a8b8cbf36 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: (null):0 ((null)) 0x7fe4d02940b2 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: (null):0 ((null)) 0x558a8b8c285d 2022-02-26T05:02:27.548Z BROKEN connectd: backtrace: (null):0 ((null)) 0xffffffffffffffff 2022-02-26T05:02:27.548Z BROKEN connectd: STATUS_FAIL_INTERNAL_ERROR: Failed to bind on 2 socket: Address already in use lightningd: connectd failed (exit status 242), exiting. 
``` Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-03-05 15:48:03 +10:30
Michael Schmoock	df9a34b81e	chore: use EXPERIMENTAL for BOLT1 remote_addr #917	2022-02-22 05:45:47 +10:30
Michael Schmoock	38e2abf68a	peer_exchange: set, read and log remote_addr Changelog-Added: Protocol: set remote_addr on init tlvs	2022-02-22 05:45:47 +10:30
Rusty Russell	d4fee837c2	misc: clarifications from cdecker review. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	727b486d49	connectd: don't receive a useless peer fd if we're told to send a final msg. We don't need the connection to ourselves, just to free it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	ca08f27d54	connectd: remove second gossip fd. Now we only send and receive gossip messages on this fd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	3121cebf4c	gossipd: don't hand out fds. Gossipd now simply gets told by channeld when peers arrive or leave. (it only needs to know for the seeker). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	3c5d27e3e9	subdaemons: remove gossipd fd from per-peer daemons. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	1c71c9849b	connectd: handle custom messages. This is neater than what we had before, and slightly more general. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: JSON_RPC: `sendcustommsg` now works with any connected peer, even when shutting down a channel.	2022-02-08 11:15:52 +10:30
Rusty Russell	960e911986	connectd: do io logging properly for msgs we make. We don't need to log msgs from subds, but we do our own, and we weren't. 1. Rename queue_peer_msg to inject_peer_msg for clarity, make it do logging 2. In the one place where we're relaying, call msg_queue() directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	8782d39476	connectd: handle onion messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	50eccb6a12	connectd: handle pings and pongs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Changed: JSON_RPC: `ping` now works with connected peers, even without a channel.	2022-02-08 11:15:52 +10:30
Rusty Russell	d7cf38a80a	connectd: divert gossip messages directly to gossipd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	9983c2fd8e	gossipd: add routines to send gossip messages to and from connectd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	bba468a51c	connectd: temporarily have two fds to gossipd. We want to stream gossip through this, but currently connectd treats the fd as synchronous. While we work on getting rid of that, it's easiest to have two fds. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-02-08 11:15:52 +10:30
Rusty Russell	c98734e0a4	connectd: don't ignore requests to connect if we're shutting down. We used to shut down peers atomically, but now that we flush the connections, there's a delay. If we are asked to connect in that time, we ignore it, as we are already connected, but that's wrong: we need to remember that we were told to connect and reconnect. This should solve a few weird test failures where "connect" would hang indefinitely. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	4584066a1e	connectd: make sure we io_log msgs going to gossipd. test_gossip_no_empty_announcements relies on this! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	1ae3172409	connectd: flush queues before hanging up. This is critical in the common case where peer sends an error and hangs up: we almost never get to relay the error to the subd in time. This also applies in the other direction: we need to flush the queue to the peer when the subd closes. Note we only free the actual peer struct when lightningd reaps us with connectd_peer_disconnected(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	0841e4190b	connectd: also do the shutdown()-close for final_msg sends. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	d29795a198	connectd: don't just close to peer, but use shutdown(). We would sometimes lose packets due to this previously, but it doesn't happen over localhost, so our tests didn't notice. However, now that connectd is the sole thing talking to peers, we can do a more elegant shutdown, which should fix closing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Changelog-Fixed: Protocol: Always flush sockets to increase the chance that final messages get to the peer (esp. error packets).	2022-01-20 15:24:06 +10:30
Rusty Russell	d51fb5207a	msg_queue: don't allow magic MSG_PASS_FD message for peers. msg_queue was originally designed for inter-daemon comms, and so it has a special mechanism to mark that we're trying to send an fd. Unfortunately, a peer could also send such a message, confusing us! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	a93c49ca65	connectd: implement @ correctly. dev_blackhole_fd was a hack, and doesn't work well now we are async (it worked for sync comms in per-peer daemons, but now we could sneak through a read before we get to the next write). So, make explicit flags and use them. This is much easier now we have all peer comms in one place. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	bb5beeddd7	connectd: drop support (unused) for @ during handshake. We could implement it, but we don't have to. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	26b9384fd0	various: minor cleanups from Christian's review. More significant things have been folded. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	39c93ee6e5	connectd: get addresses from lightningd, not gossipd. It's weird to have connectd ask gossipd, when lightningd can just do it and hand all the addresses together. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	6d4c56e8b6	connectd: put more stuff into struct gossip_state. We're the only ones who use it now, so put our fields inside it and make it local. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	407a89a400	connectd: remove per_peer_state in favor of keeping gossip_fd directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	6115ed02e8	subdaemons: don't stream gossip_store at all. We now let gossipd do it. This also means there's nothing left in 'struct per_peer_state' to send across the wire (the fds are sent separately), so that gets removed from wire messages too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	029d65cf2e	connectd: serve gossip_store file for the peer. We actually intercept the gossip_timestamp_filter, so the gossip_store mechanism inside the per-peer daemon never kicks off for normal connections. The gossipwith tool doesn't set OPT_GOSSIP_QUERIES, so it gets both, but that only affects one place. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30
Rusty Russell	e37a638c0c	connectd: do nagle by packet type. channeld can't do it any more: it's using local sockets. Connectd can do it, and simply does it by type. Amazingly, on my machine the timing change always caused test_channel_receivable() to fail, due to a latent race. Includes feedback from @cdecker. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2022-01-20 15:24:06 +10:30

396 Commits