Commit Graph

11246 Commits

Author SHA1 Message Date
Rusty Russell 3a763bef51 CHANGELOG.md: release notes for 0.11.2.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-20 12:25:55 +09:30
Rusty Russell 743943a130 common: downgrade "internal error" errors from lnd.
Prior to 0.11.0 we had cases where we would treat errors
as warnings: regrettably, this is still needed.  This message
in particular has been widely reported, and it now causes
channel force closes.

Downgrade and log.  I did insert some snarky log message earlier,
but hey, I'm sure CLN has done worse things to our peers!
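
A minimal sketch of the downgrade decision, assuming a hypothetical
`is_spurious_peer_error()` helper (the real connectd/channeld paths
differ):

```
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: should this incoming `error` be downgraded to a
 * warning (log it and carry on) instead of force-closing the channel? */
static bool is_spurious_peer_error(const char *errmsg)
{
	/* LND has been widely seen sending transient "internal error"s. */
	return strstr(errmsg, "internal error") != NULL;
}
```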

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: treat LND "internal error" messages as warnings, not force-close events (as we did in v0.10).
2022-06-20 12:21:59 +09:30
Rusty Russell c55ba9e9bb common/gossip_store: optimize case where entries are filtered out.
@whitslack complained of large CPU usage by connectd at startup;
I ran perf record on connectd on my machine (which sees only a small spike)
and I can see the cost of reading and discarding the entries:

```
-   95.52%     5.24%  lightning_conne  lightning_connectd  [.] gossip_store_next
   - 90.28% gossip_store_next
      + 40.27% tal_alloc_arr_
      + 22.78% tal_free
      + 11.74% crc32c
      + 9.32% fromwire_peektype
      + 4.10% __libc_pread64 (inlined)
        1.70% be32_to_cpu
```

Much of this is caused by the search for our own gossip: keeping this separately
would be even better, but this fix is minimal.
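
A minimal sketch of the optimization, with a simplified record header
(the real gossip_store format and field names differ): filtered-out
records are skipped by seeking past them, instead of tal-allocating
and CRC-checking every body only to free it again.

```
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified record header; fields assumed host-endian for brevity. */
struct rec_hdr {
	uint16_t flags;     /* e.g. "deleted" or "push" bits */
	uint16_t msglen;    /* length of the message body that follows */
	uint32_t crc;
	uint32_t timestamp;
};

/* Advance *off to the next record the caller wants, reading only the
 * fixed-size header of each skipped record: no allocation, no CRC. */
static bool seek_to_wanted(int fd, off_t *off, bool (*want)(uint16_t flags))
{
	struct rec_hdr h;

	while (pread(fd, &h, sizeof(h), *off) == sizeof(h)) {
		if (want(h.flags))
			return true;               /* caller reads this one */
		*off += sizeof(h) + h.msglen;      /* skip body entirely */
	}
	return false;                              /* end of store */
}
```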

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: reduce initial CPU load when connecting to peers.
2022-06-20 12:14:25 +09:30
Matt Whitlock 23d786da84 connectd: avoid use-after-free upon multiple reconnections by a peer
`peer_reconnected` was freeing a `struct peer_reconnected` instance
while a pointer to that instance was registered to be passed as an
argument to the `retry_peer_connected` callback function. This caused a
use-after-free crash when `retry_peer_connected` attempted to reparent
the instance to the temporary context.

Instead, never have `peer_reconnected` free a `struct peer_reconnected`
instance, and only ever allow such an instance to be freed after the
`retry_peer_connected` callback has finished with it. To ensure that the
instance is freed even if the connection is closed before the callback
can be invoked, parent the instance to the connection rather than to the
daemon.

Absent the need to free `struct peer_reconnected` instances outside of
the `retry_peer_connected` callback, there is no use for the
`reconnected` hashtable, so remove it as well.
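
Sketched in ccan/tal terms (simplified; the real connectd code
differs), the ownership change looks like this:

```
#include <ccan/tal/tal.h>

struct peer;                 /* stand-ins for the real types */
struct io_conn;
extern const tal_t *tmpctx;  /* CLN's per-loop temporary context */

struct peer_reconnected {
	struct peer *peer;
	/* ...queued message, fds, etc... */
};

static void retry_peer_connected(struct io_conn *conn,
				 struct peer_reconnected *pr)
{
	/* Reparent onto the temporary context: pr is now freed only
	 * after this callback has finished with it. */
	tal_steal(tmpctx, pr);
	/* ...resume the reconnection... */
}

static void peer_reconnected(struct io_conn *conn, struct peer *peer)
{
	/* Parent to conn, NOT to the daemon: if the connection dies
	 * before the retry runs, tal frees pr along with it, and
	 * nobody ever frees it early by hand. */
	struct peer_reconnected *pr = tal(conn, struct peer_reconnected);
	pr->peer = peer;
	/* ...arrange for retry_peer_connected(conn, pr) to run later... */
}
```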

See: https://github.com/ElementsProject/lightning/issues/5282#issuecomment-1141454255
Fixes: #5282
Fixes: #5284
Changelog-Fixed: connectd no longer crashes when peers reconnect.
2022-06-20 11:37:24 +09:30
Rusty Russell e46b0ff886 lightningd: fix crash on rapid reconnect.
Happens occasionally when running
`tests/test_connection.py::test_mutual_reconnect_race` (which is too
flaky to add without more fixes):

```
lightningd: lightningd/peer_control.c:1252: peer_active: Assertion `!channel->owner' failed.
lightningd: FATAL SIGNAL 6 (version v0.11.0.1-38-g4f167da)
0x5594a41f8f45 send_backtrace
	common/daemon.c:33
0x5594a41f8fef crashdump
	common/daemon.c:46
0x7f7cb585c08f ???
	/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x7f7cb585c00b __GI_raise
	../sysdeps/unix/sysv/linux/raise.c:51
0x7f7cb583b858 __GI_abort
	/build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79
0x7f7cb583b728 __assert_fail_base
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92
0x7f7cb584cfd5 __GI___assert_fail
	/build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101
0x5594a41b45ca peer_active
	lightningd/peer_control.c:1252
0x5594a418794c connectd_msg
	lightningd/connect_control.c:457
0x5594a41cd457 sd_msg_read
	lightningd/subd.c:556
0x5594a41ccbe5 read_fds
	lightningd/subd.c:357
0x5594a4269fc2 next_plan
	ccan/ccan/io/io.c:59
0x5594a426abca do_plan
	ccan/ccan/io/io.c:407
0x5594a426ac0c io_ready
	ccan/ccan/io/io.c:417
0x5594a426ceff io_loop
	ccan/ccan/io/poll.c:453
0x5594a41930d9 io_loop_with_timers
	lightningd/io_loop_with_timers.c:22
0x5594a4199293 main
	lightningd/lightningd.c:1181
0x7f7cb583d082 __libc_start_main
	../csu/libc-start.c:308
0x5594a416e15d ???
	???:0
0xffffffffffffffff ???
	???:0
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-20 11:37:24 +09:30
Rusty Russell 54d19cba5a connectd: don't keep around more than one old connection.
This was fixed in 1c495ca5a8 ("connectd:
fix accidental handling of old reconnections.") and then reverted by
the rework in "connectd: avoid use-after-free upon multiple
reconnections by a peer".

The latter made the race much less likely, since we cleaned up the
reconnecting struct once the connection was hung up by the remote
node, but it's still theoretically possible.
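
A sketch of the resulting invariant (hypothetical names; the real
connectd structures differ): when a new connection arrives while an
old one is still being torn down, any previously stashed connection
is dropped immediately, so at most one old connection can survive.

```
struct io_conn;
struct peer {
	struct io_conn *stashed_conn;  /* at most one old connection */
};

/* Hypothetical helper that hangs up and frees a connection. */
void drop_connection(struct io_conn *conn);

static void stash_reconnect(struct peer *peer, struct io_conn *conn)
{
	/* An older stashed connection can only hold stale messages
	 * (e.g. an obsolete channel_reestablish): discard it now. */
	if (peer->stashed_conn)
		drop_connection(peer->stashed_conn);
	peer->stashed_conn = conn;
}
```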

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-20 11:37:24 +09:30
Christian Decker d983b328ef topo: Fix issue considering the wrong direction for capacity limits
Changelog-Fixed: topology: Under some circumstances we were considering the limits of the wrong direction of a channel
2022-06-20 11:37:11 +09:30
Christian Decker a14c9f80df route: Do not require both directions to be active
This likely led to a number of false errors when attempting to
route: we deemed a channel unusable as soon as either direction was
unusable. That is bad, since it excludes not only zeroconf channels
(which have different scids for the two directions) but also any
channel we haven't seen an update for yet. This was likely introduced
when attempting to exclude nodes that haven't sent a disable while
their peer has; but that is unnecessary, since an unresponsive node
would be marked as isolated by all its peers anyway, so we don't need
to artificially mark a channel direction as disabled when we can't
even enter the node to traverse the channel in that direction.
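
In simplified terms (illustrative types; the real gossmap structures
differ), the check changes from all-or-nothing to per-direction:

```
#include <stdbool.h>

struct half_chan { bool seen_update, enabled; };
struct chan { struct half_chan half[2]; };

/* Before (buggy): one missing or disabled direction excluded the
 * whole channel -- bad for zeroconf channels, and for channels where
 * one direction's update simply hasn't arrived yet. */
static bool channel_usable_old(const struct chan *c)
{
	return c->half[0].seen_update && c->half[0].enabled
	    && c->half[1].seen_update && c->half[1].enabled;
}

/* After: judge each direction on its own merits. */
static bool direction_usable(const struct chan *c, int dir)
{
	return c->half[dir].seen_update && c->half[dir].enabled;
}
```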

Changelog-Fixed: routing: Fixed an issue where we would exclude the entire channel if either direction was disabled, or we hadn't seen an update yet.
2022-06-20 11:36:11 +09:30
Rusty Russell 31513dd8ea connectd: restore gossip filter aging.
When we moved gossip filtering to connectd, this aging got lost.

Without this, we hit the 10,000 entry limit before expiring the full
gossip anti-echo cache.  That is under 1M in allocations per peer, but
in DEVELOPER mode each allocation adds 3 notifiers (32 bytes each) and
a backtrace child (40 + 40 + 256 bytes), making it almost 10MB per
peer, plus allocation overhead.
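
One way to picture the aging (a generational sketch with a
hypothetical set API, not the actual filter code): keep two
generations and drop the older one wholesale on a timer, so no entry
outlives two periods and the cache cannot grow without bound.

```
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hash-set API, for illustration only. */
struct set;
struct set *set_new(void);
void set_free(struct set *s);
bool set_contains(const struct set *s, uint64_t hash);
void set_add(struct set *s, uint64_t hash);

struct echo_filter {
	struct set *cur, *old;
};

/* Have we already seen (and thus should not echo) this message? */
static bool filter_seen(struct echo_filter *f, uint64_t h)
{
	if (set_contains(f->cur, h) || set_contains(f->old, h))
		return true;
	set_add(f->cur, h);
	return false;
}

/* Called periodically: entries not re-seen within two periods age
 * out, instead of accumulating toward a hard entry limit. */
static void filter_age(struct echo_filter *f)
{
	set_free(f->old);
	f->old = f->cur;
	f->cur = set_new();
}
```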

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: large memory usage with many peers fixed.
2022-06-20 11:35:11 +09:30
Rusty Russell b817e0a3ae CHANGELOG.md: release notes for 0.11.1.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-14 14:28:49 +09:30
Rusty Russell b7d18a96ab connectd: remove assert which can trigger.
I have a test which reproduces this, too, and it's been seen in the
wild.  It seems we can add a subd as we're closing, which causes
this assert to trigger.

Fixes: #5254
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-14 14:28:48 +09:30
Rusty Russell 5b96eeddc4 connectd: fix accidental handling of old reconnections.
We had multiple reports of channels being unilaterally closed because
it seemed like the peer was sending old revocation numbers.

Turns out, it was actually old reestablish messages!  When we have a
reconnection, we would put the new connection aside, and tell lightningd
to close the current connection: when it did, we would restart
processing of the initial reconnection.

However, we could end up with *multiple* "reconnecting" connections,
while waiting for an existing connection to close.  Though the
connections were long gone, there could still be messages queued
(particularly the channel_reestablish message, which comes early on).

Eventually, a normal reconnection would cause us to process one of
these reconnecting connections, and channeld would see the (perhaps
very old!) messages, and get confused.

(I have a test which triggers this, but it also hangs the connect
 command, due to other issues we will fix in the next release...)

Fixes: #5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-05-14 14:28:47 +09:30
Rusty Russell 0ec947f699 doc: fix getinfo non-ASCII chars for older mrkd.
When building the reproducible build for Bionic:

```
Traceback (most recent call last):
  File "/usr/local/bin/mrkd", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/mrkd/__init__.py", line 261, in main
    result = mistune.markdown(fp.read(), inline=inline, renderer=renderer)
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1856: ordinal not in range(128)
doc/Makefile:120: recipe for target 'doc/lightning-getinfo.7' failed
make: *** [doc/lightning-getinfo.7] Error 1
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-25 21:03:22 +09:30
Rusty Russell f5df69020d v0.11.0.1: final fixes for release build
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-25 20:49:44 +09:30
Rusty Russell 27352d0a7b CHANGELOG.md: Looks like rc5 was the ticket!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-25 15:30:26 +09:30
Rusty Russell be2523ec63 CHANGELOG: rc5
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 20:50:23 +09:30
Rusty Russell 98f64fb623 lightningd: don't crash listpeers if we're opening DF channel.
We call out to connectd to activate the peer, and while we do that,
channel->owner is NULL.  A better pattern would be to set up the unsaved
channel once connectd has given us the peer, but this works for now.
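
A sketch of the defensive check (simplified stand-in types; the real
listpeers code differs): while connectd is activating the peer,
`channel->owner` is NULL, so the field is omitted rather than
dereferenced.

```
struct subd { const char *name; };      /* simplified stand-ins */
struct channel { struct subd *owner; };
struct json_stream;
void json_add_string(struct json_stream *s, const char *field,
		     const char *value); /* hypothetical helper */

static void json_add_channel_owner(struct json_stream *s,
				   const struct channel *channel)
{
	/* owner is NULL while connectd is still activating the peer. */
	if (channel->owner)
		json_add_string(s, "owner", channel->owner->name);
}
```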

Fixes: #5204
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 20:45:32 +09:30
Rusty Russell f6e7d0c5dc pytest: add test for crashing listpeers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 20:45:32 +09:30
joe.miyamoto 562974acdb Fix minor bug in rust client generator.
This commit fixes a bug in the function `gen_enum` which was not
caught because, so far, we have no non-required fields in enums
defined in the JSON schema.
2022-04-21 16:43:50 +09:30
Rusty Russell 34e93a718e v0.11.0rc4
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 08:38:02 +09:30
Rusty Russell 37e8d2fb0f connectd: disable advertisement of WEBSOCKET addresses.
This seems to prevent broad propagation, due to LND not allowing it.  See
	https://github.com/lightningnetwork/lnd/issues/6432

We still announce it if you disable deprecated-apis, so tests still work,
and hopefully we can re-enable it in the future.

Fixes: #5196
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: disabled websocket announcement due to LND propagation issues
2022-04-21 06:13:55 +09:30
Rusty Russell 393e8e5e6a connectd: remove a noisy debug msg, fix name typo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 06:13:55 +09:30
Rusty Russell c110d0b068 pytest: test that node_announcement regeneration works as expected.
We shorten 24 hours to 24 seconds using --dev-fast-gossip-prune.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 06:13:55 +09:30
Rusty Russell f81a7ff792 gossipd: fix daily node_announcement regeneration.
We have an explicit filter against redundant node_announcement
updates: we only allow one per week.  This meant that our change to
force a re-announcement every 24 hours did not work!

Allow one per day, instead.
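
The interaction in miniature (illustrative constant names): the
redundancy filter's window must be no longer than the forced refresh
interval, or every refresh gets dropped as redundant.

```
#include <stdbool.h>
#include <stdint.h>

#define REFRESH_INTERVAL (24 * 3600)  /* forced re-announce: daily */

/* Accept an otherwise-redundant node_announcement only if enough
 * time has passed.  With a one-week window here, the daily refresh
 * could never get through; one day makes the two consistent. */
static bool accept_redundant_nannounce(uint32_t last_ts, uint32_t new_ts)
{
	return new_ts >= last_ts + REFRESH_INTERVAL;
}
```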

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 06:13:55 +09:30
Rusty Russell a5d027cefc connectd: send our own gossip, even if peer hasn't sent timestamp_filter.
We seem to have made node_announcement propagation *worse*, not
better.  Explorers don't see my node's updates.

At least some LND nodes never send us timestamp_filter, so we never
actually stream *any* gossip.  We should send gossip about ourselves,
even if they haven't set a filter (yet).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we more aggressively send our own gossip, to improve propagation chances.
2022-04-21 06:13:55 +09:30
Rusty Russell 9b944dbed4 common/gossip_store: add flag to *only* fetch "push"-marked messages.
These are the ones for our own channels (and our own node_announcement).
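
Conceptually (hypothetical flag and bit names), the iterator gains one
extra predicate:

```
#include <stdbool.h>
#include <stdint.h>

#define PUSH_BIT 0x8000  /* hypothetical header bit marking "push" records */

/* With push_only set, only records for our own channels and our own
 * node_announcement (those marked "push") are returned. */
static bool want_record(uint16_t hdr_flags, bool push_only)
{
	if (push_only)
		return (hdr_flags & PUSH_BIT) != 0;
	return true;
}
```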

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 06:13:55 +09:30
Rusty Russell ee21ae814a gossipd: revert useless bf8cb640b7.
This attempted to make us re-xmit our own node_announcement at restart
by moving the node_announcement to the end of the gossip store.  But,
as nothing is connected yet, this had no effect!

We will re-xmit it anyway, since it's marked PUSH.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-21 06:13:55 +09:30
zero fee routing cb70d06d67 corrected typo in description 2022-04-20 21:17:01 +09:30
Rusty Russell f0dc028fa9 lightningd: fix overzealous check in htlc_out_check:
```
+11.668971802 lightningdBROKEN: backtrace: lightningd/log.c:821 (fatal_vfmt) 0x558c893c997f
+11.668978165 lightningdBROKEN: backtrace: lightningd/log.c:829 (fatal) 0x558c893c9a30
+11.668984935 lightningdBROKEN: backtrace: lightningd/htlc_end.c:87 (corrupt) 0x558c893b9b7d
+11.668991262 lightningdBROKEN: backtrace: lightningd/htlc_end.c:205 (htlc_out_check) 0x558c893ba352
+11.669016705 lightningdBROKEN: backtrace: lightningd/peer_htlcs.c:1471 (check_already_failed) 0x558c893ea9c9
+11.669023345 lightningdBROKEN: backtrace: lightningd/peer_htlcs.c:1575 (onchain_failed_our_htlc) 0x558c893eb098
+11.669043122 lightningdBROKEN: backtrace: lightningd/onchain_control.c:438 (handle_onchain_htlc_timeout) 0x558c893cd6ec
+11.669049818 lightningdBROKEN: backtrace: lightningd/onchain_control.c:548 (onchain_msg) 0x558c893cdbdc
+11.669056372 lightningdBROKEN: backtrace: lightningd/subd.c:556 (sd_msg_read) 0x558c893fa532
+11.669063030 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x558c8948e3cd
+11.669069093 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x558c8948ef9e
+11.669075470 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x558c8948efdc
+11.669081900 lightningdBROKEN: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x558c894912a8
+11.669087916 lightningdBROKEN: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x558c893c0966
+11.669094531 lightningdBROKEN: backtrace: lightningd/lightningd.c:1181 (main) 0x558c893c6bf9
```

Fixes: #5191
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-20 16:30:05 +09:30
Rusty Russell 33ae601266 CI: fix bsd workflow.
I have no idea why someone else suddenly owns the directory, but all
git commands fail.  Work around it as suggested by the error message.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-20 07:16:01 +09:30
Rusty Russell c3a7499573 connectd: avoid use-after-free on reconnect with remote_addr.
I was seeing a strange crash:
	Connectd gave bad CONNECT_PEER_CONNECTED message

The message is indeed mangled, around the remote_addr!
A quick review of the code revealed that we were not making a copy
when it was a reconnect, so the remote_addr pointer was pointing
to memory which had already been freed.
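
The shape of the bug and the fix, in tal terms (simplified stand-in
types):

```
#include <ccan/tal/tal.h>

struct wireaddr { int type; unsigned char addr[16]; };  /* simplified */
struct peer_info { struct wireaddr *remote_addr; };

/* Buggy: stores the caller's pointer; on reconnect, the buffer it
 * lives in is freed and remote_addr is left dangling. */
static void save_addr_buggy(struct peer_info *p, struct wireaddr *a)
{
	p->remote_addr = a;
}

/* Fixed: take our own copy, parented to p, so lifetimes match. */
static void save_addr_fixed(struct peer_info *p, const struct wireaddr *a)
{
	p->remote_addr = tal_dup(p, struct wireaddr, a);
}
```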

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-20 06:44:58 +09:30
Rusty Russell 151d009435 lightningd: remove over-zealous assert.
This was hit on my node.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 10:32:29 +09:30
Rusty Russell a326ea8bb2 CHANGELOG.md: rc3
This time for sure!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 08:02:45 +09:30
Rusty Russell e90b50a1b2 dualopend: handle dev_memleak during RBF.
Happens in CI:

```
lightningd-2: 2022-04-10T15:30:40.788Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-dualopend-chan#1: STATUS_FAIL_MASTER_IO: Error parsing 7507: 1b79
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:20 +09:30
Rusty Russell 3f94b74e07 pytest: don't add extra cln-grpc instance now plugin is builtin
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell 355025239d Make sure DEFAULT_TARGETS includes rust testing examples.
Otherwise tests fail:

```
    def test_plugin_start(node_factory):
        """Start a minimal plugin and ensure it is well-behaved
        """
        bin_path = Path.cwd() / "target" / "debug" / "examples" / "cln-plugin-startup"
>       l1 = node_factory.get_node(options={"plugin": str(bin_path), 'test-option': 31337})
...
error starting plugin '/home/runner/work/lightning/lightning/target/debug/examples/cln-plugin-startup': opening pipe: No such file or directory
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell e5abc10ae2 configure: require rustfmt as well, for rust support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell bad89f186a doc/Makefile: make check-manpages require the cln-grpc plugin.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell c598d4df74 Makefile: make sure ALL_PROGRAMS doesn't include cln-grpc plugin.
$(ALL_PROGRAMS) are assumed to be C programs for now.

Add $(C_PLUGINS) as a new variable.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell a3ec140c03 doc: document the grpc-port option, now cln-grpc is a built-in.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell e4ed15dc33 plugins: fix cln-grpc to work as a builtin when we're run in-tree.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell 436706384c plugins: add grpc-plugin to PLUGINS, so it gets started and installed.
Fixes: #5162
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-13 05:25:00 +09:30
Rusty Russell 874d3532da libwally: update to v0.8.5
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-04-12 15:17:52 +09:30
Michael Dance f067e8c909 Changed external/libwally-core to test_build_fix
Combined with the following commit, which is required to update
against the changed libsecp256k1 APIs:

Updated deprecated function calls
2022-04-12 15:17:52 +09:30
Christian Decker b359a24772 cln-plugin: Handle --help invocations better
We now have ternary outcomes for `Builder.configure()` and
`Builder.start()`:

 - Ok(Some(p)) means we were configured correctly, and can continue
   with our work normally
 - Ok(None) means that `lightningd` was invoked with `--help`: we
   weren't configured (which is not an error, since `lightningd` just
   implicitly told us to shut down) and user code should clean up and
   exit as well
 - Err(e) means something went wrong; user code may report an error
   and exit.
2022-04-11 15:20:32 +09:30
Stephen Webel c36cef08bc use full path to reference `lightningd.c`
The relative path makes for a difficult experience when people are reading on `https://lightning.readthedocs.io/`. Directly linking saves the reader a few clicks hunting down the correct location :)
2022-04-10 14:38:08 +09:30
Christian Decker 8717c4e5a2 cln-grpc: Add midstate between configuration and replying to init
This is a bit special, in that it allows us to configure the plugin,
but then still abort startup by sending `init` with the `disable` flag
set.
2022-04-10 14:16:35 +09:30
Christian Decker 9826402c99 cln-grpc: Do not exit if grpc-port is not set
Exiting doesn't mesh well with builtin plugins, so just sit there and
do nothing.

Changelog-None
2022-04-10 14:16:35 +09:30
Christian Decker 9a8bc777e5 msggen: Add pay descriptions and maxfee 2022-04-08 11:30:10 +09:30
Christian Decker 767da45037 cln-grpc: Do not log an error when called with `--help`
Changelog-None
2022-04-08 11:30:10 +09:30