dump_our_gossip() is mainly useful for propagating our gossip when we
are poorly connected, not when we have many peers. @whitslack
reported excessive memory use queueing messages on a large node, so we
limit it beyond the first 5 peers, to 5 channels each.
This assumes we have ~ the same number of peers as channels, which
is probably reasonable.
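A minimal sketch of the limit described above, in Python rather than gossipd's C. The names `FULL_DUMP_PEER_LIMIT`, `LIMITED_CHANNELS` and `channels_to_dump` are hypothetical, and which 5 channels get picked is an assumption (here, simply the first 5):

```python
# Hypothetical sketch only; not the gossipd implementation.
FULL_DUMP_PEER_LIMIT = 5  # up to this many peers, dump everything
LIMITED_CHANNELS = 5      # beyond that, only this many channels per peer


def channels_to_dump(num_connected_peers, our_channels):
    """Return the channels whose gossip we would queue for a new peer."""
    channels = list(our_channels)
    if num_connected_peers <= FULL_DUMP_PEER_LIMIT:
        return channels
    # Assumption: we just take the first few; the real choice may differ.
    return channels[:LIMITED_CHANNELS]
```

With roughly as many peers as channels, this keeps the total queued gossip linear in the number of peers rather than quadratic.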
In the long term, we should move this to connectd, which is properly
equipped to trickle out these messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #6540
Update the lightningd <-> channeld interface with lots of new commands needed to facilitate splicing.
Implement the channeld splicing protocol leveraging the interactivetx protocol.
Implement lightningd’s channel_control to support channeld in its splicing efforts.
Changelog-Added: Added the features to enable splicing & resizing of active channels.
Update gossip routines and various other checks on the channel state to consider AWAITING_SPLICE to be routable and treated similar to CHANNELD_NORMAL.
Small updates to psbt interface
Changelog-None
Also as update_own_node_announcement is called nearly continuously
under normal operation by maybe_send_own_node_announce, the timer should
not be freed continuously - better to only free before actually
refreshing.
When an outdated own node announcement is present, it fails the
nannounce_different test and also fails to kick off the forced regen
timer.
Changelog-Fixed: Node announcements are refreshed more reliably.
This will at least *help* the case where these were not populated, causing us
to send errors without channel_update appended.
It's not perfect: we can still send such errors if the gossip store is
corrupted, and we still send them for private channels, but it should
help.
(The much better fix is far more invasive, so slips to next release!)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This looked like a test flake, but was real:
```
l1.daemon.wait_for_log("closing soon due to the funding outpoint being spent")
# We won't gossip the dead channel any more (but we still propagate node_announcement). But connectd is not explicitly synced, so wait for "a bit".
time.sleep(1)
> assert len(get_gossip(l1)) == 2
E assert 4 == 2
```
We can see that two channel_updates come in *after* we mark it dying:
```
gossipd: channel 103x1x0 closing soon due to the funding outpoint being spent
gossipd: REPLY WIRE_GOSSIPD_NEW_BLOCKHEIGHT_REPLY with 0 fds
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/0 now DISABLED
022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-gossipd: Received channel_update for channel 103x1x0/1 now DISABLED
```
We should keep marking channel_updates the same way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
While one side was not produced by us, we have a vested interest in propagating it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: When we send our own gossip when a peer connects, also send any incoming channel_updates.
@endothermicdev and I found this while investigating a "nobody sees my node_announcement" bug report.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: #6410
Reported-by: benjaminchodroff on discord
Changelog-Fixed: Protocol: When we send our own gossip when a peer connects, send our node_announcement too (regression in v23.05)
Alex and I were reading it and I got confused: it's really a simpler loop
than it seems, with all those redundant `continue` statements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't actually delete them for 12 blocks, but we can't avoid
propagating them. We don't mark node_announcements, which is a bit
weird, but avoids us tracking logic to un-dying them if a channel is
opened.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We keep several peer pointers, but we just add a hook to NULL them
manually when a peer dies, rather than using voodoo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use a "softref" which is a magic pointer which gets NULL'ed when
the object is freed. But it's heavy, and a bit tricky to use, and we
only use it in gossipd.
Instead, keep the nodeid, and do a lookup (now that's fast) if we want
to credit the sender for valid gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported in #6270, there was an attempt to delete gossip overrunning
the end of the gossip_store. This logs the gossip type that was attempted to be deleted and avoids an immediate crash (tombstones would be fine to
skip over at least.)
Changelog-None
The push bit was convenient for connectd to send our own gossip
to peers upon connecting by naively traversing the gossip_store
and sending anything flagged `push`. This function is now
performed by gossipd leaving no use for the push bit.
Changelog-Changed: `gossipd`: gossip_store PUSH bit is no longer set.
This was previously the role of connectd, but it's actually more
efficient for us to do it: connectd has to sweep through the entire
gossip_store, but we have datastructures for this already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This reverts us to the v22.11 behaviour, pending a revisit for the
next release.
Changelog-Changed: gossipd: revert zombification change, keep all gossip for now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Loading the gossip_store would not create a pending node announcement
when the node already had a zombie channel. This would cause the node
announcement to attempt to be loaded, but fail because it had no
broadcastable channels. Accepting a pending node announcement as when
normally loading from the channel corrects this.
`node_has_public_channels` taking into account zombie channels enables
this behavior.
Separately, node_announcements were still being flagged as zombies
in the gossip store despite that feature being removed.
Changelog-None
Without inheriting zombie status, gossipd would allow regular channel updates
into the store until the pruning cycle hits (and the channel is properly
flagged) which is 3.5 days. Applying zombie status when reading channel
updates from the store prevents this.
Changelog-None
remove_chan_from_node already corrects the ordering if a node_announcement
is left ahead of the next oldest channel_announcement, but zombifying should
do that check (and reorder if necessary) too.
Changelog-None
Closing channels would previously require moving the node announcements
in the gossip store on occasion. They incorrectly lost their spam flag
during this process (would no longer be squelched.)
Changelog-None
A zombie channel is not considered broadcastable, so if all channels
are zombies (i.e. is_node_zombie() is true), then
node_has_broadcastable_channels() is false.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies things (we'll get node_announcement if they ever
rebroadcast), since we clearly have an issue with node_announcement for
zombie nodes.
Changes:
1. Remove now-unused gossip_store_mark_nannounce_zombie and resurrect_nannouncements.
2. Don't consider zombie channels to count when deciding whether to move node_announcement
(node_announcement must be preceded by at least one broadcastable channel_announcement).
3. Treat incoming node_announcement where we have all-zombie channels the same as if
we had no channels.
4. Remove node_announcement whenever we have no announceable channels (not just zombies).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This could always happen if we armed the timer when we did have public
channels, and by the time we did our node_announcement we no longer
did, but it gets triggered in our tests when we remove (our own!)
zombied node_announcement in the next patch.
It's actually two separate u16 fields, so actually treat it as
such!
Cleans up zombie handling code a bit too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Though BOLT 7 says a channel may be pruned when one side becomes inactive
and fails to refresh their channel_update, in practice, the
channel_announcement can be difficult to recover if deleted entirely.
Here the channel_announcement is tagged as zombie such that gossip_store
consumers may safely ignore it, but it may be retained should the channel
come back online in the future. Node_announcements and channel_updates may
also be retained in such a fashion until the channel is ready to be
resurrected.
Changelog-Fixed: Pruned channels are more reliably restored.
This adds the option to explicitly enable ip-discovery, which may be
helpful for example when a user wants TOR announced along with
discovered IPs to improve connectivity and have TOR just as a fallback.
Changelog-Added: Adds config switch 'announce-addr-discovered': on/off/auto
We removed a warning about the channel_update being malformed since
the warning could cause lnd to disconnect (seems they treat
channel-unrelated warnings as fatal?). This was caused by lnd not
enforcing the `htlc_maximum`, thus the parsing would fail. We can
re-add the warning once our assumption that `htlc_maximum` is always set
holds.
Changelog-Fixed: gossip: We no longer send warning that lnd would not understand if we get outdated gossip
Private channel updates can no longer be flagged as spam during handling of
a new channel update (this was a bug.) Also slightly reworked previous
channel_update deletion for clarity.
When private channel updates exceed the gossip ratelimit, the previous
gossip store entry was not deleted even though all private channel updates
are stored. This caused gossip store corruption due to duplicate entries
for the same channel.
Fixes: #5656
Changelog-Fixed: Fixed gossip_store corruption from duplicate private channel updates
This adds a new "chan_dying" message to the gossip_store, but since we
already changed the minor version in this PR, we don't bump it again.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now delay forgetting funding-spent channels for 12 blocks (as per latest BOLTs, to support splicing in future).
It's a bit more optimal, and tells gossipd exactly what height the
spend occurred at (with multiple blocks, it's not always the current
height!). It will need that next patch.
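A rough sketch of why the exact spend height matters for a delayed-forget rule such as the 12-block delay: the countdown has to start from the block containing the spend, not from whatever the tip is when we hear about it. `DYING_DELAY_BLOCKS` and `should_forget` are hypothetical names, and the exact boundary (whether block 12 itself counts) is an assumption:

```python
# Hypothetical sketch only; not the gossipd implementation.
DYING_DELAY_BLOCKS = 12


def should_forget(spend_height, current_height):
    """True once the funding spend is at least 12 blocks behind the tip."""
    return current_height >= spend_height + DYING_DELAY_BLOCKS
```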
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: We now set the `dont_forward` bit on private channel_update's message_flags (as per latest BOLTs).
We will now simply reject old-style ones as invalid. Turns out the
only trace we could find is a channel between two nodes unconnected to
the rest of the network.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: We now require all channel_update messages include htlc_maximum_msat (as per latest BOLTs)
Many changes to gossmap (including the pending ones!) don't actually
concern readers, as long as they obey certain rules:
1. Ignore unknown messages.
2. Treat all 16 upper bits of length as flags, ignore unknown ones.
So now we split the version byte into MAJOR and MINOR, and you can
ignore MINOR changes.
We don't expose the internal version (for creating the map)
programmatically: you should really hardcode what major version you
understand!
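A sketch of what a tolerant reader following these rules might look like. The bit split between MAJOR and MINOR (top 3 bits vs. low 5 bits) is an assumption here, as are all the names; check the gossip_store header before relying on it:

```python
# Hypothetical reader-side sketch; masks and names are assumptions.
MAJOR_MASK = 0xE0      # assumed: top 3 bits of the version byte
MINOR_MASK = 0x1F      # assumed: low 5 bits of the version byte
SUPPORTED_MAJOR = 0    # hardcode the major version this reader understands


def check_version(version_byte):
    """Reject unknown MAJOR versions; return MINOR, which can be ignored."""
    major = (version_byte & MAJOR_MASK) >> 5
    minor = version_byte & MINOR_MASK
    if major != SUPPORTED_MAJOR:
        raise ValueError("unsupported gossip_store major version %d" % major)
    return minor


def split_length_field(be32_value):
    """Treat the upper 16 bits as flags and the lower 16 as the length."""
    flags = be32_value >> 16
    length = be32_value & 0xFFFF
    return flags, length
```

A reader would then skip any record whose type it does not recognise, and mask off any flag bits it does not know.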
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If they really upgrade directly from 0.9.2, it will simply delete the
store and re-fetch it.
We still update from v9 (which could be v0.11), since it's a noop.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add memleak_ignore_children() so callers can do exclusions themselves.
Having two exclusions was always such a hack!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is cleaner because `remote_addr` and `discovered_ip` are
related but two different things.
Within connectd and lightningd we use the peer's `remote_addr` feature
to validate (and guess a port) to be used for IP discovery.
Also when a peer reports a `remote_addr` to us, this is given to the plugin API
via the `peer_connected` hook. The network port here is not modified for
good reason! This can be used e.g. to detect if we are behind a NAT.
But once lightningd figures enough peers report the same `remote_addr`,
it sets the port to the selected network and tells gossipd to use that for
`node_announcement` updates.
Hence, within gossipd, there is no (should not be) `remote_addr`.
Changelog-None
This contains the zeroconf stuff, with funding_locked renamed to
channel_ready. I change that everywhere, and try to fix up the
comments.
Also the `alias` field is called `short_channel_id`.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: `funding_locked` is now called `channel_ready` as per latest BOLTs.
We have them split over common/param.c, common/json.c,
common/json_helpers.c, common/json_tok.c and common/json_stream.c.
Change that to:
* common/json_parse (all the json_to_xxx routines)
* common/json_parse_simple (the simplest json parsing routines, for cli too)
* common/json_stream (all the json_add_xxx routines)
* common/json_param (all the param and param_xxx routines)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will only add the discovered `remote_addr` IPs if no other
addresses would be announced. Meaning whenever a public address was
found by autobind or an address was specified via commandline or config,
IP discovery will be disabled.
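The fallback rule can be sketched as follows. `addresses_to_announce` is a hypothetical helper, not a function in the codebase:

```python
# Hypothetical sketch only; not the lightningd/gossipd implementation.
def addresses_to_announce(configured_addrs, discovered_addrs):
    """Use discovered IPs only when nothing else would be announced."""
    if configured_addrs:
        # Autobind, commandline or config gave us something: no discovery.
        return list(configured_addrs)
    return list(discovered_addrs)
```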
Addresses: #5305
Note from the author: We could/should also enable IP discovery when we only
have a TOR address (but without --always-use-proxy ofc). This will give
nodes an option to have a bootstrap way to be reached until IP discovery
can do the job in a more stable way.
Changelog-Changed: Only use IP discovery as fallback when no addresses would be announced
This commit got reduced to just changing a comment because
the stuff it initially did was already merged in before by
commit 7ff62b4a
So I just kept the changed comment as it's more precise.
Changelog-None
routing.c fixed to properly remove rate-limited gossip_store entries
when channels are closed. This caused gossipd to crash on a subsequent
gossip_store_load. Also corrects an overzealous limit of one gossip_store
entry per message (should now allow one broadcastable and one
rate-limited). Addresses issues 5387, 5395.
Changelog-None
This grows the routing state in order to index both okay-to-broadcast
and rate-limited gossip. The gossip_store also logs the rate-limited
gossip if useful. This allows the broadcast of the last non-rate-limited
gossip.
routing.c now flags rate-limited gossip as it enters the gossip_store but
makes use of it in updating the routing graph. Flagged gossip is not
rebroadcast to gossip peers.
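One way to picture the two indexes is a pair of slots per message: one for the latest gossip we may broadcast, one for the latest gossip we use for routing. This is a sketch with hypothetical names (`UpdateSlots`, `for_routing`, `for_broadcast`), not the routing.c data structure:

```python
# Hypothetical sketch only; not the routing.c implementation.
class UpdateSlots:
    """Tracks the newest broadcastable and the newest overall update."""

    def __init__(self):
        self.broadcastable = None  # last non-rate-limited update
        self.routing = None        # last update of any kind

    def accept(self, msg, rate_limited):
        # Every accepted update refreshes the routing graph view.
        self.routing = msg
        # Only non-rate-limited updates may be relayed to gossip peers.
        if not rate_limited:
            self.broadcastable = msg

    def for_routing(self):
        return self.routing

    def for_broadcast(self):
        return self.broadcastable
```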
Changelog-Changed: gossipd: now accepts spam gossip, but squelches it for
peers.
This will be used to decouple internal use of gossip from what is
passed to gossip peers. Updates GOSSIP_STORE_VERSION to 10.
Changelog-Changed: gossip_store updated to version 10.
@whitslack complained of large CPU usage by connectd at startup;
I ran perf record on connectd on my machine (which sees a little spike, only)
and I see the cost of reading and discarding the entries:
```
- 95.52% 5.24% lightning_conne lightning_connectd [.] gossip_store_next
- 90.28% gossip_store_next
+ 40.27% tal_alloc_arr_
+ 22.78% tal_free
+ 11.74% crc32c
+ 9.32% fromwire_peektype
+ 4.10% __libc_pread64 (inlined)
1.70% be32_to_cpu
```
Much of this is caused by the search for our own gossip: keeping this separately
would be even better, but this fix is minimal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: connectd: reduce initial CPU load when connecting to peers.
We had json_add_amount_msat_only(), which was designed to be used to
print out msat fields, if we had sats.
However, we misused it, so split it into the three different cases:
1. json_add_amount_sat_msat: We are using it correctly, with a field called
xxx_msat.
2. json_add_amount_sats_deprecated: We were using it wrong, so deprecate
the old field and create a new one which does end in _msat.
3. json_add_sats: we were using it to hand sats as a JSON parameter to an
interface, which expects "XXXsat".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Deprecated: Plugins: `rbf_channel` and `openchannel2` hooks `their_funding` (use `their_funding_msat`)
Changelog-Deprecated: Plugins: `openchannel2` hook `dust_limit_satoshis` (use `dust_limit_msat`)
Changelog-Deprecated: Plugins: `openchannel` hook `funding_satoshis` (use `funding_msat`)
Changelog-Deprecated: Plugins: `openchannel` hook `dust_limit_satoshis` (use `dust_limit_msat`)
Changelog-Deprecated: Plugins: `openchannel` hook `channel_reserve_satoshis` (use `channel_reserve_msat`)
Changelog-Deprecated: Plugins: `channel_opened` notification `amount` (use `funding_msat`)
Changelog-Deprecated: JSON-RPC: `listtransactions` `msat` (use `amount_msat`)
Changelog-Deprecated: Plugins: `htlc_accepted` `forward_amount` (use `forward_msat`)
This was eliminated this morning in the latest spec. We still accept them,
we just don't produce them any more.
Changelog-Removed: Protocol: We no longer create gossip messages which use zlib encoding (we still understand them, for now!)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have an explicit filter against redundant node_announcement
updates; we only allow 1 a week. This means that our change to force
a reannouncement every 24 hours did not work!
Allow once a day, instead.
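A sketch of why the weekly filter defeated the daily refresh. `is_redundant_too_soon` is a hypothetical helper and the exact comparison used by gossipd is an assumption:

```python
# Hypothetical sketch only; not the gossipd implementation.
DAY = 24 * 60 * 60
WEEK = 7 * DAY


def is_redundant_too_soon(prev_timestamp, new_timestamp, min_interval):
    """True if an otherwise-identical node_announcement arrives too early."""
    return new_timestamp - prev_timestamp < min_interval


# A forced refresh 24 hours after the previous announcement:
refresh_blocked_before = is_redundant_too_soon(0, DAY, WEEK)  # filtered out
refresh_blocked_after = is_redundant_too_soon(0, DAY, DAY)    # now allowed
```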
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This attempted to make us re-xmit our own node_announcement at restart,
by moving the node_announcement to the end of the gossip store. But,
as nothing is connected, yet, this had no effect!
We will rexmit it anyway, since it's marked PUSH.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do this (send warnings) in almost all cases anyway, so mainly this
is a textual update, but there are some changes:
1. Send ERROR not WARNING if they send a malformed commitment secret.
2. Send WARNING not ERROR if they get the shutdown_scriptpubkey wrong (vs upfront)
3. Send WARNING not ERROR if they send a bad shutdown_scriptpubkey (e.g. p2pkh in future)
4. Rename some vars 'err' to 'warn' to make it clear we send a warning.
This means test_option_upfront_shutdown_script can be made reliable, too,
and it now warns and doesn't automatically close channel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Gossipd didn't actually suppress all gossip, resulting in a flake!
Doing it in connectd now makes much more sense.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Requiring the caller to allocate them is ugly, and differs from
other types.
This means we need a context arg if we don't have one already.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Simply exit, like we do when master daemon_conn exits.
```
Valgrind error file: valgrind-errors.2211908
==2211908== Invalid read of size 8
==2211908== at 0x12AC13: daemon_conn_send (daemon_conn.c:137)
==2211908== by 0x113CD9: queue_peer_msg (gossipd.c:118)
==2211908== by 0x11B806: query_channel_range (queries.c:1169)
==2211908== by 0x1250DD: peer_gossip_probe_scids (seeker.c:706)
==2211908== by 0x1253B1: check_firstpeer (seeker.c:788)
==2211908== by 0x1256CA: seeker_check (seeker.c:884)
==2211908== by 0x1366AC: timer_expired (timeout.c:62)
==2211908== by 0x1163D1: main (gossipd.c:1146)
==2211908== Address 0x4cafdf0 is 48 bytes inside a block of size 88 free'd
==2211908== at 0x48460C4: free (vg_replace_malloc.c:872)
==2211908== by 0x1805EA: del_tree (tal.c:421)
==2211908== by 0x1808BE: tal_free (tal.c:486)
==2211908== by 0x12AB25: destroy_dc_from_conn (daemon_conn.c:112)
==2211908== by 0x17FFDF: notify (tal.c:237)
==2211908== by 0x180519: del_tree (tal.c:402)
==2211908== by 0x1808BE: tal_free (tal.c:486)
==2211908== by 0x16EE9A: io_close (io.c:450)
==2211908== by 0x16ECA9: do_plan (io.c:401)
==2211908== by 0x16ED16: io_ready (io.c:417)
==2211908== by 0x1710B2: io_loop (poll.c:453)
==2211908== by 0x1163C5: main (gossipd.c:1144)
==2211908== Block was alloc'd at
==2211908== at 0x484384F: malloc (vg_replace_malloc.c:381)
==2211908== by 0x180064: allocate (tal.c:250)
==2211908== by 0x180634: tal_alloc_ (tal.c:428)
==2211908== by 0x12AB65: daemon_conn_new_ (daemon_conn.c:122)
==2211908== by 0x1155F4: gossip_init (gossipd.c:763)
==2211908== by 0x116014: recv_req (gossipd.c:999)
==2211908== by 0x12A828: handle_read (daemon_conn.c:31)
==2211908== by 0x16E09F: next_plan (io.c:59)
==2211908== by 0x16ECD4: do_plan (io.c:407)
==2211908== by 0x16ED16: io_ready (io.c:417)
==2211908== by 0x1710B2: io_loop (poll.c:453)
==2211908== by 0x1163C5: main (gossipd.c:1144)
==2211908==
```
This is the cheapest algo I came up with that simply checks that the
same `remote_addr` has been reported by two different peers. Can be
improved in many ways:
- Check by connecting to random peers in the network
- Check for more than two confirmations or a certain fraction
- ...
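The check described above could be sketched like this. `confirmed_remote_addr` and its argument shape are hypothetical, not the lightningd code:

```python
# Hypothetical sketch only; not the lightningd implementation.
def confirmed_remote_addr(reports):
    """Return the first address reported by two different peers, else None.

    reports is an iterable of (peer_id, remote_addr) pairs in arrival order.
    """
    seen = {}  # remote_addr -> set of peer ids that reported it
    for peer_id, addr in reports:
        peers = seen.setdefault(addr, set())
        peers.add(peer_id)
        if len(peers) >= 2:
            return addr
    return None
```

The same peer reporting an address twice does not count as a confirmation, which is the only protection this simple scheme offers.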
Changelog-Added: Send updated node_announcement when two peers report the same remote_addr.