The spec (since d4bafcb67dcf1e4de4d16224ea4de6b543ae73bf in March
2020) requires that reply_channel_range be in order (and all
implementations did this anyway).
But when I tried this, I found that LND doesn't (always) obey this,
since don't divide on block boundaries. So we have to loosen the
constraints here a little.
We got rid of the old LND compat handling though, since everyone should
now be upgraded (there are CVEs out for older LNDs).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Removed: Support for receiving full gossip from ancient LND nodes.
Instead of a boutique message, use a "real" channel_announcement for
private channels (with fake sigs and pubkeys). This makes it far
easier for gossmap to handle local channels.
Backwards compatible update, since we update old stores.
We also fix devtools/dump-gossipstore to know about the tombstone markers.
Since we increment our channel_announce count for local channels now,
the stats in the tests changed too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Rename memleak_enter_allocations to memleak_find_allocations.
2. Unify scanning for pointers into memleak_remove_region / memleak_remove_pointer.
3. Document the functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This avoids overwriting the ones in git, and generally makes things neater.
We have convenience headers wire/peer_wire.h and wire/onion_wire.h to
avoid most #ifdefs: simply include those.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
See https://github.com/lightningnetwork/lightning-rfc/pull/767
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: channels now pruned after two weeks unless both peers refresh it (see lightning-rfc#767)
There were no channel updates in my log; because sendonion doesn't know the
actual node_ids or channel_ids, we can't tell gossipd what node/channel it was
so it can no longer remove them on PERM errors.
However, we can tell it the error message so it can apply the update.
Fixes: #3877
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will be used when we want to specify these in a route. But for now, they
only alter gossipd, which always sets them to NULL.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's almost always "their_features" and "our_features" respectively, so
make those names clear.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Turns out that unnecessary: all callers can access the feature_set,
so make it much more like a normal primitive.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Even without optimization, it's faster to walk all the channels than
ping another daemon and wait for the response.
Changelog-Changed: Forwarding messages is now much faster (less inter-daemon traffic)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The idea is that gossipd can give us the cupdate we need for an error, and
we wire things up so that we ask for it (async) just before we send the
error to the subdaemon.
I tried many other things, but they were all too high-risk.
1. We need to ask gossipd every time, since it produces these lazily
(in particular, it doesn't actually generate an offline update unless
the channel is used).
2. We can't do async calls in random places, since we'll end up with
an HTLC in limbo. What if another path tries to fail it at the same time?
3. This allows us to use a temporary_node_failure error, and upgrade it
when gossipd replies. This doesn't change any existing assumptions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a common thing to do, so create a macro.
Unfortunately, it still needs the type arg, because the paramter may
be const, and the return cannot be, and C doesn't have a general
"(-const)" cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Before this patch we used to send `double`s over the wire by just
copying them. This is not portable because the internal represenation
of a `double` is implementation specific.
Instead of this, multiply any floating-point numbers that come from
the outside (e.g. JSONs) by 1 million and round them to integers when
handling them.
* Introduce a new param_millionths() that expects a floating-point
number and returns it multipled by 1000000 as an integer.
* Replace param_double() and param_percent() with param_millionths()
* Previously the riskfactor would be allowed to be negative, which must
have been unintentional. This patch changes that to require a
non-negative number.
Changelog-None
I hadn't realized that lightningd asks gossipd every time we forward
a payment. But I'm going to abuse it here to get the latest channel_update,
otherwise (as lightningd takes over error message generation) lightningd
needs to do an async request at various painful points.
So have gossipd tell us the lastest update (stripped so compatible with
the strange in-onion-error format).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us do more flexible filtering in the next patch. But it also
keeps some weird logic out of gossipd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This prevents a gratuitous lookup of we get a late channel_announce,
but even better, it suppresses the "bad gossip" messages in case of
a late channel_update, which have plagued Travis (especially since we
got aggressive in pushing our own updates).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a better fix than doing it manually, which turned out
to do it in the wrong order (node_announcement followed by
channel_announcement) anyway.
Should fix many "Bad gossip" messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I had a report of a 0.7.2 user whose node hadn't appeared on 1ml. Their
node_announcement wasn't visible to my node, either.
I suspect this is a consequence of recent version reducing the amount of
gossip they send, as well as large nodes increasingly turning off gossip
altogether from some peers (as we do). We should ignore timestamp filters
for our own channels: the easiest way to do this is to push them out
directly from gossipd (other messages are sent via the store).
We change channeld to wrap the local channel_announcements: previously
we just handed it to gossipd as for any other gossip message we received
from our peer. Now gossipd knows to push it out, as it's local.
This interferes with the logic in tests/test_misc.py::test_htlc_send_timeout
which expects the node_announcement message last, so we generalize
that too.
[ Thanks to @trueptolmy for bugfix! ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is mainly an internal-only change, especially since we don't
offer any globalfeatures.
However, LND (as of next release) will offer global features, and also
expect option_static_remotekey to be a *global* feature. So we send
our (merged) feature bitset as both global and local in init, and fold
those bitsets together when we get an init msg.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We weren't supposed to do any gossiping until we were synced (and thus
knew blockheight), but our seeker_check() didn't wait for it! Move the
waiting all into seeker.c, so it can handle it all consistently.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It usually means we're missing something, but there's no way to ask what.
Simply start a broad scid probe.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We assume that the time for gossip propagation is < 10 minutes, so by
going back that far from last gossip we won't miss anything,
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We eliminate the "need peer" states and instead check if the
random_peer_softref has been cleared.
We can also unify our restart handlers for all these cases; even the
probe_scids case, by giving gossip credit for the scids as they come
in (at a discount, since scids are 8 bytes vs the ~200 bytes for
normal gossip messages).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of a linear array which is fairly inefficient if it turns out
we know nothing at all.
We remove the gossip_missing() call by changing the api to
remove_unknown_scid() to include a flag as to whether the scid turned
out to be real or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The seeker starts by asking a peer (the first peer!) for all gossip
since a minute before the modified time of the gossip store.
This algorithm is enhanced in successive patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we have to validate, there can be a delay (and peer might
vanish) between receiving the gossip and actually confirming it, hence
the use of softref.
We will use this information to check that the peers are making progress
as we start asking them for specific information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the modified-time of the file. We have to store it internally
since we overwrite the gossip file with compaction on startup.
This means the "are we behind on gossip?" heuristic is no longer inside
gossip_store.c, which is cleaner.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a pointer to chainparams, that fails valgrind if we do anything
chain-specific before setting it.
Suggested-by: Rusty Russell <@rustyrussell>
We do this by keeping a current and an old map, and moving the current to old
every hour or 10,000 entries.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I was seeing some accidental pruning under load / Travis, and in
particular we stopped accepting channel_updates because they were 103
seconds old. But making it too long makes the prune test untenable,
so restore a separate flag that this test can use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The only real change is dump_gossip() used to call
maybe_create_next_scid_reply(), but now I've simply renamed
that to maybe_send_query_responses() and we call it directly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The first one means we don't discard channels just because we're not
synced, and the second is implied by the spec: don't accept
channel_announcement if the channel isn't 6 deep. Since LND defers in
such cases, we do too (unless it's newer than the current block, in
which case we simply discard). Otherwise there's a risk that a slow
node might discard valid gossip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This will let gossipd be more intelligent about gossiping before we're
synced, and also it might know how far behind we are.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's generally clearer to have simple hardcoded numbers with an
#if DEVELOPER around it, than apparent variables which aren't, really.
Interestingly, our pruning test was always kinda broken: we have to pass
two cycles, since l2 will refresh the channel once to avoid pruning.
Do the more obvious thing, and cut the network in half and check that
l1 and l3 time out.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make update_local_channel use a timer if it's too soon to make another
update.
1. Implement cupdate_different() which compares two updates.
2. make update_local_channel() take a single arg for timer usage.
3. Set timestamp of non-disable update back 5 minutes, so we can
always generate a disable update if we need to.
4. Make update_local_channel() itself do the "unchanged update" suppression.
gossipd: clean up local channel updates.
5. Keep pointer to the current timer so we override any old updates with
a new one, to avoid a race.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Write helpers to split it into non-timestamp, non-signature parts,
and simply compare those. We extract a helper to do channel_update, too.
This is more generic than our previous approach, and simpler.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We've been slack, but it's going to be important for testing
ratelimiting. And it currently has a minor memory leak.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than reaching into data structures, let them register their own
callbacks. This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.
Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fortunately, again, only happens with EXPERIMENTAL_FEATURES.
If the query causes us not to actually send anything, we won't
get called again. This can validly happen if they only asked for
the node_announcements, for example.
(Found by protocol tests).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our "are we finished?" logic was wrong: it tested if there are no more
node_announcements, but it's possible that there were no node_announcements
for either end of the channel whose information we sent.
This is actually quite unusual on the real network: looking at mainnet
statis from last May, 4301 of 4337 nodes have node_announcements.
However, with query flags it's much more likely, since they might not
ask for node announcements at all.
(Found by gossip protocol tests)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These both allow us to reproduce the test vectors in the next patch. But
using Z_DEFAULT_COMPRESSION is a reasonable idea anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Make the TLV element a simple array. This is a bit neater, in fact, and
makes the test vectors in that 557 PR work.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In fact, we always generate them, we only send them if asked. And we set
the flags to 0 if not --enable-experimental-features, so we never send in
that case.
Generating checksums involves pulling the channel_update from the
gossip_store, which is suboptimal: there's a FIXME to store the
checksum in memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We're about to use the for gossip extended info too, which *don't* put
the encoding byte at the beginning of the data stream. So this removes
some "scids" from function names and separates out the "prepend a byte"
case from the "external encoding_type" case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These indicate what fields we are to return. If there's now TLV, or we
haven't got --enable-experimental-features, it's set to all 1s so behaviour
is unchanged.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
updates the bolt version to 6639cef095a2ecc7b8f0c48c6e7f2f906fbfbc58.
this requires us to use the new bolt parser at generate-bolt.py
and updates to all of the type specifications (ie. from u8 -> byte)
We hit the timestamp assert on #2750; it shouldn't happen, but crashing
doesn't leave much information.
Reported-by: @m-schmook
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We seek a certain number of peers at each level of gossip; 3 "flood"
if we're missing gossip, 2 at 24 hours past to catch recent gossip, and
8 with current gossip. The rest are given a filter which causes them
not to gossip to us at all.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The first sign that we're missing gossip is that we get a channel_update
for an unknown channel. The peer might be wrong (or lying), but if it turns
out to be a real channel, we were definitely missing something.
This patch does two things: queries when we get an unknown channel_update,
and then notes that a channel_announcement was from such an update when
it's finally processed.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In particular, we'll need to know the short_channel_id if a
channel_update is unknown (implies we're missing a channel), and whether
processing a pending channel_announcement was successful (implies that
the channel was real).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Up until now we only generated these in dev mode for testing. Hoist
into common code, turn counter into a flag (we're only allowed one!)
and note if query is internal or not.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id. In the default case,
it means we don't charge ourselves fees on the route.
This means we can spend the full channel balance.
We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things. So we add the first-hop fee to the *risk* value instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.
Before this series: (--disable-developer, -Og)
store_load_msec:20669-20902(20822.2+/-82)
vsz_kb:439704-439712(439706+/-3.2)
listnodes_sec:0.890000-1.000000(0.92+/-0.04)
listchannels_sec:11.960000-13.380000(12.576+/-0.49)
routing_sec:3.070000-5.970000(4.814+/-1.2)
peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)
After: (--disable-developer, -Og)
store_load_msec:19722-20124(19921.6+/-1.4e+02)
vsz_kb:288320
listnodes_sec:0.860000-0.980000(0.912+/-0.056)
listchannels_sec:10.790000-12.260000(11.65+/-0.5)
routing_sec:2.540000-4.950000(4.262+/-0.88)
peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself. The rest of the semantics are fairly
simple however.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project. But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.
This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.
We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They already send *us* gossip messages, so they have to be distinct anyway.
Why make us both do extra work?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.
Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast. Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Each destructor2 costs 40 bytes, and struct chan is only 120 bytes. So
this drops our memory usage quite a bit:
MCP bench results change:
-vsz_kb:580004-580016(580006+/-4.8)
+vsz_kb:533148
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This has two effects: most importantly, it avoids the problem where
lightningd creates a 800MB JSON blob in response to listchannels,
which causes OOM on the Raspberry Pi (our previous max allocation was
832MB). This is because lightning-cli can start draining the JSON
while we're filling the buffer, so we end up with a max allocation of
68MB.
But despite being less efficient (multiple queries to gossipd), it
actually speeds things up due to the parallelism:
MCP with -O3 -flto before vs after:
-listchannels_sec:8.980000-9.330000(9.206+/-0.14)
+listchannels_sec:7.500000-7.830000(7.656+/-0.11)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now have a test blockchain for MCP which has the correct channels,
so this is not needed.
Also fix a benchmark script bug where 'mv "$DIR"/log
"$DIR"/log.old.$$' would fail if you log didn't exist from a previous run.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of reading the store ourselves, we can just send them an
offset. This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).
MCP bench results:
store_load_msec:28509-31001(29206.6+/-9.4e+02)
vsz_kb:580004-580016(580006+/-4.8)
store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
listnodes_sec:1.790000-1.880000(1.83+/-0.032)
listchannels_sec:21.180000-21.950000(21.476+/-0.27)
routing_sec:2.210000-11.160000(7.126+/-3.1)
peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)
Signficant savings in streaming gossip:
-peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
+peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to store the channel capacity for channel_announcement: hand it
in directly rather than having the gossip_store code do a lookup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We know exactly how many there will be, so allocate an entire array up-front.
-listnodes_sec:2.540000-2.610000(2.584+/-0.029)
+listnodes_sec:2.100000-2.170000(2.118+/-0.026)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When we compact the store, we need to adjust the broadast index for
peers so they know where they're up to.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:36034-37853(37109.8+/-5.9e+02)
vsz_kb:577456
store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
listnodes_sec:1.250000-1.480000(1.364+/-0.09)
listchannels_sec:30.820000-31.480000(31.068+/-0.24)
routing_sec:26.940000-27.990000(27.616+/-0.39)
peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1202316
+vsz_kb:577456
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:39207-45089(41374.6+/-2.2e+03)
vsz_kb:1202316
store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
listnodes_sec:1.290000-3.790000(1.938+/-0.93)
listchannels_sec:30.190000-32.120000(31.31+/-0.69)
routing_sec:28.220000-31.340000(29.314+/-1.2)
peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35107-37944(36686+/-1e+03)
+store_load_msec:39207-45089(41374.6+/-2.2e+03)
-vsz_kb:1218036
+vsz_kb:1202316
-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
+listchannels_sec:30.190000-32.120000(31.31+/-0.69)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally. So a `struct chan` without a channel_announcement
in the store is private, and other is public.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reload them from disk if they do listnodes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35390-38659(37336.4+/-1.3e+03)
vsz_kb:1780516
store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
listnodes_sec:1.280000-1.530000(1.382+/-0.096)
listchannels_sec:28.700000-30.440000(29.34+/-0.68)
routing_sec:30.120000-31.080000(30.526+/-0.35)
peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:1792996
+vsz_kb:1780516
-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
+listnodes_sec:1.280000-1.530000(1.382+/-0.096)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we need the payload, pull it from the gossip store.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:30189-52561(39416.4+/-8.8e+03)
vsz_kb:1812904
store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
listnodes_sec:1.120000-1.230000(1.176+/-0.044)
listchannels_sec:38.900000-50.580000(44.716+/-3.9)
routing_sec:45.080000-48.160000(46.814+/-1.1)
peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2288784
+vsz_kb:1812904
-store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
+store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
-listnodes_sec:0.750000-0.850000(0.794+/-0.042)
+listnodes_sec:1.120000-1.230000(1.176+/-0.044)
-listchannels_sec:30.740000-31.760000(31.096+/-0.35)
+listchannels_sec:38.900000-50.580000(44.716+/-3.9)
-routing_sec:29.600000-33.560000(30.472+/-1.5)
+routing_sec:45.080000-48.160000(46.814+/-1.1)
-peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
+peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of an arbitrary counter, we can use the file offset for our
partial ordering, removing a field. It takes some care when we compact
the store, however, as this field changes.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34271-35283(34789.6+/-3.3e+02)
vsz_kb:2288784
store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
listnodes_sec:0.750000-0.850000(0.794+/-0.042)
listchannels_sec:30.740000-31.760000(31.096+/-0.35)
routing_sec:29.600000-33.560000(30.472+/-1.5)
peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:35685-38538(37090.4+/-9.1e+02)
+store_load_msec:34271-35283(34789.6+/-3.3e+02)
-vsz_kb:2288768
+vsz_kb:2288784
-peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
+peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is more compact, but also required once we replace the arbitrary
"index" with an actual offset into the gossip store. That will let us
remove the in-memory variants entirely.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35685-38538(37090.4+/-9.1e+02)
vsz_kb:2288768
store_rewrite_sec:35.530000-41.230000(37.904+/-2.3)
listnodes_sec:0.720000-0.810000(0.762+/-0.041)
listchannels_sec:30.750000-35.990000(32.704+/-2)
routing_sec:29.570000-34.010000(31.374+/-1.8)
peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
MCP notable changes from previous patch (>1 stddev):
-vsz_kb:2621808
+vsz_kb:2288768
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used an s64 so we could use -1 and save a check, but that's just
silly as we have adjacent non-u64 fields: wastes 7 bytes per node
and 16 per channel.
Interestingly, this seemed to make us a little slower for some reason.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:35569-38776(37169.8+/-1.2e+03)
vsz_kb:2621808
store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
listnodes_sec:0.740000-0.800000(0.768+/-0.023)
listchannels_sec:29.820000-32.730000(30.972+/-0.99)
routing_sec:30.110000-30.590000(30.346+/-0.18)
peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:32825-36365(34615.6+/-1.1e+03)
+store_load_msec:35569-38776(37169.8+/-1.2e+03)
-vsz_kb:2637488
+vsz_kb:2621808
-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
+store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
+listnodes_sec:0.740000-0.800000(0.768+/-0.023)
-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
+peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can save significant space by combining both sides: so much that we
can reduce the WIRE_LEN_LIMIT to something sane again.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34467-36764(35517.8+/-7.7e+02)
vsz_kb:2637488
store_rewrite_sec:35.310000-36.580000(35.816+/-0.44)
listnodes_sec:1.140000-2.780000(1.596+/-0.6)
listchannels_sec:55.390000-58.110000(56.998+/-0.99)
routing_sec:30.330000-30.920000(30.642+/-0.19)
peer_write_all_sec:50.640000-53.360000(51.822+/-0.91)
MCP notable changes from previous patch (>1 stddev):
-store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
+store_rewrite_sec:35.310000-36.580000(35.816+/-0.44)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't turn them to/from pubkeys implicitly. This means nodeids in the store
don't get converted, but bitcoin keys still do.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:33934-35251(34531.4+/-5e+02)
vsz_kb:2637488
store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
listnodes_sec:1.020000-1.290000(1.146+/-0.086)
listchannels_sec:51.110000-58.240000(54.826+/-2.5)
routing_sec:30.000000-33.320000(30.726+/-1.3)
peer_write_all_sec:50.370000-52.970000(51.646+/-1.1)
MCP notable changes from previous patch (>1 stddev):
-store_load_msec:46184-47474(46673.4+/-4.5e+02)
+store_load_msec:33934-35251(34531.4+/-5e+02)
-vsz_kb:2638880
+vsz_kb:2637488
-store_rewrite_sec:46.750000-48.280000(47.512+/-0.51)
+store_rewrite_sec:34.720000-35.130000(34.94+/-0.14)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
I tried to just do gossipd, but it was uncontainable, so this ended up being
a complete sweep.
We didn't get much space saving in gossipd, even though we should save
24 bytes per node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes the next step easier.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:45791-46917(46330.4+/-3.6e+02)
vsz_kb:2641316
store_rewrite_sec:47.040000-48.720000(47.684+/-0.57)
listnodes_sec:1.140000-1.340000(1.2+/-0.072)
listchannels_sec:50.970000-54.250000(52.698+/-1.3)
routing_sec:29.950000-31.010000(30.332+/-0.37)
peer_write_all_sec:51.570000-52.970000(52.1+/-0.54)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This lets us benchmark without a valid blockchain.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Header from folded patch 'fixup!_gossipd__dev_option_to_allow_unknown_channels.patch':
fixup! gossipd: dev option to allow unknown channels.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For giant nodes, it seems we spend a lot of time memmoving this array.
Normally we'd go for a linked list, but that's actually hard: each
channel has two nodes, so needs two embedded list pointers, and when
iterating there's no good way to figure out which embedded pointer
we'd be using.
So we (ab)use htable; we don't really need an index, but it's good for
cache-friendly iteration (our main operation). We can actually change
to a hybrid later to avoid the extra allocation for small nodes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Basically we tell it that every field ending in '_msat' is a struct
amount_msat, and 'satoshis' is an amount_sat. The exceptions are
channel_update's fee_base_msat which is a u32, and
final_incorrect_htlc_amount's incoming_htlc_amt which is also a
'struct amount_msat'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a side-effect of using amount_msat in gossipd/routing.c, we explicitly
handle overflows and don't need to pre-prune ridiculous-fee channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We used a u16, and a 1000 multiplier, which meant we wrapped at
riskfactor 66. We also never undid the multiplier, so we ended up
applying 1000x the riskfactor they specified.
This changes us to pass the riskfactor with a 1M multiplier. The next
patch changes the definition of riskfactor to be more useful.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have a seed, which is for (future!) unit testing consistency. This
makes it change every time, so our pay_direct_test is more useful.
I tried restarting the noed around the loop, but it tended to fail
rebinding to the same port for some reason?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As a general rule, lightningd shouldn't parse user packets. We move the
parsing into gossipd, and have it respond only to permanent failures.
Note that we should *not* unconditionally remove a channel on
WIRE_INVALID_ONION_HMAC, as this can be triggered (and we do!) by
feeding sendpay a route with an incorrect pubkey.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We had a bug 0ba547ee10 caused by
short_channel_id overflow. If we'd caught this, we'd have terminated
the peer instead of crashing, so add appropriate checks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Don't do this:
(gdb) bt
#0 0x00007f37ae667c40 in ?? () from /lib/x86_64-linux-gnu/libz.so.1
#1 0x00007f37ae668b38 in ?? () from /lib/x86_64-linux-gnu/libz.so.1
#2 0x00007f37ae669907 in deflate () from /lib/x86_64-linux-gnu/libz.so.1
#3 0x00007f37ae674c65 in compress2 () from /lib/x86_64-linux-gnu/libz.so.1
#4 0x000000000040cfe3 in zencode_scids (ctx=0xc1f118, scids=0x2599bc49 "\a\325{", len=176320) at gossipd/gossipd.c:218
#5 0x000000000040d0b3 in encode_short_channel_ids_end (encoded=0x7fff8f98d9f0, max_bytes=65490) at gossipd/gossipd.c:236
#6 0x000000000040dd28 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290511, number_of_blocks=8) at gossipd/gossipd.c:576
#7 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290511, number_of_blocks=16) at gossipd/gossipd.c:595
#8 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290495, number_of_blocks=32) at gossipd/gossipd.c:596
#9 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290495, number_of_blocks=64) at gossipd/gossipd.c:595
#10 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=128) at gossipd/gossipd.c:596
#11 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=256) at gossipd/gossipd.c:595
#12 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=512) at gossipd/gossipd.c:595
#13 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=1024) at gossipd/gossipd.c:595
#14 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=2047) at gossipd/gossipd.c:596
#15 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=4095) at gossipd/gossipd.c:595
#16 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=8191) at gossipd/gossipd.c:595
#17 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=16382) at gossipd/gossipd.c:595
#18 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=32764) at gossipd/gossipd.c:595
#19 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=65528) at gossipd/gossipd.c:595
#20 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=131056) at gossipd/gossipd.c:595
#21 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=262112) at gossipd/gossipd.c:595
#22 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=524225) at gossipd/gossipd.c:595
#23 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=1048450) at gossipd/gossipd.c:595
#24 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=2096900) at gossipd/gossipd.c:595
#25 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=4193801) at gossipd/gossipd.c:595
#26 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=8387603) at gossipd/gossipd.c:595
#27 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=16775207) at gossipd/gossipd.c:595
#28 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=33550414) at gossipd/gossipd.c:596
#29 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=67100829) at gossipd/gossipd.c:595
#30 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=134201659) at gossipd/gossipd.c:595
#31 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=268403318) at gossipd/gossipd.c:595
#32 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=536806636) at gossipd/gossipd.c:595
#33 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=1073613273) at gossipd/gossipd.c:595
#34 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=2147226547) at gossipd/gossipd.c:595
#35 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=4294453094) at gossipd/gossipd.c:595
#36 0x000000000040df26 in handle_query_channel_range (peer=0x3868fc8, msg=0x37e0678 "\001\ao\342\214\n\266\361\263r\301\246\242F\256c\367O\223\036\203e\341Z\b\234h\326\031") at gossipd/gossipd.c:625
The cause was that converting a block number to an scid truncates it
at 24 bits. When we look through the index from (truncated number) to
(real end number) we get every channel, which is too large to encode,
so we iterate again.
This fixes both that problem, and also the issue that we'd end up
dividing into many empty sections until we get to the highest block
number. Instead, we just tack the empty blocks on to then end of the
final query.
(My initial version requested 0xFFFFFFFE blocks, but the dev code
which records what blocks were returned can't make a bitmap that big
on 32 bit).
Reported-by: George Vaccaro
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently only used by gossipd for channel elimination.
Also print them in canonical form (/[01]), so tests need to be
changed.
Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
seed isn't very useful at this level: I've left it in routing.c
because it might be useful for detailed testing. Pretty sure it's unused,
so I simply removed it.
The fuzzpercent is documented to default at 5%, but actually was 75%.
Fix that too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Christian and I both unwittingly used it in form:
*tal_arr_expand(&x) = tal(x, ...)
Since '=' isn't a sequence point, the compiler can (and does!) cache
the value of x, handing it to tal *after* tal_arr_expand() moves it
due to tal_resize().
The new version is somewhat less convenient to use, but doesn't have
this problem, since the assignment is always evaluated after the
resize.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is mainly just copying over the copy-editing from the
lightning-rfc repository.
[ Split to just perform changes prior to the UNKNOWN_PAYMENT_HASH change --RR ]
Signed-off-by: Christian Decker <decker.christian@gmail.com>
Reported-by: Rusty Russell <@rustyrussell>