Commit Graph

820 Commits

Author SHA1 Message Date
Rusty Russell c07dff21dc gossipd: generalize query_channel_range to use a callback.
This means we'll be able to call it for internal reasons, not just dev testing
as now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-30 07:08:07 +00:00
Rusty Russell 4bf0bc1f28 gossipd: age txout_failures map.
We do this by keeping a current and an old map, and moving the current to old
every hour or 10,000 entries.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-27 02:32:53 +00:00
Rusty Russell 1450a13c1f gossipd: don't expose scids of unannounced channels.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-27 02:32:53 +00:00
Rusty Russell 722b4942ed common: rename decode_short_channel_ids.{c,h} to decode_array.{c.h}
This encoding scheme is no longer just used for short_channel_ids, so make
the names more generic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-27 02:32:53 +00:00
Rusty Russell aa9024db51 gossipd: fix memleak false-positive
2019-09-26T02:00:47.832Z DEBUG lightning_hsmd(89822): Client: Received message 33 from client'
2019-09-26T02:00:47.837Z **BROKEN** lightning_gossipd(89828): MEMLEAK: 0x55eddc5d1fd8
2019-09-26T02:00:47.838Z **BROKEN** lightning_gossipd(89828):   label=gossipd/routing.c:1579:struct unupdated_channel
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):   backtrace:
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     ccan/ccan/tal/tal.c:437 (tal_alloc_)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     gossipd/routing.c:1579 (routing_add_channel_announcement)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     gossipd/routing.c:1867 (handle_pending_cannouncement)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     gossipd/gossipd.c:1543 (handle_txout_reply)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     gossipd/gossipd.c:1726 (recv_req)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     common/daemon_conn.c:31 (handle_read)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     ccan/ccan/io/io.c:59 (next_plan)
2019-09-26T02:00:47.838Z DEBUG lightning_gossipd(89828):     ccan/ccan/io/io.c:407 (do_plan)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-27 00:01:34 +00:00
Rusty Russell d24c850899 gossipd: restore a flag for fast pruning
I was seeing some accidental pruning under load / Travis, and in
particular we stopped accepting channel_updates because they were 103
seconds old.  But making it too long makes the prune test untenable,
so restore a separate flag that this test can use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-27 00:01:34 +00:00
Rusty Russell 2b47922ea5 gossipd: move query functions into their own file.
The only real change is dump_gossip() used to call
maybe_create_next_scid_reply(), but now I've simply renamed
that to maybe_send_query_responses() and we call it directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
Rusty Russell 38124ec287 gossipd: don't ask peers for gossip until we're synced with bitcoind.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
Rusty Russell a071a754b3 gossipd: place limit on pending announcements.
Now we queue them, we should place a limit.  It's not the worst thing in
the world if we discard them (we'll catch up eventually), but we should
try not to in case we're just a bit behind.

Our behaviour here is also O(n^2) so we don't want a massive queue
anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
Rusty Russell fd2d74aa9b gossipd: defer asking about txouts until we're synced or they're 6 deep.
The first one means we don't discard channels just because we're not
synced, and the second is implied by the spec: don't accept
channel_announcement if the channel isn't 6 deep.  Since LND defers in
such cases, we do too (unless it's newer than the current block, in
which case we simply discard).  Otherwise there's a risk that a slow
node might discard valid gossip.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
Rusty Russell 2a74d53841 Move gossip_constants.h into common/
Turns out we weren't checking the BOLT comments before, so they
needed an overhaul.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
Rusty Russell 6f9c5f2936 gossipd: get fed the blockheight from lightningd when we know it.
This will let gossipd be more intelligent about gossiping before we're
synced, and also it might know how far behind we are.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-25 04:01:56 +00:00
trueptolemy d8dce6e61f cleanup: Use `u32` as the type of `max_hops` in `gossipd` 2019-09-24 16:01:24 +02:00
Rusty Russell b55ff34f93 gossipd: fix corner case where gossip msg too old after pending delay.
Happened under Travis with --dev-fast-gossip (90 second prune time), but can
happen anyway if gossip is almost 2 weeks old when we receive it:

2019-09-20T19:16:51.367Z DEBUG lightning_gossipd(20972): Received node_announcement for node 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59
2019-09-20T19:16:51.376Z DEBUG lightning_gossipd(20972): Ignoring node_announcement timestamp 1569006918 for 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59
2019-09-20T19:16:51.669Z **BROKEN** lightning_gossipd(20972): pending node_announcement 01013094af771d60f4de69bb39ce045e4edf4a06fe6c80078dfa4fab58ab5617d6ad4fa34b6d3437380db0a8293cea348bbc77f714ef71fcd8515bfc82336667441f00005d852546022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59022d2253494c454e544152544953542d633961313734610000000000000000000000000000 malformed? (version c9a174a)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-22 20:56:11 +02:00
Rusty Russell 27790832a5 gossipd: gossip_queries_ex is not longer experimental.
The master spec has some typos which make it not parse, so I created
a PR and generated the CSV from that:

https://github.com/lightningnetwork/lightning-rfc/pull/673

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-22 01:17:11 +00:00
Rusty Russell 895e552475 BOLT: update to master with gossip_queries_ex.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-22 01:17:11 +00:00
Rusty Russell e5564173e7 BOLT: update to cover minor changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-22 01:17:11 +00:00
Rusty Russell 6a8d18c7e3 gossipd: naming cleanups.
Suggested-by: @cdecker.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 39c9dcbafc ratelimit: adjust based on --dev-fast-gossip, test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 147eaced2e developer: consolidiate gossip timing options into one --dev-fast-gossip.
It's generally clearer to have simple hardcoded numbers with an
#if DEVELOPER around it, than apparent variables which aren't, really.

Interestingly, our pruning test was always kinda broken: we have to pass
two cycles, since l2 will refresh the channel once to avoid pruning.

Do the more obvious thing, and cut the network in half and check that
l1 and l3 time out.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 8139164aa0 gossipd: disallow far future (+1 day) or far past (2 weeks) timestamps.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 76860683aa gossipd: only allow one channel_update per direction per day.
And similar for node_announcement.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell a92ead48bf gossipd: ignore redundant channel_update and node_announcement.
If you send a message which simply changes timestamp and signature, we
drop it.  You shouldn't be doing that, and the door to ignoring them
was opened by by option_gossip_query_ex, which would allow clients to
ignore updates with the same checksum.

This is more aggressive at reducing spam messages, but we allow refreshes
(to be conservative, we allow them even when 1/2 of the way through the
refresh period).

I dropped the now-unnecessary sleep from test_gossip_pruning, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 46e0f1efcc gossipd: refresh every 13 days, not every 7.
One day is plenty of time to propagate the update.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 06afb408d8 gossipd: bias lower bit of timestamp to ensure alternation.
This is useful for various "partial timestamp" forms of propagation
in future, esp. minisketch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 21a6d502db gossipd: move gossip message generation into its own file.
gossipd.c is doing too many things: this is a start.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 0bab2580fc gossipd: clean up local channel updates.
Make update_local_channel use a timer if it's too soon to make another
update.

1. Implement cupdate_different() which compares two updates.
2. make update_local_channel() take a single arg for timer usage.
3. Set timestamp of non-disable update back 5 minutes, so we can
   always generate a disable update if we need to.
4. Make update_local_channel() itself do the "unchanged update" suppression.
   gossipd: clean up local channel updates.
5. Keep pointer to the current timer so we override any old updates with
   a new one, to avoid a race.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell e1c431d278 gossipd: use local_chan_map more.
We can look up local channels directly now, which offers simplifcations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 27d9b75456 gossipd: add shadow structure for local chans.
Normally we'd put a pointer into struct half_chan for local
information, but it would be NULL on 99.99% of nodes.  Instead, keep a
separate hash table.

This immediately subsumes the previous "map of local-disabled
channels", and will be enhanced further.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 70c4ac6d74 gossipd: suppress our own too-close node_announcement messages.
Never make them less than gossip_min_interval apart.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 178baeba6c gossipd: get gossip_min_interval from lightningd.
Default is 5 x gossip interval == 5 minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 4cfd0524eb gossipd: simplify duplicate node_announcement check.
Write helpers to split it into non-timestamp, non-signature parts,
and simply compare those.  We extract a helper to do channel_update, too.

This is more generic than our previous approach, and simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
Rusty Russell 5ddd7866e4 gossipd: make create_node_announcement const-correct.
sig is only non-const so we can override if NULL, but talz helps
us here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-20 06:55:00 +00:00
trueptolemy 5361a5d059 JSON-API: `getroute` now also support `exclude` nodes 2019-09-16 12:22:06 +08:00
trueptolemy 090a43fd3d gossip: Add the `struct exclude_entry` and `enum exclude_entry_type` 2019-09-16 12:22:06 +08:00
Rusty Russell a46e880f1d gossipd: in DEVELOPER mode, catch missing free_chan()
For memory-usage reasons, struct chan doesn't use a tal destructor, in
favor of us calling free_chan in the right places.

In DEVELOPER mode, we should check that is the case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
Rusty Russell 91072f56b0 developer: add 'dev-gossip-set-time' call to manipulate gossipd's time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
Rusty Russell 768d293149 gossipd: don't get upset if we can't add channel_update.
In particular, the timestamp might be wrong once we start checking that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
Rusty Russell 2577ad87d5 gossipd: use gossip_time_now() everywhere.
We've been slack, but it's going to be important for testing
ratelimiting.  And it currently has a minor memory leak.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-12 05:11:56 +00:00
lisa neigut fe6c7f8f80 gossip queries: patch up valgrind errors in tests
These were giving me valgrind errors locally; fixed now.
2019-09-11 23:56:27 +00:00
Rusty Russell afbed94a6c gossipd: work around missing pwritev().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-11 05:58:36 +00:00
darosior 0b0ad4c22d transition from status_trace() to status_debug 2019-09-10 02:02:51 +00:00
darosior ea6c95b2b3 gossipd: don't ignore wrong chain in 'query_channel_range'
Give a NULL reply with the 'complete' flag to 0 instead
2019-09-10 02:02:51 +00:00
darosior 9be28fe40f daemons tour: minor typos correction 2019-09-10 02:02:51 +00:00
Rusty Russell c99906a9a9 per-peer-daemons: tie in gossip filter.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-06 14:35:01 +02:00
Rusty Russell aca2e4f722 common/memleak: add dynamic hooks for assisting memleak.
Rather than reaching into data structures, let them register their own
callbacks.  This avoids us having to expose "memleak_remove_xxx"
functions, and call them manually.

Under the hood, this is done by having a specially-named tal child of
the thing we want to assist, containing the callback.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-06 14:35:01 +02:00
Rusty Russell 837e6232c3 common: reduce header differences for DEVELOPER vs non-DEVELOPER.
`make update-mocks` is usually run in DEVELOPER mode, but then it includes
definitions for functions which aren't declared in non-DEVELOPER mode.

We hacked this in a few places, but it's fragile, and worst, now we
have EXPERIMENTAL_FEATURES as well, it's complex.

Instead, declare developer-only functions (but don't define them).
This is a bit more awkward if you accidentally use one in
non-DEVELOPER code (link error rather than compile error), but makes
autogenerating test mocks much easier.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-06 14:35:01 +02:00
Rusty Russell acf3952acc JSON: remove handling of pre-Adelaide (B:T:N) short_channel_ids.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-09-06 14:19:14 +02:00
Rusty Russell 02b9b7f6e6 tests: update mocks for --enable-experimental-features builds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-29 15:51:36 +02:00
Rusty Russell 28185c397c gossipd: fix gossip send in case query_flags cause no output.
Fortunately, again, only happens with EXPERIMENTAL_FEATURES.

If the query causes us not to actually send anything, we won't
get called again.  This can validly happen if they only asked for
the node_announcements, for example.

(Found by protocol tests).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-27 12:35:25 +02:00
Rusty Russell 855dff704c gossipd: test crc32 routines using test vectors from PR.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-27 12:35:25 +02:00
Rusty Russell d1a1592cc8 gossipd: fix calculation of crc32 of update.
Currently EXPERIMENTAL_FEATURES only, fortunately.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-27 12:35:25 +02:00
Rusty Russell 0ec8304901 gossipd: fix premature towire_reply_short_channel_ids_end if no node_announcement.
Our "are we finished?" logic was wrong: it tested if there are no more
node_announcements, but it's possible that there were no node_announcements
for either end of the channel whose information we sent.

This is actually quite unusual on the real network: looking at mainnet
statis from last May, 4301 of 4337 nodes have node_announcements.

However, with query flags it's much more likely, since they might not
ask for node announcements at all.

(Found by gossip protocol tests)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-26 23:09:00 +00:00
Rusty Russell 2f1e116510 gossipd: use htable_count() rather than reaching into htable struct.
Now ccan/htable provides the helper, let's use it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-26 08:44:22 +00:00
Rusty Russell 51541f53d8 gossipd: test vectors for https://github.com/lightningnetwork/lightning-rfc/pull/557
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell 4de47f6db5 gossipd: use default zlib compression, hack for zlib expansion.
These both allow us to reproduce the test vectors in the next patch.  But
using Z_DEFAULT_COMPRESSION is a reasonable idea anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell f6cf4bf62a spec: remove encoding byte from checksums.
Make the TLV element a simple array.  This is a bit neater, in fact, and
makes the test vectors in that 557 PR work.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell 8abd850d3c gossipd: append timestamps & checksums to reply_channel_range if asked (EXPERIMENTAL)
In fact, we always generate them, we only send them if asked.  And we set
the flags to 0 if not --enable-experimental-features, so we never send in
that case.

Generating checksums involves pulling the channel_update from the
gossip_store, which is suboptimal: there's a FIXME to store the
checksum in memory.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell c7853197ae gossipd: generalize encoding functions
We're about to use the for gossip extended info too, which *don't* put
the encoding byte at the beginning of the data stream.  So this removes
some "scids" from function names and separates out the "prepend a byte"
case from the "external encoding_type" case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell 0de11da5e4 gossipd: decode and obey query_short_channel_ids's TLV query_flags (EXPERIMENTAL)
These indicate what fields we are to return.  If there's now TLV, or we
haven't got --enable-experimental-features, it's set to all 1s so behaviour
is unchanged.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell d2030539e1 EXPERIMENTAL: pull in PR 557 (with minor fixes): range query support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-10 02:48:34 +00:00
Rusty Russell f9ecc76d99 gossipd: check that we don't try to access a deleted gossip entry.
We ignored this before, which meant that the DEVELOPER-mode check that we
delete the correct record didn't check that it wasn't already deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-09 08:58:05 +02:00
Rusty Russell f57f068592 gossipd: don't use O_APPEND on the gossip_store.
We always know the length, so we don't need it.  It causes much extra work
when we want to delete a record, which I suspect may cause issues amongst
some users who've been seeing gossip_store corruption.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-08-03 12:50:51 +02:00
ZmnSCPxj 3e74ca4b86 gossipd/routing.c: Correctly handle a duplicated entry in `exclude` of `getroute`. 2019-08-02 16:06:15 +02:00
Rusty Russell cc70b9c4ec wire: use common/bigsize routines
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-31 23:25:59 +00:00
Rusty Russell 6bb8525e5d gossipd: fix crash when we prune old, un-updated channel announcements.
We added a random channel to the list, but we can just free it immediately
(since traversal of a uintmap isn't altered by deletion).

This was introduced in d1f43d993a where we explicitly call free_chan
rather than relying on destructors.

Fixes: #2837
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-28 14:15:32 +02:00
lisa neigut 32eaae0cb9 wire-gen: move in-house wire delcarations to new format
tidying things up!
2019-07-24 06:31:46 +00:00
Rusty Russell b3215a866b gossipd: fix inverted test in debug print.
==1503== Use of uninitialised value of size 8
==1503==    at 0x566786B: _itoa_word (_itoa.c:179)
==1503==    by 0x566AF0D: vfprintf (vfprintf.c:1642)
==1503==    by 0x569790F: vsnprintf (vsnprintf.c:114)
==1503==    by 0x156CCB: do_vfmt (str.c:66)
==1503==    by 0x156DB1: tal_vfmt_ (str.c:92)
==1503==    by 0x1289CD: status_vfmt (status.c:141)
==1503==    by 0x128AAC: status_fmt (status.c:151)
==1503==    by 0x118E05: route_prune (routing.c:2495)
==1503==    by 0x11DE2D: gossip_refresh_network (gossipd.c:1997)
==1503==    by 0x1292B8: timer_expired (timeout.c:39)
==1503==    by 0x12088C: main (gossipd.c:3075)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-17 20:16:55 -05:00
lisa neigut 5c07afac7d bolt: update to BOLT spec changes (extract format + type specifications)
updates the bolt version to 6639cef095a2ecc7b8f0c48c6e7f2f906fbfbc58.

this requires us to use the new bolt parser at generate-bolt.py
and updates to all of the type specifications (ie. from u8 -> byte)
2019-07-16 06:10:58 +00:00
Rusty Russell c95b4eedf4 gossipd: fail clearly if we can't open/create gossip_store.
Otherwise we fail at the write, and then it's not clear
*why* we couldn't open file:

lightning_gossipd: Writing version to store: Bad file descriptor (version v0.7.1-16-g7ea5c5c)
0x560dcf1a3779 send_backtrace common/daemon.c:40
0x560dcf1a634d status_failed common/status.c:192
0x560dcf19726a gossip_store_new gossipd/gossip_store.c:195
0x560dcf199fd0 new_routing_state gossipd/routing.c:177
0x560dcf1a098b gossip_init gossipd/gossipd.c:2113
0x560dcf1a197a recv_req gossipd/gossipd.c:2946
0x560dcf1a38cd handle_read common/daemon_conn.c:31
0x560dcf1bae2c next_plan ccan/ccan/io/io.c:59
0x560dcf1bb314 do_plan ccan/ccan/io/io.c:407
0x560dcf1bb341 io_ready ccan/ccan/io/io.c:417
0x560dcf1bcb13 io_loop ccan/ccan/io/poll.c:445
0x560dcf1a1ba0 main gossipd/gossipd.c:3073

Reported-by: @JavierRSobrino
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-04 16:10:20 +02:00
lisa neigut 7046d0220c makefiles: move all unit tests under `make check-units`
Isolate unit tests under their own make directive.
2019-06-30 16:41:30 +09:30
Rusty Russell c303d7d534 gossipd: only do (automatic) store compaction at startup.
Rewriting the gossip_store is much more trivial when we don't have
any pointers into it, so add some simple offline compaction code
and disable the automatic compaction code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 20:03:10 -05:00
Rusty Russell c15d9ed37c gossip_store: make copy of corrupt gossip_store on failure.
This should help debugging vastly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell 8928f0b5f9 gossipd: remove gossip entirely if we hit a problem on load.
The crashes in #2750 are mostly caused by us trying to partially truncate
the store.  The simplest fix for release is to discard the whole thing if
we detect a problem.

This is a workaround: it'd be far nicer to try to recover.

Fixes: #2750
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell 8ce3b86aa5 gossipd: tighter correctness checks during gossip_store load.
We shouldn't be loading old timestamps, either.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell fc27250f80 gossipd: be more verbose and less assert()ive on bad node_announcement.
We hit the timestamp assert on #2750; it shouldn't happen, but crashing
doesn't leave much information.

Reported-by: @m-schmook
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell f1b57063f7 bitcoin/tx: use fromwire_fail in pull_bitcoin_tx.
This is the correct way to mark failure: it also sets *max to 0.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 03:56:59 +00:00
Rusty Russell 47b5f2e837 gossipd: truncate gossip_store.tmp for compaction.
If something went wrong and there was an old one, we were
appending to it!

Reported-by: @SimonVrouwe
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-20 02:53:52 +00:00
Rusty Russell 5e3690b3c5 gossipd: delete channel_amount from the store when we delete channel_announcement.
Otherwise we slowly build up cruft: compaction simply moves them since
they're not deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell 10c503b4b4 gossip_store: clean up a truncated store.
We might have channel_announcements which have no channel_update: normally
these don't get written into the store until there is one, but if the
store was truncated it can happen.  We then get upset on compaction, since
we don't have an in-memory representation of the channel_announcement.

Similarly, we leave the node_announcement pending until after that
channel_announcement, leading to a similar case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell 24cc371cdf gossipd: gossip_store errors after rewrite are fatal.
We can't continue, since we've moved the indexes.  We'll just crash
anyway, as seen from bugs #2742 and #2743.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell eb5cc47bdd gossipd: count deleted records correctly when loading gossip_store.
The result of an incorrect count was that we failed on next compaction.

Fixes: #2743
Fixes: #2742
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell 745634d9b9 gossipd: don't catch pending node_announcements more than once.
We catch node_announcements for nodes where we haven't finished
analyzing the channel_announcement yet (either because we're still
checking UTXO, or in this case, because we're waiting for a channel_update).

But we reference count the pending_node_announce, so if we have
multiple channels pending, we might try to insert it twice.  Clear it
so this doesn't happen.

There's a second bug where we continue to catch node_announcements
until *all* the channel_announcements are no longer pending; this is fixed
by removing it from the map.

Fixes: #2735
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-13 05:58:09 +00:00
Rusty Russell 1e32b4ab29 gossipd: adjust gossip filters if we discover we're missing gossip.
We pick up to three random peers and ask them to gossip more.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 6830233d0b gossipd: control gossip level so we don't get flooded by peers.
We seek a certain number of peers at each level of gossip; 3 "flood"
if we're missing gossip, 2 at 24 hours past to catch recent gossip, and
8 with current gossip.  The rest are given a filter which causes them
not to gossip to us at all.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell f5ea57d4c0 gossipd: reset gossip_missing if no reports for 10 minutes.
An arbitrary timeout.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell b9053767e7 gossipd: query unknown short_channel_ids, note if they were really missing.
The first sign that we're missing gossip is that we get a channel_update
for an unknown channel.  The peer might be wrong (or lying), but if it turns
out to be a real channel, we were definitely missing something.

This patch does two things: queries when we get an unknown channel_update,
and then notes that a channel_announcement was from such an update when
it's finally processed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 18069ab3da gossipd: APIs return more information about routing message handling.
In particular, we'll need to know the short_channel_id if a
channel_update is unknown (implies we're missing a channel), and whether
processing a pending channel_announcement was successful (implies that
the channel was real).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 5ef7aa70d2 gossipd: prepare for internally-generated short-channel-id queries.
Up until now we only generated these in dev mode for testing.  Hoist
into common code, turn counter into a flag (we're only allowed one!)
and note if query is internal or not.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 21c920a8e8 gossipd: note if loaded store seems reasonably up-to-date.
If not, we can ask peers for full gossip (for now we just set a flag).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 0d2a4830ed ccan: update to faster and correct crc32c implementation.
I decided to try a faster implementation, only to find our crc32c was
not correct!  Ouch.

I removed the crc32c functions from ccan/crc, and added a new crc32c
module which has the Mark Adler x86-64-optimized variants.

We bump gossip_store version again, since csums have changed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:40:10 +00:00
Rusty Russell ab31f40aa2 gossipd: don't charge ourselves fees when calculating route.
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id.  In the default case,
it means we don't charge ourselves fees on the route.

This means we can spend the full channel balance.

We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things.  So we add the first-hop fee to the *risk* value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:19:11 +00:00
Rusty Russell b48c644e7a listchannels: add `htlc_minimum_msat` and `htlc_maximum_msat` fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:19:11 +00:00
Rusty Russell 1a3886c116 wallet: keep a list of unreleased transactions.
We're going to use this in the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-06 04:47:44 +00:00
Rusty Russell 628b65fb40 gossip_store: don't leave dangling channel_announce if we truncate.
(Or, if we crashed before we got to write out the channel_update).
It's a corner case, but one reported by @darosior and reproduced
on my test node (both with bad gossip_store due to previous iterations
of this patchset!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell f8b98e032c gossipd: Don't abort() on duplicate entries in gossip_store.
Triggered by a previous variant of this PR, but a goo1d idea to simply
discard the store in general when we get a duplicate entry.

We crash trying to delete old ones, which means writing to the store.
But they should have already been deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 34c113a17a gossipd: trivial clean up of routing_add_channel_update.
For some reason I was reluctant to use the hc local variable; I even
re-declared it!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 3e733afb2b gossipd: remove broadcast map altogether.
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.

Before this series: (--disable-developer, -Og)
    store_load_msec:20669-20902(20822.2+/-82)
    vsz_kb:439704-439712(439706+/-3.2)
    listnodes_sec:0.890000-1.000000(0.92+/-0.04)
    listchannels_sec:11.960000-13.380000(12.576+/-0.49)
    routing_sec:3.070000-5.970000(4.814+/-1.2)
    peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)

After: (--disable-developer, -Og)
    store_load_msec:19722-20124(19921.6+/-1.4e+02)
    vsz_kb:288320
    listnodes_sec:0.860000-0.980000(0.912+/-0.056)
    listchannels_sec:10.790000-12.260000(11.65+/-0.5)
    routing_sec:2.540000-4.950000(4.262+/-0.88)
    peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell dd83453b2f gossipd/gossip_store: fix compacting, don't use broadcast ordering.
We have a problem: if we get halfway through writing the compacted store
and run out of disk space, we've already changed half the indexes.

This changes it so we do nothing until writing is finished: then we
iterate through and update indexes.  It also weans us off broadcast
ordering, which we can now eliminated.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 5161b79bfc gossipd/gossip_store: keep count of deleted entries, don't use bs->count.
We didn't count some records before, so we could compare the two counters.
This is much simpler, and avoids reliance on bs.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 728bb4e662 common/gossip_store: handle timestamp filtering.
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself.  The rest of the semantics are fairly
simple however.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 948490ec58 gossipd: add timestamp in gossip store header.
(We don't increment the gossip_store version, since there are only a
few commits since the last time we did this).

This lets the reader simply filter messages; this is especially nice since
the channel_announcement timestamp is *derived*, not in the actual message.

This also creates a 'struct gossip_hdr' which makes the code a bit
clearer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell bad9734dc7 gossip_store: remove redundant copy_message.
The single caller can easily use transfer_store_msg instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 5591c0b5d8 gossipd: don't send gossip stream, let per-peer daemons read it themselves.
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project.  But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.

This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.

We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 4399faf57c gossipd: make writes to gossip_store atomic.
There's a corner case where otherwise a reader could see the header and
not the body of a message.  It could handle that in various ways,
but simplest (and most efficient) is to avoid it happening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell a5f6ef385a gossipd: don't wrap messages when we send them to the peer.
They already send *us* gossip messages, so they have to be distinct anyway.
Why make us both do extra work?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell df00f20e4a gossipd: erase old entries from the store, don't just append.
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.

Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast.  Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 696dc6b597 gossipd: disable gossip_store upgrade.
We're about to bump version again, and the code to upgrade it was
quite hairy (and buggy!).  It's not worthwhile for such a
poorly-tested path: I will just add code to limit how much incoming
gossip we get to avoid flooding when we upgrade, however.

I also use a modern gossip_store version in our test_gossip_store_load
test, instead of relying on the upgrade path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 43f2cbd250 gossipd: track gossip_store locations of local channels.
We currently don't care, but the next patch means we have to find them
again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 180a552fba gossip_store: mark private updates separately from normal ones.
They're really gossipd-internal, and we don't want per-peer daemons
to confuse them with normal updates.

I don't bump the gossip_store version; that's coming with another update
anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 763697eb4c gossipd: fix gossip_store calling delete.
Now we handle node_announcements properly, we have a failure case where we
try to move them when a channel is deleted while loading the store.

We're going to remove this soon, in favor of in-place delete, so
workaround this for now to avoid an assert() when we try to write to
the store while loading.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 11:04:25 -07:00
Rusty Russell 21fe518513 gossip_store: fix 'bad node_announcement' by allowing node_announcement on un-updated channel.
When we first receive a channel_update, we write both the
channel_announcement and that channel_update to the store: we need
that first update so we can set the channel_announcement timestamp.

However, the channel_update can be replaced later.  This means we can
have a channel_announcement, a node_update which relies on it, then
the channel_update later.

So move the "this applies to a pending announcement" check lower, where
gossip_store can use it too.  Has a nice side-effect of avoiding
one lookup of the node id.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 11:04:25 -07:00
Rusty Russell c233fc5063 gossipd: fix spurious unused error with gcc-9 -O3.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 00:07:11 +00:00
Rusty Russell c091a4ee40 gossipd: fix spurious gcc warning.
It turns out that we don't look at type when we return 0, but gcc isn't
quite smart enough for that.  Initializing to -1 is good practice anyway
for the failure path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 00:07:11 +00:00
William Casarin 3f035cb3cc gossipd: fix uninitialized free on short_route in goto path
Fix a path where tal_free is called on an uninitialized variable

If the first `goto bad_total` executes, then that path has
uninitialized `short_route` but bad_total passes through to `out`
whose first call is tal_free(short_route).

This was noticed by a maybe-uninitialized heuristic on gcc 7.4.0:

gossipd/routing.c: In function ‘find_shorter_route’:
gossipd/routing.c:1096:2: error: ‘short_route’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
  tal_free(short_route);

Reported-by: @ZmnSCPxj <https://github.com/ElementsProject/lightning/pull/2674#issuecomment-495617253>
Signed-off-by: William Casarin <jb55@jb55.com>
2019-06-03 00:07:11 +00:00
Rusty Russell 654e89b5fc gossipd: free channels in routing_state destructor.
Cleans up the tests.

Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00
Rusty Russell d1f43d993a gossipd: use explicit destructor for struct chan.
Each destructor2 costs 40 bytes, and struct chan is only 120 bytes.  So
this drops our memory usage quite a bit:

MCP bench results change:
   -vsz_kb:580004-580016(580006+/-4.8)
   +vsz_kb:533148

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00
Rusty Russell 59e75f1b2c gossipd: reply to large listchannels in parts.
This has two effects: most importantly, it avoids the problem where
lightningd creates a 800MB JSON blob in response to listchannels,
which causes OOM on the Raspberry Pi (our previous max allocation was
832MB).  This is because lightning-cli can start draining the JSON
while we're filling the buffer, so we end up with a max allocation of
68MB.

But despite being less efficient (multiple queries to gossipd), it
actually speeds things up due to the parallelism:

MCP with -O3 -flto before vs after:
-listchannels_sec:8.980000-9.330000(9.206+/-0.14)
+listchannels_sec:7.500000-7.830000(7.656+/-0.11)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00
Rusty Russell cb9c44ef27 gossipd: remove unnecessary dev_unknown_channel_satoshis arg.
We now have a test blockchain for MCP which has the correct channels,
so this is not needed.

Also fix a benchmark script bug where 'mv "$DIR"/log
"$DIR"/log.old.$$' would fail if you log didn't exist from a previous run.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00
Rusty Russell 85d8848ede gossipd: neaten insert_broadcast a little.
Suggested-by: @cdecker.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00
darosior d9db9dc1ae gossipd: fix listnodes crash on non existing id
'node_arr' was not instanciated if an id was passed to listnodes and we could not get a node from it
2019-05-16 19:30:10 +02:00
Rusty Russell f5a218f9d1 gossipd: send per-peer daemons offsets into gossip store.
Instead of reading the store ourselves, we can just send them an
offset.  This saves gossipd a lot of work, putting it where it belongs
(in the daemon responsible for the specific peer).

MCP bench results:
   store_load_msec:28509-31001(29206.6+/-9.4e+02)
   vsz_kb:580004-580016(580006+/-4.8)
   store_rewrite_sec:11.640000-12.730000(11.908+/-0.41)
   listnodes_sec:1.790000-1.880000(1.83+/-0.032)
   listchannels_sec:21.180000-21.950000(21.476+/-0.27)
   routing_sec:2.210000-11.160000(7.126+/-3.1)
   peer_write_all_sec:36.270000-41.200000(38.168+/-1.9)

Signficant savings in streaming gossip:
   -peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)
   +peer_write_all_sec:35.780000-37.980000(36.43+/-0.81)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell 0e37ac2433 common: move gossip_store read routine where subdaemons can access it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell d8db4e871f gossipd: provide new fd to per-peer daemons when we compact it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell 13717c6ebb gossipd: hand a gossip_store_fd to all subdaemons.
This will let them read from the gossip store directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell 89291b930e gossipd: pass amount into gossip_store, rather than having it fetch.
We need to store the channel capacity for channel_announcement: hand it
in directly rather than having the gossip_store code do a lookup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell 7ede5aac31 gossip_store: change format so we store raw messages.
Save some overhead, plus gets us ready for giving subdaemons direct
store access.  This is the first time we *upgrade* the gossip_store,
rather than just discarding.

The downside is that we need to add an extra message after each
channel_announcement, containing the channel capacity.

After:
  store_load_msec:28337-30288(28975+/-7.4e+02)
  vsz_kb:582304-582316(582306+/-4.8)
  store_rewrite_sec:11.240000-11.800000(11.55+/-0.21)
  listnodes_sec:1.800000-1.880000(1.84+/-0.028)
  listchannels_sec:22.690000-26.260000(23.878+/-1.3)
  routing_sec:2.280000-9.570000(6.842+/-2.8)
  peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)

Differences:
  -vsz_kb:582320
  +vsz_kb:582316
  -listnodes_sec:2.100000-2.170000(2.118+/-0.026)
  +listnodes_sec:1.800000-1.880000(1.84+/-0.028)
  -peer_write_all_sec:51.600000-52.550000(52.188+/-0.34)
  +peer_write_all_sec:48.160000-51.480000(49.608+/-1.1)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell c7034f271a gossipd: avoid tal overhead in listnodes
We know exactly how many there will be, so allocate an entire array up-front.

  -listnodes_sec:2.540000-2.610000(2.584+/-0.029)
  +listnodes_sec:2.100000-2.170000(2.118+/-0.026)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
trueptolemy fefe7dfbab Gossipd: cleanup extra repeated code 2019-05-06 08:52:36 +00:00
Rusty Russell 0ca0db765a gossipd: fix crash if we truncate store.
Entries we've already loaded expect to exist in the store.  We could go
back and remove them all, but instead just truncate at the known-good
point.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-01 11:59:12 +02:00
Rusty Russell b248bb155a tools/bench-gossipd.sh: make it work (where possible) with DEVELOPER=0
Some tests require dev support, but the rest can run.  We simplify
the gossip_store output so it's the same in non-dev mode too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-24 13:46:39 -05:00
Rusty Russell 0fc42415c2 gossipd/routing: remove BFG implementation.
Now we can benchmark, and remove 500 bytes per node.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35093-37907(36146+/-1.1e+03)
	vsz_kb:555168
	store_rewrite_sec:12.120000-13.750000(12.7+/-0.6)
	listnodes_sec:1.270000-1.370000(1.322+/-0.039)
	listchannels_sec:29.770000-31.600000(30.82+/-0.64)
	routing_sec:0.00
	peer_write_all_sec:63.630000-67.850000(65.432+/-1.7)

MCP notable changes from pre-Dijkstra (>1 stddev):
	-vsz_kb:577456
	+vsz_kb:555168
	-routing_sec:60.70
	+routing_sec:12.04

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell cfdb012b30 gossipd: re-add fuzz logic to routing.
Do it inside the can_reach() function, which is less optimal for BFG
which does 20 ops on the same channel, but fine for Dijkstra.

This does have a measurable cost, so we might want to use
non-cryptographic fuzz in future:

$ gossipd/test/run-bench-find_route 100000 100:

Before:
	100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)

After:
	100 (100 succeeded) routes in 100000 nodes in 113381 msec (1133813412 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell e197956032 gossipd/routing: Iterate on Dijkstra when route is too long.
If a route is too long, we try to bias Dijkstra towards choosing a
shorter route by adding a per-hop cost.  We do a naive "shortest path"
pass, then using that cost as a ceiling on per-hop cost, we do a
binary search.

There are some subtleties: we use risk rather than total as our
counter field (we normally bias this by 1 anyway, so it's easy to make
that a variable), and we set riskfactor to a mimimal value once we're
iterating.  It's good enough to get a solution, we don't need to do a
2-dimensional search on riskfactor and riskbias.

Of course, this is extremely slow if we hit it on our benchmark,
though it doesn't happen in a more realistic network:

$ gossipd/test/run-bench-find_route 100000 100:

Before:
	100 (79 succeeded) routes in 100000 nodes in 25341 msec (253412314 nanoseconds per route)

After:
	100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell f8ffae837d gossipd: speed Dijkstra a little.
Our uintmap can be a little slow with all the reallocation, so leave
NULL entries and walk to find the first one.  Since we don't clean
them up, keep a cache of where the min non-all-NULL value is in the
heap.

It's clearer benefit on really large tests, so here's 1M nodes:

Comparison using gossipd/test/run-bench-find_route 1000000 10:

Before:
	10 (10 succeeded) routes in 1000000 nodes in 91995 msec (9199532898 nanoseconds per route)

After:
	10 (10 succeeded) routes in 1000000 nodes in 20605 msec (2060539287 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell 7caa37f0f1 gossipd: implement Dijkstra.
Use a uintmap as our minheap.

Note that Dijkstra can give overlength routes, so some checks are disabled.

Comparison using gossipd/test/run-bench-find_route 100000 10:

Before:
	10 (10 succeeded) routes in 100000 nodes in 120087 msec (12008708402 nanoseconds per route)
After:
	10 (10 succeeded) routes in 100000 nodes in 2269 msec (226925462 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell 4d84a436f5 gossipd: temporarily disable fuzz in routing.
This allows precise comparison between Dijkstra and Bellman-Ford without
worrying about fuzz.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell 594af8049b gossipd: extract common functionality.
This will be needed by Dijkstra as well.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
Rusty Russell 6dfa46d65a gossipd/test: add test for handling overlong routes.
This is a weakness with Dijkstra, so write an explicit unit test that
we can find a short enough (but more expensive) route.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-18 06:33:09 +00:00
trueptolemy 77236caa91 gossipd: fix the check for node announcement in broadcast_state_check()
There should check if node_id_1 was stored in pubkeys, other than checking scid.
2019-04-16 00:20:26 +00:00
trueptolemy 274f156b28 gossiped: rename empty_node_map() to new_node_map()
empty_node_map() sounds like a destructor. new_node_map() makes sense and is better.
2019-04-14 23:12:00 +00:00
trueptolemy ee036a2e36 Gossipd: change the pending_cannouncement list to htable 2019-04-14 05:39:31 +00:00
Rusty Russell 261921dee2 gossipd: adjust peers' broadcast_offset when compacting store.
When we compact the store, we need to adjust the broadast index for
peers so they know where they're up to.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell fdb42c3170 gossipd: don't keep channel_updates in memory.
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:36034-37853(37109.8+/-5.9e+02)
	vsz_kb:577456
	store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
	listnodes_sec:1.250000-1.480000(1.364+/-0.09)
	listchannels_sec:30.820000-31.480000(31.068+/-0.24)
	routing_sec:26.940000-27.990000(27.616+/-0.39)
	peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1202316
	+vsz_kb:577456

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell 0370ed2eca gossipd: use pread in the store.
The next patch causes us to access the store while loading (we read
channel_updates for local peers), which messes up loading due to the
lseek involved.

Using pread() is atomic with seek & read, and also a bit more
efficient.  Make the header contiguous too, while we're here.

We don't need pwrite: we always open with O_APPEND which means the
seek-to-end is implicit.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:36771-38289(37529.6+/-5.3e+02)
	vsz_kb:1202316
	store_rewrite_sec:12.460000-13.280000(12.784+/-0.29)
	listnodes_sec:1.240000-1.410000(1.34+/-0.058)
	listchannels_sec:29.850000-31.840000(30.908+/-0.69)
	routing_sec:27.800000-31.790000(28.822+/-1.5)
	peer_write_all_sec:66.200000-68.720000(67.44+/-0.84)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:39207-45089(41374.6+/-2.2e+03)
	+store_load_msec:36771-38289(37529.6+/-5.3e+02)
	-store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
	+store_rewrite_sec:12.460000-13.280000(12.784+/-0.29)
	-peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
	+peer_write_all_sec:66.200000-68.720000(67.44+/-0.84)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell 2135c7a024 gossipd: allow reading from the store during load.
When we no longer keep channel_updates in memory, there's a path where
we access them on load: when we promote a local channel to an
announced channel.

This breaks at the moment, since gs->fd == -1; change it to a writable
flag instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell aeb72a05e3 gossipd: remove some fields from struct chan.
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:39207-45089(41374.6+/-2.2e+03)
	vsz_kb:1202316
	store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
	listnodes_sec:1.290000-3.790000(1.938+/-0.93)
	listchannels_sec:30.190000-32.120000(31.31+/-0.69)
	routing_sec:28.220000-31.340000(29.314+/-1.2)
	peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:35107-37944(36686+/-1e+03)
	+store_load_msec:39207-45089(41374.6+/-2.2e+03)
	-vsz_kb:1218036
	+vsz_kb:1202316
	-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
	+listchannels_sec:30.190000-32.120000(31.31+/-0.69)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell 3280466e19 gossipd: don't keep channel_announcement messages in memory.
MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35107-37944(36686+/-1e+03)
	vsz_kb:1218036
	store_rewrite_sec:14.060000-17.970000(15.966+/-1.6)
	listnodes_sec:1.270000-1.350000(1.314+/-0.034)
	listchannels_sec:28.510000-30.270000(29.6+/-0.6)
	routing_sec:30.230000-31.510000(30.83+/-0.44)
	peer_write_all_sec:67.390000-70.710000(68.568+/-1.2)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1780516
	+vsz_kb:1218036

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell 2fd4a0121f gossipd: unify is_chan_public / is_chan_announced.
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally.  So a `struct chan` without a channel_announcement
in the store is private, and other is public.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00
Rusty Russell aafc489edb gossipd: remove info fields from struct node.
Reload them from disk if they do listnodes.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35390-38659(37336.4+/-1.3e+03)
	vsz_kb:1780516
	store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
	listnodes_sec:1.280000-1.530000(1.382+/-0.096)
	listchannels_sec:28.700000-30.440000(29.34+/-0.68)
	routing_sec:30.120000-31.080000(30.526+/-0.35)
	peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1792996
	+vsz_kb:1780516
	-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
	+listnodes_sec:1.280000-1.530000(1.382+/-0.096)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-04-11 18:31:34 -07:00