Commit Graph

654 Commits

Author SHA1 Message Date
lisa neigut 32eaae0cb9 wire-gen: move in-house wire delcarations to new format
tidying things up!
2019-07-24 06:31:46 +00:00
Rusty Russell b3215a866b gossipd: fix inverted test in debug print.
==1503== Use of uninitialised value of size 8
==1503==    at 0x566786B: _itoa_word (_itoa.c:179)
==1503==    by 0x566AF0D: vfprintf (vfprintf.c:1642)
==1503==    by 0x569790F: vsnprintf (vsnprintf.c:114)
==1503==    by 0x156CCB: do_vfmt (str.c:66)
==1503==    by 0x156DB1: tal_vfmt_ (str.c:92)
==1503==    by 0x1289CD: status_vfmt (status.c:141)
==1503==    by 0x128AAC: status_fmt (status.c:151)
==1503==    by 0x118E05: route_prune (routing.c:2495)
==1503==    by 0x11DE2D: gossip_refresh_network (gossipd.c:1997)
==1503==    by 0x1292B8: timer_expired (timeout.c:39)
==1503==    by 0x12088C: main (gossipd.c:3075)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-17 20:16:55 -05:00
lisa neigut 5c07afac7d bolt: update to BOLT spec changes (extract format + type specifications)
updates the bolt version to 6639cef095a2ecc7b8f0c48c6e7f2f906fbfbc58.

this requires us to use the new bolt parser at generate-bolt.py
and updates to all of the type specifications (ie. from u8 -> byte)
2019-07-16 06:10:58 +00:00
Rusty Russell c95b4eedf4 gossipd: fail clearly if we can't open/create gossip_store.
Otherwise we fail at the write, and then it's not clear
*why* we couldn't open file:

lightning_gossipd: Writing version to store: Bad file descriptor (version v0.7.1-16-g7ea5c5c)
0x560dcf1a3779 send_backtrace common/daemon.c:40
0x560dcf1a634d status_failed common/status.c:192
0x560dcf19726a gossip_store_new gossipd/gossip_store.c:195
0x560dcf199fd0 new_routing_state gossipd/routing.c:177
0x560dcf1a098b gossip_init gossipd/gossipd.c:2113
0x560dcf1a197a recv_req gossipd/gossipd.c:2946
0x560dcf1a38cd handle_read common/daemon_conn.c:31
0x560dcf1bae2c next_plan ccan/ccan/io/io.c:59
0x560dcf1bb314 do_plan ccan/ccan/io/io.c:407
0x560dcf1bb341 io_ready ccan/ccan/io/io.c:417
0x560dcf1bcb13 io_loop ccan/ccan/io/poll.c:445
0x560dcf1a1ba0 main gossipd/gossipd.c:3073

Reported-by: @JavierRSobrino
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-04 16:10:20 +02:00
lisa neigut 7046d0220c makefiles: move all unit tests under `make check-units`
Isolate unit tests under their own make directive.
2019-06-30 16:41:30 +09:30
Rusty Russell c303d7d534 gossipd: only do (automatic) store compaction at startup.
Rewriting the gossip_store is much more trivial when we don't have
any pointers into it, so add some simple offline compaction code
and disable the automatic compaction code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 20:03:10 -05:00
Rusty Russell c15d9ed37c gossip_store: make copy of corrupt gossip_store on failure.
This should help debugging vastly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell 8928f0b5f9 gossipd: remove gossip entirely if we hit a problem on load.
The crashes in #2750 are mostly caused by us trying to partially truncate
the store.  The simplest fix for release is to discard the whole thing if
we detect a problem.

This is a workaround: it'd be far nicer to try to recover.

Fixes: #2750
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell 8ce3b86aa5 gossipd: tighter correctness checks during gossip_store load.
We shouldn't be loading old timestamps, either.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell fc27250f80 gossipd: be more verbose and less assert()ive on bad node_announcement.
We hit the timestamp assert on #2750; it shouldn't happen, but crashing
doesn't leave much information.

Reported-by: @m-schmook
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 22:03:35 +00:00
Rusty Russell f1b57063f7 bitcoin/tx: use fromwire_fail in pull_bitcoin_tx.
This is the correct way to mark failure: it also sets *max to 0.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-21 03:56:59 +00:00
Rusty Russell 47b5f2e837 gossipd: truncate gossip_store.tmp for compaction.
If something went wrong and there was an old one, we were
appending to it!

Reported-by: @SimonVrouwe
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-20 02:53:52 +00:00
Rusty Russell 5e3690b3c5 gossipd: delete channel_amount from the store when we delete channel_announcement.
Otherwise we slowly build up cruft: compaction simply moves them since
they're not deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell 10c503b4b4 gossip_store: clean up a truncated store.
We might have channel_announcements which have no channel_update: normally
these don't get written into the store until there is one, but if the
store was truncated it can happen.  We then get upset on compaction, since
we don't have an in-memory representation of the channel_announcement.

Similarly, we leave the node_announcement pending until after that
channel_announcement, leading to a similar case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-15 10:52:05 +02:00
Rusty Russell 24cc371cdf gossipd: gossip_store errors after rewrite are fatal.
We can't continue, since we've moved the indexes.  We'll just crash
anyway, as seen from bugs #2742 and #2743.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell eb5cc47bdd gossipd: count deleted records correctly when loading gossip_store.
The result of an incorrect count was that we failed on next compaction.

Fixes: #2743
Fixes: #2742
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-14 02:17:32 +00:00
Rusty Russell 745634d9b9 gossipd: don't catch pending node_announcements more than once.
We catch node_announcements for nodes where we haven't finished
analyzing the channel_announcement yet (either because we're still
checking UTXO, or in this case, because we're waiting for a channel_update).

But we reference count the pending_node_announce, so if we have
multiple channels pending, we might try to insert it twice.  Clear it
so this doesn't happen.

There's a second bug where we continue to catch node_announcements
until *all* the channel_announcements are no longer pending; this is fixed
by removing it from the map.

Fixes: #2735
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-13 05:58:09 +00:00
Rusty Russell 1e32b4ab29 gossipd: adjust gossip filters if we discover we're missing gossip.
We pick up to three random peers and ask them to gossip more.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 6830233d0b gossipd: control gossip level so we don't get flooded by peers.
We seek a certain number of peers at each level of gossip; 3 "flood"
if we're missing gossip, 2 at 24 hours past to catch recent gossip, and
8 with current gossip.  The rest are given a filter which causes them
not to gossip to us at all.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell f5ea57d4c0 gossipd: reset gossip_missing if no reports for 10 minutes.
An arbitrary timeout.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell b9053767e7 gossipd: query unknown short_channel_ids, note if they were really missing.
The first sign that we're missing gossip is that we get a channel_update
for an unknown channel.  The peer might be wrong (or lying), but if it turns
out to be a real channel, we were definitely missing something.

This patch does two things: queries when we get an unknown channel_update,
and then notes that a channel_announcement was from such an update when
it's finally processed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 18069ab3da gossipd: APIs return more information about routing message handling.
In particular, we'll need to know the short_channel_id if a
channel_update is unknown (implies we're missing a channel), and whether
processing a pending channel_announcement was successful (implies that
the channel was real).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 5ef7aa70d2 gossipd: prepare for internally-generated short-channel-id queries.
Up until now we only generated these in dev mode for testing.  Hoist
into common code, turn counter into a flag (we're only allowed one!)
and note if query is internal or not.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 21c920a8e8 gossipd: note if loaded store seems reasonably up-to-date.
If not, we can ask peers for full gossip (for now we just set a flag).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-12 00:37:46 +00:00
Rusty Russell 0d2a4830ed ccan: update to faster and correct crc32c implementation.
I decided to try a faster implementation, only to find our crc32c was
not correct!  Ouch.

I removed the crc32c functions from ccan/crc, and added a new crc32c
module which has the Mark Adler x86-64-optimized variants.

We bump gossip_store version again, since csums have changed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:40:10 +00:00
Rusty Russell ab31f40aa2 gossipd: don't charge ourselves fees when calculating route.
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id.  In the default case,
it means we don't charge ourselves fees on the route.

This means we can spend the full channel balance.

We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things.  So we add the first-hop fee to the *risk* value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:19:11 +00:00
Rusty Russell b48c644e7a listchannels: add `htlc_minimum_msat` and `htlc_maximum_msat` fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-11 23:19:11 +00:00
Rusty Russell 1a3886c116 wallet: keep a list of unreleased transactions.
We're going to use this in the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-06 04:47:44 +00:00
Rusty Russell 628b65fb40 gossip_store: don't leave dangling channel_announce if we truncate.
(Or, if we crashed before we got to write out the channel_update).
It's a corner case, but one reported by @darosior and reproduced
on my test node (both with bad gossip_store due to previous iterations
of this patchset!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell f8b98e032c gossipd: Don't abort() on duplicate entries in gossip_store.
Triggered by a previous variant of this PR, but a goo1d idea to simply
discard the store in general when we get a duplicate entry.

We crash trying to delete old ones, which means writing to the store.
But they should have already been deleted.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 34c113a17a gossipd: trivial clean up of routing_add_channel_update.
For some reason I was reluctant to use the hc local variable; I even
re-declared it!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 3e733afb2b gossipd: remove broadcast map altogether.
This clarifies things a fair bit: we simply add and remove from the
gossip_store directly.

Before this series: (--disable-developer, -Og)
    store_load_msec:20669-20902(20822.2+/-82)
    vsz_kb:439704-439712(439706+/-3.2)
    listnodes_sec:0.890000-1.000000(0.92+/-0.04)
    listchannels_sec:11.960000-13.380000(12.576+/-0.49)
    routing_sec:3.070000-5.970000(4.814+/-1.2)
    peer_write_all_sec:28.490000-30.580000(29.532+/-0.78)

After: (--disable-developer, -Og)
    store_load_msec:19722-20124(19921.6+/-1.4e+02)
    vsz_kb:288320
    listnodes_sec:0.860000-0.980000(0.912+/-0.056)
    listchannels_sec:10.790000-12.260000(11.65+/-0.5)
    routing_sec:2.540000-4.950000(4.262+/-0.88)
    peer_write_all_sec:17.570000-19.500000(18.048+/-0.73)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell dd83453b2f gossipd/gossip_store: fix compacting, don't use broadcast ordering.
We have a problem: if we get halfway through writing the compacted store
and run out of disk space, we've already changed half the indexes.

This changes it so we do nothing until writing is finished: then we
iterate through and update indexes.  It also weans us off broadcast
ordering, which we can now eliminated.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 5161b79bfc gossipd/gossip_store: keep count of deleted entries, don't use bs->count.
We didn't count some records before, so we could compare the two counters.
This is much simpler, and avoids reliance on bs.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 728bb4e662 common/gossip_store: handle timestamp filtering.
This means we intercept the peer's gossip_timestamp_filter request
in the per-peer subdaemon itself.  The rest of the semantics are fairly
simple however.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 948490ec58 gossipd: add timestamp in gossip store header.
(We don't increment the gossip_store version, since there are only a
few commits since the last time we did this).

This lets the reader simply filter messages; this is especially nice since
the channel_announcement timestamp is *derived*, not in the actual message.

This also creates a 'struct gossip_hdr' which makes the code a bit
clearer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell bad9734dc7 gossip_store: remove redundant copy_message.
The single caller can easily use transfer_store_msg instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 5591c0b5d8 gossipd: don't send gossip stream, let per-peer daemons read it themselves.
Keeping the uintmap ordering all the broadcastable messages is expensive:
130MB for the million-channels project.  But now we delete obsolete entries
from the store, we can have the per-peer daemons simply read that sequentially
and stream the gossip itself.

This is the most primitive version, where all gossip is streamed;
successive patches will bring back proper handling of timestamp filtering
and initial_routing_sync.

We add a gossip_state field to track what's happening with our gossip
streaming: it's initialized in gossipd, and currently always set, but
once we handle timestamps the per-peer daemon may do it when the first
filter is sent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 4399faf57c gossipd: make writes to gossip_store atomic.
There's a corner case where otherwise a reader could see the header and
not the body of a message.  It could handle that in various ways,
but simplest (and most efficient) is to avoid it happening.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell a5f6ef385a gossipd: don't wrap messages when we send them to the peer.
They already send *us* gossip messages, so they have to be distinct anyway.
Why make us both do extra work?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell df00f20e4a gossipd: erase old entries from the store, don't just append.
We use the high bit of the length field: this way we can still check
that the checksums are valid on deleted fields.

Once this is done, serially reading the gossip_store file will result
in a complete, ordered, minimal gossip broadcast.  Also, the horrible
corner case where we might try to delete things from the store during
load time is completely gone: we only load non-deleted things.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 696dc6b597 gossipd: disable gossip_store upgrade.
We're about to bump version again, and the code to upgrade it was
quite hairy (and buggy!).  It's not worthwhile for such a
poorly-tested path: I will just add code to limit how much incoming
gossip we get to avoid flooding when we upgrade, however.

I also use a modern gossip_store version in our test_gossip_store_load
test, instead of relying on the upgrade path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 43f2cbd250 gossipd: track gossip_store locations of local channels.
We currently don't care, but the next patch means we have to find them
again.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 180a552fba gossip_store: mark private updates separately from normal ones.
They're really gossipd-internal, and we don't want per-peer daemons
to confuse them with normal updates.

I don't bump the gossip_store version; that's coming with another update
anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 763697eb4c gossipd: fix gossip_store calling delete.
Now we handle node_announcements properly, we have a failure case where we
try to move them when a channel is deleted while loading the store.

We're going to remove this soon, in favor of in-place delete, so
workaround this for now to avoid an assert() when we try to write to
the store while loading.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 11:04:25 -07:00
Rusty Russell 21fe518513 gossip_store: fix 'bad node_announcement' by allowing node_announcement on un-updated channel.
When we first receive a channel_update, we write both the
channel_announcement and that channel_update to the store: we need
that first update so we can set the channel_announcement timestamp.

However, the channel_update can be replaced later.  This means we can
have a channel_announcement, a node_update which relies on it, then
the channel_update later.

So move the "this applies to a pending announcement" check lower, where
gossip_store can use it too.  Has a nice side-effect of avoiding
one lookup of the node id.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 11:04:25 -07:00
Rusty Russell c233fc5063 gossipd: fix spurious unused error with gcc-9 -O3.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 00:07:11 +00:00
Rusty Russell c091a4ee40 gossipd: fix spurious gcc warning.
It turns out that we don't look at type when we return 0, but gcc isn't
quite smart enough for that.  Initializing to -1 is good practice anyway
for the failure path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-03 00:07:11 +00:00
William Casarin 3f035cb3cc gossipd: fix uninitialized free on short_route in goto path
Fix a path where tal_free is called on an uninitialized variable

If the first `goto bad_total` executes, then that path has
uninitialized `short_route` but bad_total passes through to `out`
whose first call is tal_free(short_route).

This was noticed by a maybe-uninitialized heuristic on gcc 7.4.0:

gossipd/routing.c: In function ‘find_shorter_route’:
gossipd/routing.c:1096:2: error: ‘short_route’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
  tal_free(short_route);

Reported-by: @ZmnSCPxj <https://github.com/ElementsProject/lightning/pull/2674#issuecomment-495617253>
Signed-off-by: William Casarin <jb55@jb55.com>
2019-06-03 00:07:11 +00:00
Rusty Russell 654e89b5fc gossipd: free channels in routing_state destructor.
Cleans up the tests.

Suggested-by: @ZmnSCPxj
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-22 11:28:44 +00:00