Commit Graph

39 Commits

Author SHA1 Message Date
Rusty Russell 0ba547ee10 gossipd: handle overflowing query properly (avoid slow 100% CPU reports)
Don't do this:
  (gdb) bt
  #0  0x00007f37ae667c40 in ?? () from /lib/x86_64-linux-gnu/libz.so.1
  #1  0x00007f37ae668b38 in ?? () from /lib/x86_64-linux-gnu/libz.so.1
  #2  0x00007f37ae669907 in deflate () from /lib/x86_64-linux-gnu/libz.so.1
  #3  0x00007f37ae674c65 in compress2 () from /lib/x86_64-linux-gnu/libz.so.1
  #4  0x000000000040cfe3 in zencode_scids (ctx=0xc1f118, scids=0x2599bc49 "\a\325{", len=176320) at gossipd/gossipd.c:218
  #5  0x000000000040d0b3 in encode_short_channel_ids_end (encoded=0x7fff8f98d9f0, max_bytes=65490) at gossipd/gossipd.c:236
  #6  0x000000000040dd28 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290511, number_of_blocks=8) at gossipd/gossipd.c:576
  #7  0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290511, number_of_blocks=16) at gossipd/gossipd.c:595
  #8  0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290495, number_of_blocks=32) at gossipd/gossipd.c:596
  #9  0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290495, number_of_blocks=64) at gossipd/gossipd.c:595
  #10 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=128) at gossipd/gossipd.c:596
  #11 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=256) at gossipd/gossipd.c:595
  #12 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=512) at gossipd/gossipd.c:595
  #13 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17290431, number_of_blocks=1024) at gossipd/gossipd.c:595
  #14 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=2047) at gossipd/gossipd.c:596
  #15 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=4095) at gossipd/gossipd.c:595
  #16 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=8191) at gossipd/gossipd.c:595
  #17 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=16382) at gossipd/gossipd.c:595
  #18 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=32764) at gossipd/gossipd.c:595
  #19 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=65528) at gossipd/gossipd.c:595
  #20 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=131056) at gossipd/gossipd.c:595
  #21 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=262112) at gossipd/gossipd.c:595
  #22 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=524225) at gossipd/gossipd.c:595
  #23 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=1048450) at gossipd/gossipd.c:595
  #24 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=2096900) at gossipd/gossipd.c:595
  #25 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=4193801) at gossipd/gossipd.c:595
  #26 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=8387603) at gossipd/gossipd.c:595
  #27 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=17289408, number_of_blocks=16775207) at gossipd/gossipd.c:595
  #28 0x000000000040ddee in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=33550414) at gossipd/gossipd.c:596
  #29 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=67100829) at gossipd/gossipd.c:595
  #30 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=134201659) at gossipd/gossipd.c:595
  #31 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=268403318) at gossipd/gossipd.c:595
  #32 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=536806636) at gossipd/gossipd.c:595
  #33 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=1073613273) at gossipd/gossipd.c:595
  #34 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=2147226547) at gossipd/gossipd.c:595
  #35 0x000000000040ddc6 in queue_channel_ranges (peer=0x3868fc8, first_blocknum=514201, number_of_blocks=4294453094) at gossipd/gossipd.c:595
  #36 0x000000000040df26 in handle_query_channel_range (peer=0x3868fc8, msg=0x37e0678 "\001\ao\342\214\n\266\361\263r\301\246\242F\256c\367O\223\036\203e\341Z\b\234h\326\031") at gossipd/gossipd.c:625

The cause was that converting a block number to an scid truncates it
at 24 bits.  When we look through the index from (truncated number) to
(real end number) we get every channel, which is too large to encode,
so we iterate again.

This fixes both that problem, and also the issue that we'd end up
dividing into many empty sections until we get to the highest block
number.  Instead, we just tack the empty blocks on to then end of the
final query.

(My initial version requested 0xFFFFFFFE blocks, but the dev code
which records what blocks were returned can't make a bitmap that big
on 32 bit).

Reported-by: George Vaccaro
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 11:34:45 -08:00
Rusty Russell ba41d6e3df pytest: failing test for overflow in query_channel_range
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 11:34:45 -08:00
Rusty Russell 52750f2dcc pytest: tighten the query_channel_range test.
Make the two channels adjacent, and specify exactly the number of
divide-and-conquer steps there are.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 11:34:45 -08:00
Rusty Russell 9f1f79587e short_channel_id_dir: new primitive for one direction of short_channel_id
Currently only used by gossipd for channel elimination.

Also print them in canonical form (/[01]), so tests need to be
changed.

Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 12:01:38 +01:00
Rusty Russell 80753bfbd5 Feedback from @niftynei.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 12:01:38 +01:00
Rusty Russell dc2ee9639b listchannels: allow source arg to list channels by their source node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 12:01:38 +01:00
Rusty Russell 3d016e7249 getroute: allow array of channels to exclude.
The pay plugin will use this, rather than the current "suppress for 90 second" hacks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-01-15 12:01:38 +01:00
Christian Decker 659a26ea5a misc: Update short_channel_id representation to use 'x' separators
Reported-by: Alex Bosworth <@alexbosworth>
Signed-off-by: Christian Decker <decker.christian@gmail.com>
2019-01-15 03:50:27 +00:00
lisa neigut eab992cecd py-tests: rename 'announce' to 'wait_for_announce'
Better description of what the option actually does -- if true
waits for the announcement messages to be generated and exchanged.
2018-12-08 15:15:55 -08:00
Rusty Russell 7d614aaf25 pytest: really remove all bitcoin generate RPC calls.
generate was deprecated some time ago, so we added the generate_block()
helper.  But many calls crept back in, and git master refuses it.

(test_blockchaintrack relied on the return value, so make generate_block
return the list of blocks).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-28 16:03:12 +01:00
Rusty Russell 38e6aa66ff python: quieten modern flake8.
After Ubuntu 18.10 upgrade, lots of new flake8 warnings.

$ flake8 --version:
3.5.0 (mccabe: 0.6.1, pycodestyle: 2.4.0, pyflakes: 1.6.0) CPython 3.6.7rc1 on Linux

Note it seems that W503 warned about line breaks before binary
operators, and W504 complains about them after.  I prefer W504, so
disable W503.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-28 16:03:12 +01:00
Rusty Russell fcb5310873 pytest: make wait_for do exponential backoff, start at 0.25 seconds.
This doesn't alter runtime very much, but does reduce log spam.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-10-10 06:10:42 +00:00
lisa neigut a9bf1f5573 tests: quiet down DeprecationWarnings for escape sequences
Nuke all the `DeprecationWarning: invalid escape sequence
\[` messages that show up when you run python tests.
2018-10-08 13:18:31 +02:00
Rusty Russell bb5e2ffafb gossipd: don't create redundant node_announcements.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 18:20:17 +02:00
Rusty Russell 6c54a22d63 pytest: make test_node_reannounce check for redundant announce (xfail).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 18:20:17 +02:00
Rusty Russell 0baa5f7071 gossipd: send node announcement on startup.
I suspect this fixes #1660 too, but checking would be good.

Fixes: #1781
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 18:20:17 +02:00
Rusty Russell cbd1d1d0f2 pytest: test that we reannounce node after restart.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-28 18:20:17 +02:00
Rusty Russell 16e16a725e gossipd: apply private updates to announce channel.
We trade channel_update before channel_announce makes the channel
public, and currently forget them when we finally get the
channel_announce.  We should instead apply them, and not rely on
retransmission (which we remove in the next patch!).

This earlier channel_update means test_gossip_jsonrpc triggers too
early, so have that wait for node_announcement.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-26 03:21:35 +00:00
Rusty Russell 8455b12781 Revert "gossipd: handle premature node_announcements in the store."
This reverts commit e2f426903d.

With the new store version, this can't happen.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell 48de77d56e gossipd: invalidate old gossip_stores.
Incrementing version number means stores which were prior to the previous
commit will be removed, and refreshed.  The simplest fix, if not the most
efficient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell 0855422110 gossip_control: when searching for a txout, make sure it's not spent!
There's no reason for the db to ever return non-NULL if it's spent.  And there's
only one caller, for which that is definitely true.

Suggested-by: @cdecker
Fixes: #1934
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell 24c386c086 test_gossip: gossip retransmit on spent UTXO test.
Aka #1934.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell 6228c25643 test_gossip: basic 'node notices close' test.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell 0925daa087 gossipwith: simple tool to snarf gossip from a node.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-21 17:56:15 +02:00
Rusty Russell e2f426903d gossipd: handle premature node_announcements in the store.
These happen after we compact the store; every log I've seen of a
restart on a real node has a message about truncating the store,
because node_announcements predate channel_announcements.

I extracted one such case from testnet, and reduced it to test here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-09-04 14:36:05 +02:00
Rusty Russell 4f1186c4b1 connectd: iterate through all known addresses for a peer, not just one.
If we have an address hint, we start with that, but we'll use
node_announcement information if required.

Note: we (ab)use the address hint when restoring from the database
or reconnecting, even if the connection was *incoming*.  That meant
that the recipient of a connection would *never* manage to connect out.

We still don't take multiple addresses from the DNS seeds: I assume we
should, since there could be IPv4 and IPv6.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 12:46:45 +02:00
Rusty Russell e59cbb3e2c pytest: make sure receiving peer's openingd is ready.
There's now a potential race: the source peer connect returns, but in
destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.

(Previously the connection stayed in connectd, so there was no such
window).

This is an occasional issue in a few places.

Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell 50f5eb34b4 openingd: take peer before we're opening, wait for explicit funding msg.
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return it to lightningd if it sent a non-gossip msg,
or if lightningd asked it to release the peer.

Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.

This is a deep structural change, so we do the minimum here and cleanup
in the following patches.

Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel command; it was only needed to
   find the race between us asking connectd to release the peer and it
   reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
   channel (until it has succeeded): it's very transitory anyway so not
   worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
   for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
   or a fundee, or an error in response to our request to fund a channel.
   so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
   hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.

Openingd:
1. We always have information we need to accept them funding a channel (in
   the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
   the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.

Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message,
   openingd says 'Handed peer, entering loop' once its managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field doesn't
   exist; 'state' is now per-channel.  It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
   the peer hand over the connection to openingd.  Our tests sometimes
   checked getpeers() on the peer, and didn't see anything, so line_graph
   needed updating.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell 329270525c pytest: only use dev-allow-localhost when needed.
The next patches get better at reconecting, so if we use dev-allow-localhost
nodes can often find each other and reconnect before shutting down; only
use that option where we actually need it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell 58d090c3c2 pytest: fix flaky test.
Saw this in Travis: technically we return from the dev_set_max_scids...
cmd after sending it to gossipd, but we should wait for it to log.
Adding an internal reply message for a dev command seems overkill.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 12:39:59 +02:00
Christian Decker ae99e493b8 pytest: Remove test_lightningd and all the legacy testing framework 2018-08-07 00:54:19 +00:00
Christian Decker 605bf8c89d pytest: Migrate the gossip related tests to the new fixture model 2018-08-07 00:54:19 +00:00
Rusty Russell c46f373205 options: refuse two --announce-addr of the same type.
Gossipd will ignore the second one, but doing it in the front end
gives an explicit error message.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-01 15:03:21 +02:00
Rusty Russell f67182ff20 gossipd: order node_announcement addresses correctly, remove duplicate types.
Fixes: #1596
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-07-01 15:03:21 +02:00
Rusty Russell 965c20caae tests: reenable developer tests.
72d103d6bb deprecated DEVELOPER env var
in favor of config.vars, but didn't update test_closing.py or test_gossip.py.

5d0a54b7f0 then removed the explicit
DEVELOPER= setting from Travis.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-06-13 16:22:23 +02:00
Christian Decker c17848a3f3 gossip: Disable local channels after loading the gossip_store
We don't have any connection yet, so how could they be active? Disable both
sides to avoid trying to route through them or telling others to use them as
`contact_points` in invoices.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
2018-05-31 02:30:27 +00:00
Rusty Russell edf1b3cec9 More option cleanups.
Because we have too many which are never used and I don't want to document
them.

1. Remove unused anchor_onchain_wait.  When implemented, it should be
   hardcoded to 100 or more.
2. Remove anchor_confirms_max.  10 always reasonable, and we can readd
   an override option should someone need it.
3. max_htlc_expiry should be the same as locktime_max (which increases
   from 3 to 5 days by default): they're both a limit on how long
   funds can be locked up.
4. channel_update_interval should always be a dev option.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-05-20 02:32:42 +00:00
Rusty Russell fe96fe10c7 Clean up network options.
It's become clear that our network options are insufficient, with the coming
addition of Tor and unix domain support.

Currently:

1. We always bind to local IPv4 and IPv6 sockets, unless --port=0, --offline,
   or any address is specified explicitly.  If they're routable, we announce.
2. --addr is used to announce, but not to control binding.

After this change:

1. --port is deprecated.
2. --addr controls what we bind to and announce.
3. --bind-addr/--announce-addr can be used to control one and not the other.
4. Unless --autolisten=0, we add local IPv4 & IPv6 port 9735 (and announce if they are routable).
5. --offline still overrides listening (though announcing is still the same).

This means we can bind to as many ports/interfaces as we want, and for
special effects we can announce different things (eg. we're sitting
behind a port forward or a proxy).

What remains to implement is semi-automatic binding: we should be able
to say '--addr=0.0.0.0:9999' and have the address resolve at bind
time, or even '--addr=0.0.0.0:0' and have the port autoresolve too
(you could determine what it was from 'lightning-cli getinfo'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-05-07 22:37:28 +02:00
Christian Decker 727d115296 pytest: Add py.test fixtures and migrate first example test
This is the first example of the py.test style fixtures which should allow us to
write much cleaner and nicer tests.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
2018-05-07 02:40:50 +00:00