Commit Graph

4573 Commits

Author SHA1 Message Date
Rusty Russell f80176eae9 connectd: move list entries of structs to top, to help dev-memleak detection.
We sweep looking for pointers to tal objects; we don't look for pointers
inside them.  Thus lists only work transparently if they're at the head
of the object; so far this has been sufficient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell f80955c932 broadcast: don't leak in broadcast_del.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 5d1f71c3c0 gossipd: don't leak fields in create_node_announcement.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell a475098928 gossipd: fix leak in gossip_store_add_channel_delete.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 90d1062a55 daemon_conn: fix memory leak when passing an fd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 1c81486b48 routing: fix falsely flagged leak.
pending goes away on a timer, sure, but might as well use tmpctx here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell b10bae1ceb gossipd: use ctx arg in create_channel_update.
Turns out it was always `tmpctx` anyway, so this isn't a real bug right now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 2db77f5d1d gossipd: minor modifications for memleak detection to work.
1. Move the list to the start of `struct peer`: memleak walks the
   list correctly this way.
2. Don't create tal parent loop daemon->conn->daemon.

The second one is silly anyway: we exit via master_gone when the master
conn is closed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell a2828ed40b memleak: allow for scanning non-talloc regions.
For some daemons we'll be handing it non-talloc memory to scan for pointers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 8733015836 memleak: don't require a root pointer.
We can just track everything from NULL (the ultimate parent) down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell aa62d79db2 subd: fix false positive if we're inside a subd_req.
We're going to call out to subds for memleak detection, and the disabler
looks like a memleak if we're inside a callback.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 37ea0d3c7f memleak: fix exclude check.
We want to exclude the child from being entered into the htable:
if we wanted the parent we could do this outside the loop.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 112b7336a3 memleak: create and use a generic htable helper and generic intmap helper.
memleak can't see into htables, as it overloads unused pointer bits.
And it can't see into intmap, since they use malloc (it only looks for tal
pointers).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 32d12e725f hsm: simplify code.
1. The handle pointer is always set to handle_client: just call direclty.
2. Call the root 'client' variable master.
3. We never exit the io_loop: we exit via master_gone instead, so cleanup there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 12a39b8a79 lightningd: fix backtraces in memleak detection.
We were using a *different* backtrace_state var, which was always NULL.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell c02ab124aa ccan: update so we can apply notifiers to toplevel "NULL".
This will make leak detection more thorough.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 19:54:32 +02:00
Rusty Russell 898655f40c chaintopology: fix outdated comment.
Both @cdecker and @SimonVrouwe noted this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell b5ae4c12c7 pytest: fix flaky wait_for_log() in test_funder_feerate_reconnect.
The comment was wrong: the channel being locked in was triggering
the fee update and hence the disconnect.  But that can actually
happen before fund_channel returns, as that waits for the gossipd
to see the channel active.

Best to do the fee update manually, so it's exactly what we want.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 7925469f88 pytest: fix flaky assert in test_htlc_send_timeout.
If feerates change, L2 sends L3 a commit for that, which causes us to
fail the assert (which says we won't send a commitment_signed).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 175db926c2 chaintopology: expose when we don't actually know feerate.
We use feerate in several places, and each one really should react
differently when it's not available (such as when bitcoind is still
catching up):

1. For general fee-enforcement, we use the broadest possible limits.
2. For closingd, we use it as our opening negotiation point: just use half
   the last tx feerate.
3. For onchaind, we can use the last tx feerate as a guide for our own txs;
   it might be too high, but at least we know it was sufficient to be mined.
4. For withdraw and fund_channel, we can simply refuse.

Fixes: #1836
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell d93be58bd0 pytest: remove use dev-override-feerates.
Manipulate fees via fake-bitcoin-cli.  It's not quite the same, as
these are pre-smoothing, so we need a restart to override that where
we really need an exact change.  Or we can wait until it reaches a
certain value in cases we don't care about exact amounts.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell a75de62477 chaintopology: always initialize smoothed feerate if it's the first entry.
Not just during startup: we could have bitcoind not give estimates until
later, but we don't want to smooth with zero.

The test changes in next patch trigger this, so I didn't write a test
with this patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 807e62b05d moveonly: move feerate routines from peer_htlcs.c to channel_control.c
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 607d4bf9d2 channel: update fees after lockin.
We don't respond to fee changes until we're locked in: make sure we catch
up at that point.

Note that we use NORMAL fees during opening, but IMMEDIATE after, so
this often sends a fee update.  The tests which break, we set those
feerates to be equal.

This (sometimes) changes the behavior of test_permfail, as we now
get an immediate commit, so that is fixed too so we always wait for
that to complete.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 6338ae8a44 channeld: update fees if we're restarting.
This is a noop if we're opening a new channel (channel_fees_can_change(channel)
is false until funding locked in), but important if we're restarting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell ede05ecd40 pytest: demonstrate feerate on restart bug.
We don't update a channel's feerate on reestablishment: we insert a restart
in test_onchain_different_fees() (which we'll need soon anyway) to show it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell 43fe7f034e chaintopology: try to get a feerate estimate before we complete startup.
It may fail, but it's better than having a window where we're using
the default feerate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-24 02:17:51 +00:00
Rusty Russell b6a63024c1 lightningd: fix double-free on multiple HTLC timeouts.
We can close a connection with a peer to timeout an HTLC, but
we need to clear the pointer otherwise next time we try, we'll
free an expired pointer:

```
lightningd: Fatal signal 6 (version v0.6-336-gfcd1eb5-modded)
0x13ce86 crashdump
	common/daemon.c:37
0x5739f1f ???
	???:0
0x5739e97 ???
	???:0
0x573b800 ???
	???:0
0x1850c3 call_error
	ccan/ccan/tal/tal.c:93
0x18528b check_bounds
	ccan/ccan/tal/tal.c:165
0x1852ca to_tal_hdr
	ccan/ccan/tal/tal.c:174
0x185bfb tal_free
	ccan/ccan/tal/tal.c:472
0x1343a8 peer_sending_commitsig
	lightningd/peer_htlcs.c:1035
0x114f25 channel_msg
	lightningd/channel_control.c:159
0x13756b sd_msg_read
	lightningd/subd.c:474
0x177c1f next_plan
	ccan/ccan/io/io.c:59
0x178717 do_plan
	ccan/ccan/io/io.c:387
0x178755 io_ready
	ccan/ccan/io/io.c:397
0x17a336 io_loop
	ccan/ccan/io/poll.c:310
0x120589 main
	lightningd/lightningd.c:455
0x571cb96 ???
	???:0
0x10e6d9 ???
	???:0
0xffffffffffffffff ???
	???:0
2018-08-16T06:41:21.249Z lightningd(869): FATAL SIGNAL 6 (version v0.6-336-gfcd1eb5-modded)
2018-08-16T06:41:21.250Z lightningd(869): backtrace: common/daemon.c:42 (crashdump) 0x13ceda
2018-08-16T06:41:21.250Z lightningd(869): backtrace: (null):0 ((null)) 0x5739f1f
2018-08-16T06:41:21.250Z lightningd(869): backtrace: (null):0 ((null)) 0x5739e97
2018-08-16T06:41:21.251Z lightningd(869): backtrace: (null):0 ((null)) 0x573b800
2018-08-16T06:41:21.251Z lightningd(869): backtrace: ccan/ccan/tal/tal.c:93 (call_error) 0x1850c3
2018-08-16T06:41:21.251Z lightningd(869): backtrace: ccan/ccan/tal/tal.c:165 (check_bounds) 0x18528b
2018-08-16T06:41:21.252Z lightningd(869): backtrace: ccan/ccan/tal/tal.c:174 (to_tal_hdr) 0x1852ca
2018-08-16T06:41:21.252Z lightningd(869): backtrace: ccan/ccan/tal/tal.c:472 (tal_free) 0x185bfb
2018-08-16T06:41:21.252Z lightningd(869): backtrace: lightningd/peer_htlcs.c:1035 (peer_sending_commitsig) 0x1343a8
2018-08-16T06:41:21.252Z lightningd(869): backtrace: lightningd/channel_control.c:159 (channel_msg) 0x114f25
2018-08-16T06:41:21.253Z lightningd(869): backtrace: lightningd/subd.c:474 (sd_msg_read) 0x13756b
2018-08-16T06:41:21.253Z lightningd(869): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x177c1f
2018-08-16T06:41:21.253Z lightningd(869): backtrace: ccan/ccan/io/io.c:387 (do_plan) 0x178717
2018-08-16T06:41:21.253Z lightningd(869): backtrace: ccan/ccan/io/io.c:397 (io_ready) 0x178755
2018-08-16T06:41:21.253Z lightningd(869): backtrace: ccan/ccan/io/poll.c:310 (io_loop) 0x17a336
2018-08-16T06:41:21.253Z lightningd(869): backtrace: lightningd/lightningd.c:455 (main) 0x120589
2018-08-16T06:41:21.254Z lightningd(869): backtrace: (null):0 ((null)) 0x571cb96
2018-08-16T06:41:21.254Z lightningd(869): backtrace: (null):0 ((null)) 0x10e6d9
2018-08-16T06:41:21.254Z lightningd(869): backtrace: (null):0 ((null)) 0xffffffffffffffff
Log dumped in crash.log
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 16:55:15 +02:00
Rusty Russell 5d1b944a21 hsmd: use async for status reporting.
We can otherwise deadlock against lightningd which also talks to us sync.

Fixes: #1759
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 16:54:21 +02:00
Rusty Russell a217392fca onchaind: remove useless continue.
Reported-by: Christian Decker <@cdecker>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 36b1cac6e6 lightningd: new state AWAITING_UNILATERAL.
When in this state, we send a canned error "Awaiting unilateral close".
We enter this both when we drop to chain, and when we're trying to get
them to drop to chain due to option_data_loss_protect.

As this state (unlike channel errors) is saved to the database, it means
we will *never* talk to a peer again in this state, so they can't
confuse us.

Since we set this state in channel_fail_permanent() (which is the only
place we call drop_to_chain for a unilateral close), we don't need to
save to the db: channel_set_state() does that for us.

This state change has a subtle effect: we return WIRE_UNKNOWN_NEXT_PEER
instead of WIRE_TEMPORARY_CHANNEL_FAILURE as soon as we get a failure
with a peer.  To provoke a temporary failure in test_pay_disconnect we
take the node offline.

Reported-by: Christian Decker @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell a5ecc95c42 db: store claimed per_commitment_point from option_data_loss_protect.
This means we don't try to unilaterally close after a restart, *and*
we can tell onchaind to try to use the point to recover funds when the
peer unilaterally closes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 1a4084442b onchaind: use a point-of-last-resort if we see an unknown transaction.
This may have been supplied by the peer if it's nice and supports
option_data_loss_protect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 6aed936799 channeld: check option_data_loss_protect fields.
Firstly, if they claim to know a future value, we ask the HSM; if
they're right, we tell master what the per-commitment-secret it gave
us (we have no way to validate this, though) and it will not broadcast
a unilateral (knowing it will cause them to use a penalty tx!).

Otherwise, we check the results they sent were valid.  The spec says
to do this (and close the channel if it's wrong!), because otherwise they
could continually lie and give us a bad per-commitment-secret when we
actually need it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell e7116284f0 channeld: move reestablish retransmission below checks.
This makes it a bit clearer, but also means we do all checks before
sending any packets.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 43156643b4 lightningd: message for channeld to tell us that channel risks penalty.
For option_data_loss_protect, the peer can prove to us that it's ahead;
it gives us the (hopefully honest!) per_commitment_point it will use,
and we make sure we don't broadcast the commitment transaction we have.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 18297a173c hsm: routine to validate that a future secret is correct.
Currently it works for any secret (we don't know the current secret),
but importantly it doesn't leak timing information when checking.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 7f37fa3263 derive_basepoints: harden checking.
I managed to crash the HSM by asking for point -1 (shachain_index has an
assert).  Fail in this case, instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 8340d8c070 secret_eq: remove in favor of constant time variant.
To be safe, we should never memcmp secrets.  We don't do this
currently outside tests, but we're about to.

The tests to prove this as constant time are the tricky bit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell da8d620907 pytest: check that we advertise and send option_data_loss_protect.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 4c891f4661 common: log when we toggle IO logging, don't edit env in tests!
Tests were failing when in the same thread after a test which set
log_all_io=True, because SIGUSR1 seemed to be turning logging *off*.

This is due to Python using references not copies for assignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 9f044305db pytest: dev env var LIGHTNINGD_DEV_LOG_IO turns io logging on immediately.
This is required for the next test, which has to log messages from channeld
as soon as it starts (so might be too late if it sends SIGUSR1).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell ebaf5eaf2e channeld: send option_data_loss_protect fields.
We ignore incoming for now, but this means we advertize the option and
we send the required fields.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 61c6b8b25a tools/generate-wire.py: mark your_last_per_commitment_secret as a struct secret.
Without an override, the tool assumes it's a sha256.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell b123b1867d shachain: shachain_get_secret helper.
This is a wrapper around shachain_get_hash, which converts the
commit_num to an index and returns a 'struct secret' rather than a
'struct sha256' (which is really an internal detail).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 692bae7873 channeld: create get_per_commitment_point helper.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 162adfdf12 listpeers: correctly display features on reconnect.
peer features are only kept for connected peers (as they can change),
but we didn't update them on reconnect.  The main effect was that
after a restart we displayed the features as empty, even after
reconnect.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell f5143d9549 pytest: fail if we see 'bad reestablish' in the logs.
Really, we should use log-level more cleverly here.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 83eadb3548 gossipd: fix SUPERVERBOSE usage, enhance, when turned on.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00
Rusty Russell 28977435a3 channeld: fix incorrect comment on reestablish.
We quote BOLT 2 on *local* above the *remote* checks (we quote it
again below when we do the local checks).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-23 14:46:22 +02:00