Commit Graph

29 Commits

Author SHA1 Message Date
Rusty Russell a196c77fe6 common: downgrade LND 'internal error' properly.
Thanks to @zerofeerouting for another report.

"desc" here is the sanitized message, eg:
"ERROR error channel 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef: internal error"

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-06-26 13:52:14 +09:30
Rusty Russell e48c0dda85 common: downgrade "internal error" errors from lnd.
Prior to 0.11.0 we had cases where we would treat errors
as warnings: regretfully, this is still needed.  This message
in particular has been widely reported, and it now causes
channel force closes.

Downgrade and log.  I did insert some snarky log message earlier,
but hey, I'm sure CLN has done worse things to our peers!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: treat LND "internal error" as warnings, not force close events (as we did in v0.10).
2022-06-26 13:52:14 +09:30
Rusty Russell 3c5d27e3e9 subdaemons: remove gossipd fd from per-peer daemons.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-02-08 11:15:52 +10:30
Rusty Russell 6115ed02e8 subdaemons: don't stream gossip_store at all.
We now let gossipd do it.

This also means there's nothing left in 'struct per_peer_state' to
send across the wire (the fds are sent separately), so that gets
removed from wire messages too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell ce8b69401c peer_io: replace crypto_sync in daemons, use normal wire messages.
Now connectd is doing the crypto, we can use normal wire io.  We
create helper functions to clearly differentiate between "peer" comms
and intra-daemon comms though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2022-01-20 15:24:06 +10:30
Rusty Russell 4ffda340d3 check: make sure all files outside contrib/ include "config.h" first.
And turn "" includes into full-path (which makes it easier to put
config.h first, and finds some cases check-includes.sh missed
previously).

config.h sets _GNU_SOURCE which really needs to be done before any
'#includes': we mainly got away with it with glibc, but other platforms
like Alpine may have stricter requirements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-12-06 10:05:39 +10:30
Rusty Russell 7401b26824 cleanup: remove unneeded includes in C files.
Before:
 Ten builds, laptop -j5, no ccache:

```
real	0m36.686000-38.956000(38.608+/-0.65)s
user	2m32.864000-42.253000(40.7545+/-2.7)s
sys	0m16.618000-18.316000(17.8531+/-0.48)s
```

 Ten builds, laptop -j5, ccache (warm):

```
real	0m8.212000-8.577000(8.39989+/-0.13)s
user	0m12.731000-13.212000(12.9751+/-0.17)s
sys	0m3.697000-3.902000(3.83722+/-0.064)s
```

After:
 Ten builds, laptop -j5, no ccache: 8% faster

```
real	0m33.802000-35.773000(35.468+/-0.54)s
user	2m19.073000-27.754000(26.2542+/-2.3)s
sys	0m15.784000-17.173000(16.7165+/-0.37)s
```

 Ten builds, laptop -j5, ccache (warm): 1% faster

```
real	0m8.200000-8.485000(8.30138+/-0.097)s
user	0m12.485000-13.100000(12.7344+/-0.19)s
sys	0m3.702000-3.889000(3.78787+/-0.056)s
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2021-09-17 09:43:22 +09:30
Rusty Russell 6b11cc8b8c common: disallow NULL channel_id to peer_failed_err.
No more sending "all-channel" errors; in particular, gossipd now only
sends warnings (which make us hang up), not errors, and peer_connected
rejections are warnings (and disconnect), not errors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Plugins: `peer_connected` rejections now send a warning, not an error, to the peer.
2021-02-04 12:02:52 +10:30
Rusty Russell f4ee41a989 common: remove peer_failed in favor of peer_failed_warn/peer_failed_err
And make all the callers choose which one.  In general, I prefer warn,
which lets them reconnect and try again, however some places are either
stated that they must be errors in the spec itself, or in openingd
where we abandon the channel when we close the connection anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: Protocol: we now send warning messages and close the connection, except on unrecoverable errors.
2021-02-04 12:02:52 +10:30
Rusty Russell d14e273b04 common: treat all "all-channels" errors as if they were warnings.
This is in line with the warnings draft, where all-zeroes in a
channel_id is no longer special (i.e. it will be ignored).

But gossipd would send these if it got upset with us, so it's best
practice to ignore them for now anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we treat error messages from peer which refer to "all channels" as warnings, not errors.
2021-02-04 12:02:52 +10:30
Rusty Russell a7c5a1f1d2 lightningd: implement receiving warnings.
This takes from the draft spec at https://github.com/lightningnetwork/lightning-rfc/pull/834

Note that if this draft does not get included, the peer will simply
ignore the warning message (we always close the connection afterwards
anyway).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Protocol: we now report the new (draft) warning message.
2021-02-04 12:02:52 +10:30
Rusty Russell 3e52d4100d common: convert to new wire generation style.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2020-08-25 12:53:13 +09:30
Rusty Russell 1d0c433dc4 channeld: treat all incoming errors as "soft", so we retry.
We still close the channel if we *send* an error, but we seem to have hit
another case where LND sends an error which seems transient, so this will
make a best-effort attempt to preserve our channel in that case.

Some test have to be modified, since they don't terminate as they did
previously :(

Changelog-Changed: quirks: We'll now reconnect and retry if we get an error on an established channel. This works around lnd sending error messages that may be non-fatal.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-12-13 16:36:18 +01:00
Rusty Russell 6c7a45623e common: detect "sync error" from lnd.
I'm deeply reluctant to do this, as I'd thought this was fixed with
recent lnd versions.  Logs below show that it continues, with channel
loss on almost every restart.

At this rate, we risk bifurcating the network.  In fact, only four
errors my node have ever been NOT "sync error".

2018-09-12T01:21:40.671Z lightningd(1263): 03e50492eab4107a773141bb419e107bda3de3d55652e6e1a41225f06a0bbf2d56 chan #3: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel b7008735ad2425ab92bcffa2f255ba93f63e0b5c685368f308e76ca0d2a30a41: sync error

2018-12-07T06:41:26.209Z lightningd(1215): 03da1c27ca77872ac5b3e568af30673e599a47a5e4497f85c7b5da42048807b3ed chan #1038: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 48858b0d55ae982596932ceb72584d4bb31363b9ecbaa56721b158ca4d18f5f8: sync error
2018-12-07T06:41:43.707Z lightningd(1215): 0219c2f8818bd2124dcc41827b726fd486c13cdfb6edf4e1458194663fb07891c7 chan #2508: Peer permanent failure in CHANNELD_AWAITING_LOCKIN: lightning_channeld: received ERROR channel 388b653e433773d20d74a151c552df647b74e240ef983d21a6d6c5816523b858: sync error
2018-12-07T06:41:45.553Z lightningd(1215): 03e50492eab4107a773141bb419e107bda3de3d55652e6e1a41225f06a0bbf2d56 chan #1044: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel b58e9391383bfbe848da881ab9ddd9a8987c76318d421dac6f552b0d451ff957: sync error
2018-12-07T06:41:46.501Z lightningd(1215): 0390b5d4492dc2f5318e5233ab2cebf6d48914881a33ef6a9c6bcdbb433ad986d0 chan #871: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 91f43cb6a8c37d0be237a7c491f11d9dfad48534699fb4f076b2c0cbde964424: sync error
2018-12-07T06:41:46.985Z lightningd(1215): 03e5ea100e6b1ef3959f79627cb575606b19071235c48b3e7f9808ebcd6d12e87d chan #1026: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 6cc360db0627b19df146ccd971570c14597b22662bbc0907a233042480e50be7: sync error
2018-12-07T06:41:47.340Z lightningd(1215): 03c2abfa93eacec04721c019644584424aab2ba4dff3ac9bdab4e9c97007491dda chan #1420: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel f363d174390bf819b47e568cb5890c8e432d61c03ba0d38d7c53996679080a74: sync error
2018-12-07T06:41:47.641Z lightningd(1215): 032679fec1213e5b0a23e066c019d7b991b95c6e4d28806b9ebd1362f9e32775cf chan #1058: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 602dc88c7f333ed88f24c6f2c760cb53fa359a4299dfab677f6a81ca33613231: sync error

2019-01-06T10:56:47.332Z lightningd(1202): 02cdf83ef8e45908b1092125d25c68dcec7751ca8d39f557775cd842e5bc127469 chan #2608: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 17b7c895c3feb6ae5b8209ef05044b0aa125629ef1ebc2ce6b2efb27e231533b: sync error
2019-01-06T10:57:08.896Z lightningd(1202): 0219c2f8818bd2124dcc41827b726fd486c13cdfb6edf4e1458194663fb07891c7 chan #2610: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 52d5e3717c7b4f6b06f2b7d55aa8d904a0558706e18be981c82d2c11d4bdf82c: sync error
2019-01-06T10:57:08.950Z lightningd(1202): 02ad6fb8d693dc1e4569bcedefadf5f72a931ae027dc0f0c544b34c1c6f3b9a02b chan #7185: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 245438c15a986b53da7694114c646b77ab663d236d7928732764f5b9251cd2d1: sync error

2019-01-15T09:15:26.882Z lightningd(1191): 03a76b80027d7c067e0da77da95880faaf89e9bf87b73a7d57bd4a3f2a124b764f chan #7430: Peer permanent failure in CHANNELD_AWAITING_LOCKIN: lightning_channeld: received ERROR channel 97c1e01612faf5653af2980abdf382c0f3b24d8a5961b6a3a1eb12444cf9db2e: sync error

2019-05-02T11:32:06.511Z lightningd(14815): 036e8a8efeb26f3cffce99f462839ef6ea3b1691d569d59c402be0d3d6cef9b79c chan #7573: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 6766b0b14013de753f9b354ce7a4b6e4756165ef970aae2650aeda990cfe5687: sync error

2019-06-12T10:38:57.503Z lightningd(1264): 024d2387409269f3b79e2708bb39b895c9f4b6a8322153af54eba487d4993bf60f chan #9607: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 1f3111399c670dab87b4e3d7bac22865c29d4c9992df71fdce9e8893666a08bc: sync error
2019-06-12T10:41:00.435Z lightningd(1264): 02809e936f0e82dfce13bcc47c77112db068f569e1db29e7bf98bcdd68b838ee84 chan #9332: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel a31b5252be9b001f573e00310ea9098532c81322389aa8721946185b1b70ca4c: sync error
2019-06-12T10:46:23.097Z lightningd(1264): 02fcdb04f51d61dddc0481c10751173d523e3408ebe3a848a1d6cb34b1f5df6668 chan #7586: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel bd18e98f5bd56ac73e7b2eb7fd70f6dbe3a4dda1e5bebe7bf6484c3a0f6b55e7: sync error
2019-06-12T10:46:24.627Z lightningd(1264): 03bb88ccc444534da7b5b64b4f7b15e1eccb18e102db0e400d4b9cfe93763aa26d chan #9626: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 345e89c2f0100257940aff7413c1e29786d08b0a1ea1e259d577650d18791872: sync error
2019-06-12T10:46:26.381Z lightningd(1264): 0331f80652fb840239df8dc99205792bba2e559a05469915804c08420230e23c7c chan #9677: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel d38752727ed5dab33abb06c5671e9d7d467feb469f0d249aa488f45e304221c1: sync error
2019-06-12T12:12:51.261Z lightningd(1264): 02d3366059edde4179fc0d071828b4bd726effba7225c3851f3d86a6a827f934a2 chan #9804: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel d00c9eb31bb0c1f5794804114117be3cc75a756a1e4c08099b7188a5fd9f7215: sync error

2019-06-13T03:19:28.212Z lightningd(1218): 03e5ea100e6b1ef3959f79627cb575606b19071235c48b3e7f9808ebcd6d12e87d chan #10792: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 873a526043bbc680ea4398c7a45b9742762d782dea285c661bb90ab8f165976d: sync error
2019-06-13T06:19:52.486Z lightningd(1230): 030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #10743: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 29157b32dd0c13bcf4f785c5527d067159e102d62516e3a00fbf2c0f33bf59ec: sync error

2019-06-14T01:25:37.598Z lightningd(1235): 02cf60741c586aa54ff24381beab1aebf45eda61a8c49b043cf1f6e203e611e581 chan #12786: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 827472a7167ab1fecd680e4f28e1ee74bcd25d04dcdea5d1295ba381b6543661: sync error

2019-07-17T03:37:12.703Z UNUSUAL lightningd(1262): 03021c5f5f57322740e4ee6936452add19dc7ea7ccf90635f95119ab82a62ae268 chan #14764: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 5ff0890d9f1fbb63439a7d793c28cb74c3baef8c9b610c51c64b8a6497237540: sync error
2019-07-17T03:37:14.964Z UNUSUAL lightningd(1262): 030c3f19d742ca294a55c00376b3b355c3c90d61c6b6b39554dbc7ac19b141c14f chan #14839: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 79525ec2c4eaffb5fd6893957f330db81b7383c50d57113d5bf8ffee3c121bdc: sync error
2019-07-17T03:37:16.048Z UNUSUAL lightningd(1262): 028c1da32603fce64118e469ffe2cfeec04d1c4bd88205efb4e8b4208f77a8064e chan #14996: Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel 6913067c9c89404d9451df25fed1a6cc98b9d9ef801b623d5e8e90aa43ca3077: sync error

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-26 03:53:03 +00:00
Rusty Russell dd79813a75 common: add peer_error flag to treat this error as "soft".
The spec says to close the channel if they send us an error, but we
need to be more lenient to preserve channels with other
implementations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-07-26 03:53:03 +00:00
Rusty Russell 38d2899fbb common/per_per_state: generalize lightningd/peer_comm Part 1
Encapsulating the peer state was a win for lightningd; not surprisingly,
it's even more of a win for the other daemons, especially as we want
to add a little gossip information.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-06-04 01:29:39 +00:00
Rusty Russell 13717c6ebb gossipd: hand a gossip_store_fd to all subdaemons.
This will let them read from the gossip store directly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2019-05-13 05:16:18 +00:00
Rusty Russell b4e6a0fcad peer_failed: write error message to peer directly.
We currently hand the error back to the master, who then stores it for
future connections and hands it back to another openingd to send and exit.

Just send directly; it's more reliable and simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell 09cce4a9c7 common/read_peer_msg: deconstruct into individual helper routines.
The One Big API is confusing, and has enough corner cases that we should
ditch it rather than add more.

See: https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction

In particular, when openingd is changed to chat to peers even when
it's not actively opening a channel, it wants to handle (most) errors
by continuing, not calling peer_failed().

This exposes the constituent parts.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-05 02:03:58 +00:00
Rusty Russell ab9d9ef3b8 gossipd: drain fd instead of passing around gossip index.
(This was sitting in my gossip-enchancement patch queue, but it simplifies
this set too, so I moved it here).

In 94711969f we added an explicit gossip_index so when gossipd gets
peers back from other daemons, it knows what gossip it has sent (since
gossipd can send gossip after the other daemon is already complete).

This solution is insufficient for the more general case where gossipd
wants to send other messages reliably, so replace it with the other
solution: have gossipd drain the "gossip fd" which the daemon returns.

This turns out to be quite simple, and is probably how I should have
done it originally :(

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-04-26 05:47:57 +00:00
Rusty Russell 9cffa03647 peer_failed: set permanent slot when we fail the peer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-23 18:02:00 +01:00
Rusty Russell dba08f9d1b peer_failed: don't send error ourselves.
gossipd actually does that now, so we don't need this synchronous send
hack.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell 02d469b3d4 peer_failed: hand fds back to master when we fail.
master now hands it back to gossipd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell f76ff90485 status: split off error messages into a new 'peer_status' type.
Several daemons (onchaind, hsm) want to use the status messages, but
don't communicate with peers.  The coming changes made them drag in
more code they didn't need, so instead we have a different
non-overlapping type.

We combine the status_received_errmsg and status_sent_errmsg
into a single status_peer_error, with the presence or not of the
'error_for_them' field indicating direction. 

We also rename status_fatal_connection_lost() to
peer_failed_connection_lost() to fit in.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell 201d498e39 peer_failed: automatically hand PEER_FD, GOSSIP_FD; add gossip_index
We make it a macro, since everyone uses PEER_FD and GOSSIP_FD constants
(they're actually always the same, but this is slightly safer), and
add a gossip_index arg: this is groundwork for when we want to hand
the peer back to master for gossipd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-19 02:56:51 +00:00
Rusty Russell cc9ca82821 status: separate types for peer failure vs "impossible" failures.
Ideally we'd rename status_failed() to status_fatal(), but that's
too much churn for now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-02-08 19:07:12 +01:00
Rusty Russell e34ec8da2d peer_failed: use towire_errorfmtv() which doesn't add nul terminator.
This code was actually wrong.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-01-12 09:43:01 +01:00
Rusty Russell ef28b6112c status: use common status codes for all the failures.
This change is really to allow us to have a --dev-fail-on-subdaemon-fail option
so we can handle failures from subdaemons generically.

It also neatens handling so we can have an explicit callback for "peer
did something wrong" (which matters if we want to close the channel in
that case).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-09-12 23:00:53 +02:00
Rusty Russell a37c165cb9 common: move some files out of lightningd/
Basically all files shared by different daemons.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2017-08-29 17:54:14 +02:00