Commit Graph

9 Commits

Author SHA1 Message Date
Rusty Russell 35d7449259 connectd: initialize peer->conn.
It's only used in one place, but that's enough.

Fixes: #1434
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell f8aed1b4b0 pytest: add reconnection stress test.
It sometimes triggers a crash like #1434 (though never under valgrind).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell fefb7faba7 pytest: try a simple reconnection test.
This passes, but that's OK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-10 16:15:12 +02:00
Rusty Russell 8939a5001b connectd: rely on the master to tell us to reconnect.
connectd tells master about every disconnection, and master knows
whether it's important to reconnect.  Just get the master to invoke a new
connect command if it considers the peer important!

The only twist is timeouts: we don't want to immediately reconnect if
we've failed to connect.  To solve this, connectd passes a 'delaytime'
to the master when a connection fails, and the master passes it back
when it asks for a connection.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
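The 'delaytime' round trip described in 8939a5001b can be sketched as a toy model. This is only an illustration, not the daemons' C code: the class names, the doubling backoff and the INITIAL_DELAY / MAX_DELAY values are assumptions made for the example, and the actual wait is not simulated.

```python
from typing import Dict, List, Optional, Set, Tuple

INITIAL_DELAY = 1   # seconds; assumed first backoff value
MAX_DELAY = 300     # seconds; assumed cap on the backoff


class Connectd:
    """Toy stand-in for connectd: tries connections and reports failures."""

    def __init__(self, reachable: Set[str]) -> None:
        self.reachable = reachable
        self.attempts: List[Tuple[str, int]] = []  # (peer_id, delay) log

    def connect(self, peer_id: str, delay: int = 0) -> Optional[int]:
        """Attempt a connection that should be delayed by `delay` seconds.

        Returns None on success, otherwise the delaytime the master should
        pass back with its next request (doubling is an assumption here).
        """
        self.attempts.append((peer_id, delay))
        if peer_id in self.reachable:
            return None
        return INITIAL_DELAY if delay == 0 else min(delay * 2, MAX_DELAY)


class Master:
    """Toy stand-in for the master: decides whether a peer is important."""

    def __init__(self, connectd: Connectd, important: Set[str]) -> None:
        self.connectd = connectd
        self.important = important
        self.delay: Dict[str, int] = {}  # peer_id -> delaytime from connectd

    def request_connect(self, peer_id: str) -> bool:
        """Ask connectd to connect, handing back any stored delaytime."""
        next_delay = self.connectd.connect(peer_id, self.delay.get(peer_id, 0))
        if next_delay is None:
            self.delay.pop(peer_id, None)  # connected: forget the backoff
            return True
        self.delay[peer_id] = next_delay
        return False

    def on_disconnect(self, peer_id: str) -> Optional[bool]:
        """connectd reports every disconnection; only important peers
        trigger a new connect request."""
        if peer_id in self.important:
            return self.request_connect(peer_id)
        return None
```

The point of the design is that connectd keeps no per-peer retry state: the delay travels with the failure report and comes back with the next request.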
Rusty Russell 035362e151 openingd: don't exit when we receive an error.
In particular, all opening_read_peer_msg() callers need to know there
was an error (presumably, negotiating) so they can stop, but we should
not exit.

This lets us reenable the final disabled test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
Rusty Russell 02966a4857 connectd: remove unused handback APIs and code.
We now simply maintain a pubkey set for connected peers (we only care
if there's a reconnect), not the entire peer structure.

lightningd no longer queries us for getpeers: it knows more than we do
already.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
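A minimal sketch of the idea in 02966a4857, assuming all connectd needs to answer is "is this pubkey already connected?". The class and method names are hypothetical and do not correspond to the C data structure.

```python
class ConnectedPeers:
    """Tracks only the pubkeys of connected peers, not full peer structs."""

    def __init__(self) -> None:
        self._pubkeys = set()

    def peer_connected(self, pubkey: bytes) -> bool:
        """Record a connection; return True if this pubkey was already
        connected, i.e. this is a reconnect."""
        is_reconnect = pubkey in self._pubkeys
        self._pubkeys.add(pubkey)
        return is_reconnect

    def peer_disconnected(self, pubkey: bytes) -> None:
        """Forget the pubkey; a no-op if it was not connected."""
        self._pubkeys.discard(pubkey)
```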
Rusty Russell e59cbb3e2c pytest: make sure receiving peer's openingd is ready.
There's now a potential race: the source peer's connect returns, but in
the destination peer the master hasn't read the connect message from
connectd, so the peer isn't in listpeers yet.

(Previously the connection stayed in connectd, so there was no such
window).

This is an occasional issue in a few places.

Note that we take the opportunity to speed up test_disconnectpeer too
while we're there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
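The race described in e59cbb3e2c is typically closed in a test by polling until the destination node reports the peer. A hedged sketch follows: wait_for and FakeNode are hypothetical helpers written for this example, not the fixtures the test suite actually uses.

```python
import time
from typing import Callable, List


def wait_for(predicate: Callable[[], bool],
             timeout: float = 10.0, interval: float = 0.05) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval)


class FakeNode:
    """Stand-in for a node whose master learns of the peer a little late."""

    def __init__(self, peer_id: str, visible_after_calls: int) -> None:
        self.peer_id = peer_id
        self.visible_after_calls = visible_after_calls
        self.calls = 0

    def listpeers(self) -> List[str]:
        # The peer only shows up once the master has "read" the message.
        self.calls += 1
        if self.calls > self.visible_after_calls:
            return [self.peer_id]
        return []
```

A test would then call something like wait_for(lambda: "peerA" in node.listpeers()) on the destination node before asserting anything about that peer.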
Rusty Russell 50f5eb34b4 openingd: take peer before we're opening, wait for explicit funding msg.
Prior to this, lightningd would hand uninteresting peers back to connectd,
which would then return a peer to lightningd if that peer sent a non-gossip
msg, or if lightningd asked connectd to release it.

Now connectd hands the peer to lightningd once we've done the init
handshake, which hands it off to openingd.

This is a deep structural change, so we do the minimum here and clean up
in the following patches.

Lightningd:
1. Remove peer_nongossip handling from connect_control and peer_control.
2. Remove list of outstanding fundchannel commands; it was only needed to
   find the race between us asking connectd to release the peer and it
   reconnecting.
3. We can no longer tell if the remote end has started trying to fund a
   channel (until it has succeeded): it's very transitory anyway so not
   worth fixing.
4. We now always have a struct peer, and allocate an uncommitted_channel
   for it, though it may never be used if neither end funds a channel.
5. We start funding on messages for openingd: we can get a funder_reply
   or a fundee, or an error in response to our request to fund a channel,
   so we handle all of them.
6. A new peer_start_openingd() is called after connectd hands us a peer.
7. json_fund_channel just looks through local peers; there are none
   hidden in connectd any more.
8. We sometimes start a new openingd just to send an error message.

Openingd:
1. We always have information we need to accept them funding a channel (in
   the init message).
2. We have to listen for three fds: peer, gossip and master, so we opencode
   the poll.
3. We have an explicit message to start trying to fund a channel.
4. We can be told to send a message in our init message.

Testing:
1. We don't handle some things gracefully yet, so two tests are disabled.
2. 'hand_back_peer .*: now local again' from connectd is no longer a message;
   openingd says 'Handed peer, entering loop' once it's managing it.
3. peer['state'] used to be set to 'GOSSIPING' (otherwise this field didn't
   exist; 'state' is now per-channel).  It doesn't exist at all now.
4. Some tests now need to turn on IO logging in openingd, not connectd.
5. There's a gap between connecting on one node and having connectd on
   the peer hand over the connection to openingd.  Our tests sometimes
   checked getpeers() on the peer, and didn't see anything, so line_graph
   needed updating.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2018-08-09 19:44:27 +02:00
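Openingd item 2 in 50f5eb34b4 says the daemon listens on three fds (peer, gossip and master) with an open-coded poll. The real code is C; the following is only a Python analogy using the standard selectors module, and serve_once is a hypothetical name.

```python
import selectors
import socket
from typing import Dict, List, Tuple


def serve_once(sources: Dict[str, socket.socket],
               timeout: float = 1.0) -> List[Tuple[str, bytes]]:
    """Wait for any of the named sockets to become readable and return
    (name, data) for each one that is ready."""
    sel = selectors.DefaultSelector()
    for name, sock in sources.items():
        sel.register(sock, selectors.EVENT_READ, data=name)
    try:
        ready = sel.select(timeout)
        return [(key.data, key.fileobj.recv(4096)) for key, _ in ready]
    finally:
        sel.close()
```

A caller would loop over serve_once and dispatch on the name, so that a message from the master (such as the explicit request to start funding) can arrive while the daemon is otherwise waiting on the peer or on gossip.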
Christian Decker 58709cf190 pytest: Migrate connection tests to new fixture model
2018-08-07 00:54:19 +00:00