connectd+: Flake/race fix for new channels

1) dualopend holds an fd to connectd
2) channeld needs to take over
3) dualopend passes its connectd fd over for channeld to use
4) lightningd must receive the fd transfer request and process it
5) dualopend shuts down and closes everything it owns

Steps 4 and 5 race. If 5 happens before 4, channeld ends up with an invalid fd for connectd, leaving it unable to receive messages.

Lingering for a second before shutting down lets 4 win the race in practice. Since the daemon is exiting anyway, the extra second should be harmless.
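
The ordering described above can be sketched in isolation. This is a minimal illustration, not the daemon code: send_fd(), recv_fd(), handoff_then_close() and demo_roundtrip() are hypothetical helpers built on plain SCM_RIGHTS fd passing, and a pipe stands in for the connectd connection. In this single-process demo the sleep is not needed for correctness; it only mirrors where the linger sits relative to the send and the close.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr. */
union fd_ctrl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one fd over a Unix socket via SCM_RIGHTS.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET
	    || cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Stand-in for the exiting daemon: hand the fd to the master socket,
 * linger, and only then close what we own. */
static int handoff_then_close(int master_fd, int conn_fd,
			      unsigned int linger_secs)
{
	if (send_fd(master_fd, conn_fd) != 0)
		return -1;
	/* Give the receiver a chance to pass the fd along */
	sleep(linger_secs);
	close(conn_fd);
	close(master_fd);
	return 0;
}

/* Round trip: pass a pipe's write end across a socketpair, then check
 * that the received fd still reaches the pipe. Returns 0 on success. */
int demo_roundtrip(void)
{
	int sp[2], p[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) != 0 || pipe(p) != 0)
		return -1;
	if (handoff_then_close(sp[0], p[1], 1) != 0)
		return -1;
	got = recv_fd(sp[1]);
	if (got < 0)
		return -1;
	if (write(got, "x", 1) != 1 || read(p[0], &c, 1) != 1)
		return -1;
	close(got);
	close(p[0]);
	close(sp[1]);
	return c == 'x' ? 0 : -1;
}
```

Calling demo_roundtrip() from a test main() should return 0. A fixed sleep only makes the intended ordering likely; an explicit acknowledgement from the receiver would be the stricter alternative, which this commit does not attempt.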

Changelog-Fixed: Fixed a condition for newly created channels that could trigger a need for reconnect.
Dustin Dettmer 2022-09-16 18:11:51 -04:00 committed by Rusty Russell
parent 05e2317142
commit cc206e1f0e
3 changed files with 13 additions and 0 deletions


@@ -955,6 +955,10 @@ static void send_shutdown_complete(struct peer *peer)
wire_sync_write(MASTER_FD,
take(towire_channeld_shutdown_complete(NULL)));
per_peer_state_fdpass_send(MASTER_FD, peer->pps);
/* Give master a chance to pass the fd along */
sleep(1);
close(MASTER_FD);
}


@@ -303,6 +303,9 @@ static void dualopen_shutdown(struct state *state)
status_debug("Sent %s with fds",
dualopend_wire_name(fromwire_peektype(msg)));
/* Give master a chance to pass the fd along */
sleep(1);
/* This frees the entire tal tree. */
tal_free(state);
daemon_shutdown();
@@ -3989,6 +3992,9 @@ int main(int argc, char *argv[])
dualopend_wire_name(fromwire_peektype(msg)));
tal_free(msg);
/* Give master a chance to pass the fd along */
sleep(1);
/* This frees the entire tal tree. */
tal_free(state);
daemon_shutdown();


@@ -1485,6 +1485,9 @@ int main(int argc, char *argv[])
status_debug("Sent %s with fd",
openingd_wire_name(fromwire_peektype(msg)));
/* Give master a chance to pass the fd along */
sleep(1);
/* This frees the entire tal tree. */
tal_free(state);