doc/plan.txt


rustup run nightly cargo install clippy
rustup run nightly cargo clippy

startup:
x bind sockets
- [optional] bind rpc server+client
x register signal handlers
x populate config
- [optional] spawn rpc thread
x spawn initial set of children
x enter main event loop
    select!

rpc mechanism:
    use JSON-RPC: https://github.com/ethcore/jsonrpc-core
    worker side receives client request, creates a new reply channel, sends
        request+channel to event loop, blocks on reply channel. this all
        happens in a per-client thread?
    event loop selects() on rpc requests. when one is received, processes,
        sends reply down channel, closes channel
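
A minimal sketch of the request+reply-channel handoff described above. It uses
std::sync::mpsc only to stay dependency-free (the plan itself uses chan); the
names RpcRequest, event_loop and call, and the "ping" method, are hypothetical
and only for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// A request from an RPC worker, bundled with the channel for its reply.
struct RpcRequest {
    method: String,
    reply: Sender<String>,
}

/// Event-loop side: handle requests until every request sender is dropped.
fn event_loop(requests: Receiver<RpcRequest>) {
    for req in requests {
        let answer = match req.method.as_str() {
            "ping" => "pong".to_string(),
            other => format!("unknown method: {}", other),
        };
        // Ignore send errors: the worker may already have gone away.
        let _ = req.reply.send(answer);
        // `req.reply` is dropped at the end of this iteration, which
        // closes the reply channel.
    }
}

/// Worker side: create a fresh reply channel, send request+channel to the
/// event loop, then block until the reply arrives (None if the loop is gone).
fn call(to_loop: &Sender<RpcRequest>, method: &str) -> Option<String> {
    let (reply_tx, reply_rx) = channel();
    let request = RpcRequest {
        method: method.to_string(),
        reply: reply_tx,
    };
    to_loop.send(request).ok()?;
    reply_rx.recv().ok()
}
```

Each per-client thread would hold a clone of the request Sender and use call()
once per incoming request.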

## Process Lifetime

process table is pid_t -> offspring
entries are only removed by the SIGCHLD handler, which calls waitpid, so the child has already been reaped
entries hold a timer guard, so after they are destroyed the timer shouldn't fire
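
One possible shape for the table, as a sketch only. TimerGuard here is a
hypothetical stand-in (a shared flag set on drop) for whatever guard the real
timer library hands out; Brood, Offspring, insert and reaped are also
illustrative names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

type Pid = i32;

/// Hypothetical timer guard: dropping it sets a shared flag, so a pending
/// timer callback can see that it has been cancelled and do nothing.
struct TimerGuard {
    cancelled: Arc<AtomicBool>,
}

impl Drop for TimerGuard {
    fn drop(&mut self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
}

/// One process-table entry; owning the guard ties the timer to the entry.
struct Offspring {
    _timer: TimerGuard,
}

/// Process table: pid -> offspring.
struct Brood {
    table: HashMap<Pid, Offspring>,
}

impl Brood {
    fn new() -> Brood {
        Brood { table: HashMap::new() }
    }

    /// Add an entry; returns the flag its timer callback would check.
    fn insert(&mut self, pid: Pid) -> Arc<AtomicBool> {
        let cancelled = Arc::new(AtomicBool::new(false));
        let guard = TimerGuard { cancelled: cancelled.clone() };
        self.table.insert(pid, Offspring { _timer: guard });
        cancelled
    }

    /// Meant to be called only after waitpid has returned `pid`.
    /// Removing the entry drops its guard, which cancels the timer.
    fn reaped(&mut self, pid: Pid) -> bool {
        self.table.remove(&pid).is_some()
    }
}
```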

spawn:
    init
    check in childhood (seconds) later

shutdown:
    notified, send USR2
    check in shutdown later: if not dead, term it; either way, reap

term:
    notified, send TERM
    check in shutdown later; if not dead, kill it

kill:
    dead, send KILL

upgrade: <some state machine?>
    spawn new generation; keep 'replaces' linkage
    when each new-generation child is healthy, notify the 'replace-ee' to shut down
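
A sketch of the 'replaces' linkage, assuming the brood is a map keyed by pid.
Child, mark_healthy and the field names are hypothetical; the returned pid is
the process that should then be sent the shutdown notification.

```rust
use std::collections::HashMap;

type Pid = i32;

#[derive(Debug)]
struct Child {
    healthy: bool,
    /// Pid of the old-generation process this child replaces, if any.
    replaces: Option<Pid>,
}

/// Record that `pid` became healthy. Returns the pid it replaces, at most
/// once, so the old process is only notified a single time.
fn mark_healthy(brood: &mut HashMap<Pid, Child>, pid: Pid) -> Option<Pid> {
    let child = brood.get_mut(&pid)?;
    child.healthy = true;
    // take() clears the link so a repeated health check returns None.
    child.replaces.take()
}
```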

timer: check_alive:
    healthy; or kill and re-spawn (based on ack mode)

signal: child died:
    find which child it was (by pid)
    if it was in infancy or healthy and attempts are ok, try to respawn, possibly with backoff
    otherwise, just reap from brood
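
The child-died decision could be sketched as a pure function. The state names
follow the notes above; max_attempts and the capped exponential backoff (1, 2,
4, ... up to 32 seconds) are assumptions, not settled policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Infancy,
    Healthy,
    Notified,
    Dead,
}

#[derive(Debug, PartialEq)]
enum Action {
    Respawn { delay_secs: u64 },
    Reap,
}

/// Decide what to do when the child with the given state has died.
/// `attempts` counts respawns already made for this slot.
fn on_child_died(state: State, attempts: u32, max_attempts: u32) -> Action {
    match state {
        // The guard applies to both alternatives of the pattern.
        State::Infancy | State::Healthy if attempts < max_attempts => {
            // Assumed backoff: double per attempt, capped at 2^5 seconds.
            let delay_secs = 1u64 << attempts.min(5);
            Action::Respawn { delay_secs }
        }
        // Notified or dead children, or ones out of attempts, are reaped.
        _ => Action::Reap,
    }
}
```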

## Most Basic

x bind to any supplied socket(s)
x set up environment variables
x register signal handlers
x fork() for each child, with some delay in between
x in each forked copy, execve() to the supplied program
x in the parent, wait on children, watching for failures. restart on failure
- reset signal mask in child processes

then, probably want to shift to an event-driven (single threaded?) setup
(including signal handling)
(event-driven seems like it might not work well, so threads+channels instead)

- timers
- signals: chan_signal
- child termination
- RPC commands

threads:
- shepard: holds sub-process state machines; chan_select!{} on other threads
- chan_signal (internal to library)
? timers ?
- rpc: workers spawned on socket connects
    -> line-based? JSON-RPC? gRPC?

sub-process states (TODO: look at daemontools):
- healthy
- dead
- starting

## Socket Passing

fcntl F_GETFD: to get fd flags
FD_CLOEXEC: flag controlling whether the fd stays open across exec ("close on exec"; when set, the fd isn't passed to the child)

EINHORN_FD_COUNT
EINHORN_FD_0
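
A sketch of how a child might read these variables. It assumes the fds are
numbered EINHORN_FD_0 through EINHORN_FD_(count-1); inherited_fds is a
hypothetical helper, and the lookup closure exists only so it can be exercised
without touching the real environment.

```rust
use std::os::unix::io::RawFd;

/// Collect the inherited fd numbers using `lookup` to read variables.
/// In a real child, pass `|k| std::env::var(k).ok()`.
fn inherited_fds<F>(lookup: F) -> Result<Vec<RawFd>, String>
where
    F: Fn(&str) -> Option<String>,
{
    let count_str = lookup("EINHORN_FD_COUNT")
        .ok_or_else(|| "EINHORN_FD_COUNT is not set".to_string())?;
    let count = count_str
        .parse::<usize>()
        .map_err(|e| format!("bad EINHORN_FD_COUNT: {}", e))?;

    let mut fds = Vec::new();
    for i in 0..count {
        // Assumed naming scheme: EINHORN_FD_<index>.
        let key = format!("EINHORN_FD_{}", i);
        let raw = lookup(&key).ok_or_else(|| format!("{} is not set", key))?;
        let fd = raw
            .parse::<RawFd>()
            .map_err(|e| format!("bad {}: {}", key, e))?;
        fds.push(fd);
    }
    Ok(fds)
}
```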

-4 and -6 flags (forces IPv4 or IPv6, respectively)

## Command Socket

EINHORN_SOCK_PATH


## Child API

listen for USR2 (graceful shutdown)
parse environment variables

## Rust child impl

https://doc.rust-lang.org/std/os/unix/io/trait.FromRawFd.html
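
A sketch of what the child side could look like with FromRawFd, assuming the
inherited descriptor is a listening TCP socket. listener_from_fd is a
hypothetical helper name.

```rust
use std::net::TcpListener;
use std::os::unix::io::{FromRawFd, RawFd};

/// Wrap an inherited descriptor as a TcpListener.
///
/// Safety: the caller must ensure `fd` is an open, listening TCP socket
/// that nothing else in the process owns, because the returned listener
/// closes it on drop.
unsafe fn listener_from_fd(fd: RawFd) -> TcpListener {
    unsafe { TcpListener::from_raw_fd(fd) }
}
```

The fd number would come from the environment variables in the Socket Passing
section; after wrapping, accept() works as on any other listener.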

----

Using chan instead of std::sync::mpsc because Select/select! is unstable.