author     bnewbold <bnewbold@robocracy.org>  2022-06-27 12:11:34 -0700
committer  bnewbold <bnewbold@robocracy.org>  2022-06-27 12:11:34 -0700
commit     0c4ea3e7bb37b6ff14a2973deceda79b9f255cf5
tree       7be116ada48ad088026c9a8c2689bd4e46fc1eb3
parent     ced921dce565fb6289d768101bbb869dbae35b2b

adding old notes files (HEAD, master)

-rw-r--r--  doc/plan.txt  113
-rw-r--r--  doc/spec.txt    7
2 files changed, 120 insertions, 0 deletions

diff --git a/doc/plan.txt b/doc/plan.txt
new file mode 100644
index 0000000..bea2b14
--- /dev/null
+++ b/doc/plan.txt
@@ -0,0 +1,113 @@
+
+rustup run nightly cargo install clippy
+rustup run nightly cargo clippy
+
+startup:
+x bind sockets
+- [optional] bind rpc server+client
+x register signal handlers
+x populate config
+- [optional] spawn rpc thread
+x spawn initial set of children
+x enter main event loop
+ select!
+
+rpc mechanism:
+ use JSON-RPC: https://github.com/ethcore/jsonrpc-core
+ worker side receives client request, creates a new reply channel, sends
+ request+channel to event loop, blocks on reply channel. this all
+ happens in per-client thread?
+ event loop selects() on rpc requests. when one is received, processes,
+ sends reply down channel, closes channel
+
+## Process Lifetime
+
+process table is pid_t -> offspring
+entries are only removed by the SIGCHLD handler, which calls waitpid and has thus already reaped the child
+entries hold a timer guard, so after they are destroyed the timer shouldn't fire
+
+spawn:
+ init
+ check in childhood (seconds) later
+
+shutdown:
+ notified, send USR2
+ check in shutdown later: if not dead, term it; either way reap
+
+term:
+ notified, send TERM
+ check in shutdown later: if not dead, kill it
+
+kill:
+ dead, send KILL
+
+upgrade: <some state machine?>
+ spawn new generation; keep 'replaces' linkage
+ when each new generation is healthy, notify the 'replace-ee' to shut down
+
+timer: check_alive:
+ healthy; or kill and re-spawn (based on ack mode)
+
+signal: child died:
+ find which child it was (by pid)
+ if it was in infancy or healthy and attempts ok, try to respawn, possibly with backoff
+ otherwise, just reap from brood
+
+## Most Basic
+
+x bind to any supplied socket(s)
+x set up environment variables
+x register signal handlers
+x fork() for each child, with some delay in between
+x in each forked copy, execve() to the supplied program
+x in the parent, wait on children, waiting for failures. restart on failure
+- reset signal mask in child processes
+
+then, probably want to shift to an event-driven (single threaded?) setup
+(including signal handling)
+(event-driven seems like it might not work well, so threads+channels instead)
+
+- timers
+- signals: chan_signal
+- child termination
+- RPC commands
+
+threads:
+- shepard: holds sub-process state machines; chan_select!{} on other threads
+- chan_signal (internal to library)
+? timers ?
+- rpc: workers spawned on socket connects
+ -> line-based? JSON-RPC? gRPC?
+
+sub-process states (TODO: look at daemontools):
+- healthy
+- dead
+- starting
+
+## Socket Passing
+
+fcntl F_GETFD: to get fd flags
+FD_CLOEXEC: flag of whether to keep open ("close on exec", true means it isn't passed)
+
+EINHORN_FD_COUNT
+EINHORN_FD_0
+
+-4 and -6 flags (forces IPv4 or IPv6, respectively)
+
+## Command Socket
+
+EINHORN_SOCK_PATH
+
+
+## Child API
+
+listen for USR2 (graceful shutdown)
+parse environment variables
+
+## Rust child impl
+
+https://doc.rust-lang.org/std/os/unix/io/trait.FromRawFd.html
+
+----
+
+Using chan instead of std::sync::mpsc because Select/select! is unstable.
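
The "Socket Passing" and "Rust child impl" notes in doc/plan.txt can be illustrated with a minimal sketch of how a child might recover its inherited descriptors. This is not code from the commit. The helper name `inherited_fds` and the `lookup` closure are hypothetical, and the numbering pattern `EINHORN_FD_<n>` for n from 0 up to `EINHORN_FD_COUNT` is an assumption generalized from the two variable names listed in the notes.

```rust
use std::os::unix::io::RawFd;

/// Collect inherited socket descriptors from Einhorn-style variables.
/// `lookup` stands in for the process environment so the parsing can be
/// exercised without touching real environment variables (hypothetical
/// design choice, not taken from the commit).
pub fn inherited_fds<F>(lookup: F) -> Result<Vec<RawFd>, String>
where
    F: Fn(&str) -> Option<String>,
{
    // EINHORN_FD_COUNT says how many EINHORN_FD_<n> variables to expect.
    let count = lookup("EINHORN_FD_COUNT")
        .ok_or_else(|| "EINHORN_FD_COUNT is not set".to_string())?
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("bad EINHORN_FD_COUNT: {}", e))?;

    let mut fds = Vec::new();
    for i in 0..count {
        // Assumed naming: EINHORN_FD_0, EINHORN_FD_1, ...
        let key = format!("EINHORN_FD_{}", i);
        let raw = lookup(&key).ok_or_else(|| format!("{} is not set", key))?;
        let fd = raw
            .trim()
            .parse::<RawFd>()
            .map_err(|e| format!("bad {}: {}", key, e))?;
        fds.push(fd);
    }
    Ok(fds)
}
```

In a real child the lookup would likely be `|k| std::env::var(k).ok()`, and each returned descriptor could then be wrapped with `FromRawFd::from_raw_fd` (an `unsafe` call, since the caller must guarantee that the descriptor is open and not owned elsewhere).
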
diff --git a/doc/spec.txt b/doc/spec.txt
new file mode 100644
index 0000000..e8662fd
--- /dev/null
+++ b/doc/spec.txt
@@ -0,0 +1,7 @@
+
+Signals to daemon itself:
+ USR2, INT -> graceful shutdown all offspring and exit
+ TERM, QUIT -> terminate all offspring and exit (this is a bit faster)
+ HUP -> upgrade all offspring
+
+NB: QUIT and ALRM are different from einhorn
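
The signal table in doc/spec.txt can be read as a small lookup from signal to daemon action. The sketch below only encodes that table. The type and function names (`DaemonSignal`, `DaemonAction`, `action_for`) are hypothetical, and how the real daemon receives signals (the plan mentions chan_signal) is deliberately left out.

```rust
/// Signals the spec lists as handled by the daemon itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DaemonSignal {
    Usr2,
    Int,
    Term,
    Quit,
    Hup,
}

/// What the daemon does in response (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DaemonAction {
    /// Gracefully shut down all offspring, then exit.
    GracefulShutdown,
    /// Terminate all offspring, then exit (the faster path).
    Terminate,
    /// Upgrade all offspring.
    Upgrade,
}

/// Map a received signal to the action described in doc/spec.txt.
pub fn action_for(sig: DaemonSignal) -> DaemonAction {
    match sig {
        DaemonSignal::Usr2 | DaemonSignal::Int => DaemonAction::GracefulShutdown,
        DaemonSignal::Term | DaemonSignal::Quit => DaemonAction::Terminate,
        DaemonSignal::Hup => DaemonAction::Upgrade,
    }
}
```

Keeping the mapping in one exhaustive `match` means the compiler flags any signal variant added later without a corresponding action.
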