From 3b8215a805048819813f39f9181a4553442f37b4 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Sun, 10 Jun 2018 01:18:51 -0700 Subject: update impl reference and example --- proposals/0000-multiwriter.md | 244 +++++++++++++++++++++++++++--------------- 1 file changed, 158 insertions(+), 86 deletions(-) diff --git a/proposals/0000-multiwriter.md b/proposals/0000-multiwriter.md index 07116a2..ed35df6 100644 --- a/proposals/0000-multiwriter.md +++ b/proposals/0000-multiwriter.md @@ -200,136 +200,192 @@ under `source/`. If the "local" feed is different from the original feed (aka, local is not the "Original Writer"), it is written to `local/`. All other feeds are written under directories prefixed `peers//`. +## Feeds and Vector Clocks + +At any point in time, each writer has a potentially unique view of the +"current" state of the database as a whole; this is the nature of real-world +distributed systems. For example, a given write might have the most recent +appends from one peer (eg, only seconds old), but be missing much older appends +from another (eg, days or weeks out of date). By having each writer include +metadata about their percieved state of the system as a whole in operations to +their Feed, all writers are able to collectively converge on an "eventually +consistent" picture of the database as whole (this process will be described in +the next section). + +A writer's "current known state" of the database consists of the set of active +Feeds, and for each the most recent entry sequence number ("version"). This +state can be serialized as an array of integers, refered to as a [vector +clock](https://en.wikipedia.org/wiki/Vector_clock). + +Each `put()` operation on the database appends a node to the writer's `local` +feed, and contains the writer's vector clock as observed at that time. +`InflatedEntry` nodes also contain a list of all known authorized Feeds; +inflated nodes only need to be written when the Feed list changes. Every +non-inflated entry contains a pointer back to the most recent inflated entry; +inflated entries themselves contain a pointer back to the previous inflated +entry (the first inflated entry has a null pointer). Elements of a vector clock +are ordered by the Feed list from the corresponding Inflated entry. + +By convention, the order of Feed lists is to start with the writer's local +feed first, then proceed by the order in which Feeds were discovered. Note that +this ordering is not consistent across writers, only within the same feed. + +As an example, if a node (non-inflated entry) had a vector clock of `[0, 2, +5]`, that would mean: + +- when this node was written, the largest seq # in the writer's local fed was 0 +- when this node was written, the largest seq # in the second known feed was 2 +- when this node was written, the largest seq # in the third known feed was 5 + + +## Multi-Feed Aware hyperdb +[multi-aware]: #multi-aware + +The [hyperdb DEP](hyperdb-dep) specifies the procedures for lookup (`get()`) +and append (`put()`) operations to the database, as well as binary encoding +schemes for entry messages. + +Note that the trie encoding specifies pointers in a `(feed, entry)` pair +format. The `feed` integer is an index into the most recent Feed list (found in +the most recent inflated entry; see the last section). When working with a +multi-writer hyperdb database, simply look up entries in the appropriate feed, +instead of only looking in the current feed. The next section ("Consistent +History") describes which entry (or entries) to start with instead of simply +assuming the most recent entry from the local feed. + + +## Consistent History +[consist-history]: #consist-history + +The set of all appended nodes in all feeds of a hyperdb, and all the vector +clock pointers between them, forms a "directed acyclic graph" (DAG). Any node +which does not have another node pointing to it is called a "head" (this +terminology is similar to that used in git). At any point in time, an observed +copy of a database has one or more heads, each representing the top of a +tree-like graph. In the trivial case of a non-multi-writer hyperdb, there is +always only a single head: the most recent entry in the local feed. Just after +appending to the local feed, there is also always a single head, because that +node's vector clock will reference all know most recent entries from other +feeds. It is only when nodes are appended by separate writers who did not know +of the others' concurrent action (and then these changes are replicated) that +there are multiple heads. + +When operating on the database (eg, executing a `get()` operation), all heads +must be considered. The lookup proceedure documented in the [hyperdb +DEP](hyperdb-dep) must be repeated for each head, and nodes returned +representing the set of all unique values. + +The situation where a `get()` operation multiple heads returns different values +for the same key is called a "conflict" and requires a "merge" to resolve. Some +writer (either a human being or application-layer code) must decide on the +correct value for the key and write that value as a new entry (with a vector +clock that includes the previous heads). The procedure for chosing the best +value to use in a given conflict is sometimes easy to determine, but is +impossible to determine algorithmically in the general case. See the "Usage" +section for more details. -## Directed acyclic graph -[dag]: #dag -The combination of all operations performed on a hyperdb by all of its members -forms a DAG (*directed acyclic graph*). Each write to the database (setting a -key to a value) includes information to point backward at all of the known -"heads" in the graph. +# Examples +[examples]: #examples -To illustrate what this means, let's say Alice starts a new hyperdb and writes 2 -values to it: +Let's say Alice starts a new hyperdb and writes two key/value entries to it: ``` -// Feed - -0 (/foo/bar = 'baz') -1 (/foo/2 = '{ "some": "json" }') +// Operations +Alice: db.put('/foo/bar', 'baz') +Alice: db.put('/foo/2', '{"json":3}') +// Alice's Feed +0 (key='/foo/bar', value='baz', + vector_clock=[], inflated=null, feeds=['a11ce...']) (InflatedEntry) +1 (key='/foo/2', value='{"json":3}', + vector_clock=[0], inflated=0) // Graph - Alice: 0 <--- 1 ``` -Where sequence number 1 (the second entry) refers to sequence number 0 on the -same feed (Alice's). +The vector clock at `seq=1` points back to `seq=0`. -Now Alice *authorizes* Bob to write to the hyperdb. Internally, this means Alice -writes a special message to her feed saying that Bob's feed (identified by his -public key) should be read and replicated in by other participants. Her feed -becomes +Next Alice *authorizes* Bob to write to the database. Internally, this means Alice +writes an Inflated entry to her feed that contains Bob's Feed (identified by his +public key) in her feed list. ``` -// Feed - -0 (/foo/bar = 'baz') -1 (/foo/2 = '{ "some": "json" }') -2 ('' = '') +// Operations +Alice: db.authorize('b0b123...') +// Alice's Feed +0 (key='/foo/bar', value='baz', + vector_clock=[], inflated=null, feeds=['a11ce...']) (InflatedEntry) +1 (key='/foo/2', value='{"json":3}', + vector_clock=[0], inflated=0) +2 (key=null, value=null, + vector_clock=[1], inflated=0, feeds=['a11ce...', 'b0b123...']) (InflatedEntry) // Graph - Alice: 0 <--- 1 <--- 2 ``` -Authorization is formatted internally in a special way so that it isn't -interpreted as a key/value pair. - -Now Bob writes a value to his feed, and then Alice and Bob sync. The result is: +Bob writes a value to his feed, and then Alice and Bob sync. The result is: ``` -// Feed +// Operations +Bob: db.put('/a/b', '12) -//// Alice -0 (/foo/bar = 'baz') -1 (/foo/2 = '{ "some": "json" }') -2 ('' = '') - -//// Bob -0 (/a/b = '12') +// Alice's Feed +0 (key='/foo/bar', value='baz', + vector_clock=[], inflated=null, feeds=['a11ce...']) (InflatedEntry) +1 (key='/foo/2', value='{"json":3}', + vector_clock=[0], inflated=0) +2 (key=null, value=null, + vector_clock=[1], inflated=0, feeds=['a11ce...', 'b0b123...']) (InflatedEntry) +// Bob's Feed +0 (key='/a/b', value='12', + vector_clock=[], inflated=null, feeds=['b0b123...']) (InflatedEntry)) // Graph - Alice: 0 <--- 1 <--- 2 Bob : 0 ``` -Notice that none of Alice's entries refer to Bob's, and vice versa. This is -because neither has written any entries to their feeds since the two became -aware of each other (authorized & replicated each other's feeds). - +Notice that none of Alice's entries refer to Bob's, and vice versa. Neither has +written any entries to their feeds since the two became aware of each other. Right now there are two "heads" of the graph: Alice's feed at seq 2, and Bob's -feed at seq 0. +feed at seq 0. Any `get()` operations would need to descend from both heads, +though in this situation there would be no conflicts as the keys in the two +feeds are disjoint. Next, Alice writes a new value, and her latest entry will refer to Bob's: ``` -// Feed +// Operations +Alice: db.put('/foo/hup', 'beep') -//// Alice -0 (/foo/bar = 'baz') -1 (/foo/2 = '{ "some": "json" }') -2 ('' = '') -3 (/foo/hup = 'beep') +// Alice's Feed +0 (key='/foo/bar', value='baz', + vector_clock=[], inflated=null, feeds=['a11ce...']) (InflatedEntry) +1 (key='/foo/2', value='{"json":3}', + vector_clock=[0], inflated=0) +2 (key=null, value=null, + vector_clock=[1, null], inflated=0, feeds=['a11ce...', 'b0b123...']) (InflatedEntry) +3 (key='/foo/hup', value='beep', + vector_clock=[2,0], inflated=2) -//// Bob -0 (/a/b = '12') +// Bob's Feed +0 (key='/a/b', value='12', + vector_clock=[], inflated=null, feeds=['b0b123...']) (InflatedEntry)) // Graph - Alice: 0 <--- 1 <--- 2 <--/ 3 Bob : 0 <-------------------/ ``` -Because Alice's latest feed entry refers to Bob's latest feed entry, there is -now only one "head" in the database. That means there is enough information in -Alice's seq=3 entry to find any other key in the database. In the last example, -there were two heads (Alice's seq=2 and Bob's seq=0); both of which would need -to be read internally in order to locate any key in the database. - -Now there is only one "head": Alice's feed at seq 3. - - - -## Vector clock - -Each node stores a [vector clock](https://en.wikipedia.org/wiki/Vector_clock) of -the last known sequence number from each feed it knows about. This is what forms -the DAG structure. - -A vector clock on a node of, say, `[0, 2, 5]` means: - -- when this node was written, the largest seq # in my local fed is 0 -- when this node was written, the largest seq # in the second feed I have is 2 -- when this node was written, the largest seq # in the third feed I have is 5 - -For example, Bob's vector clock for Alice's seq=3 entry above would be `[0, 3]` -since he knows of her latest entry (seq=3) and his own (seq=0). Note that the -order of clocks is not consistent across writers, only within the same feed. - -The vector clock is used for correctly traversing history. This is necessary for -the `db#heads` API as well as `db#createHistoryStream`. - - -# Examples -[examples]: #examples - -TODO: +Alice's latest feed entry now points to Bob's latest feed entry, and there is +only one "head" in the database. This means that any `get()` operations only +need to run once, starting at `seq=3` in Alice's feed. # Security and Privacy Concerns @@ -356,6 +412,10 @@ The read semantics of hyperdb are unchanged from hypercore: an actor does not need to be "authorized" (for writing) to read the full history of a database, they only need the public key. +As noted in other DEPs, a malicious writer can potentially execute a denial of +service (DoS) attack by appending hyperdb entries that for a cyclic loop of +references. + # Drawbacks [drawbacks]: #drawbacks @@ -409,6 +469,18 @@ What is the technical term for the specific CRDT we are using? What is the actual on-disk layout (folder structure), if not what is documented here? +Couldn't the local feed's sequence number be skipped from vector clocks, +because it's implied by the sequence number of the hypercore entry itself? Same +with the key in the feed list (for inflated entries). + +If there are multiple heads, but they all return the same `value` for a `get()` +operation, how is it decided which `node` will be returned? AKA, values are the +same, but node metadata might not be (order of vector clock, etc). + +Suspect that some details are off in the example: shouldn't the InflatedEntry +authorizing a new feed include a vector clock reference to a specific seq in +that feed? Should new local (not yet authorized) feeds reference reference +their source in an initial InflatedEntry (at seq=0)? # Changelog [changelog]: #changelog -- cgit v1.2.3