From fc894b547e6eaeea37420bb902e62695b341cb1c Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Sun, 4 Mar 2018 17:22:45 -0800 Subject: WIP on multiwriter DEP --- proposals/0000-multiwriter.md | 255 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 255 insertions(+) create mode 100644 proposals/0000-multiwriter.md diff --git a/proposals/0000-multiwriter.md b/proposals/0000-multiwriter.md new file mode 100644 index 0000000..ec5d2d3 --- /dev/null +++ b/proposals/0000-multiwriter.md @@ -0,0 +1,255 @@ + +Title: **DEP-0000: Multi-Writer** + +Short Name: `0000-multiwriter` + +Type: Standard + +Status: Undefined (as of 2018-03-XX) + +Github PR: (add HTTPS link here after PR is opened) + +Authors: +[Bryan Newbold](https://github.com/bnewbold), +[Stephen Whitmore](https://github.com/noffle), +[Mathias Buus](https://github.com/mafintosh) + + +# Summary +[summary]: #summary + +Multi-Writer is a set of schema, API, and feature extentions to multiple agents +(users, devices, or software) to write to the same HyperDB feed. By building on +top of this abstraction layer, future versions of hyperdrive and Dat will gain +these features. + +Mechanisms for distributed consistency and granting trust are specified here; +the need for merge conflict algorithms and secure key distribution are +mentioned but specific solutions are not specified. + + +# Motivation +[motivation]: #motivation + +The current hypercore/Dat ecosystem currently lacks solutions two fundamental +use cases: + +- individual users should be able to modify distributed archives under their + control from multiple devices, at a minimum to prevent loss of control of + content if a single device (containing secret keys) is lost +- contributions from and collaboration between multiple users on a single + archive or database should be possible, with appropriate trust and access + control semantics + +Access to a single secret key is currently required to make any change to a +hypercore feed, and it is broadly considered best practice not to distribute +secret keys between multiple users or multiple devices. In fact, the current +hypercore implementation has no mechanism to resolve disputes or recover if +multiple agents used the same secret key to append to the same feed. + +Solutions to these two use cases are seen as essential for many current and +future Dat ecosystem applications. + + +# Semantics and Usage +[usage-documentation]: #usage-documentation + +TODO: semantics, terminology + +TODO: things left to application developers (secure key distribution, non-trivial merge resolution) + +TODO: brief note on scaling properties (eg, reasonable numbers of feeds per database) + + +# Implementation +[reference-documentation]: #reference-documentation + +The protobuf schema fields of interest for multi-writer (from the "Node" +message type specified in the HyperDB DEP) are: + +- `seq`: the sequence number of this entry in the owner's hypercore. 0 is the + first, 1 the second, and so forth. +- `feed`: the ID of the hypercore writer that wrote this +- `clock`: vector clock to determine node insertion causality + +## Directed acyclic graph + +The combination of all operations performed on a HyperDB by all of its members +forms a DAG (*directed acyclic graph*). Each write to the database (setting a +key to a value) includes information to point backward at all of the known +"heads" in the graph. + +To illustrate what this means, let's say Alice starts a new HyperDB and writes 2 +values to it: + +``` +// Feed + +0 (/foo/bar = 'baz') +1 (/foo/2 = '{ "some": "json" }') + + +// Graph + +Alice: 0 <--- 1 +``` + +Where sequence number 1 (the second entry) refers to sequence number 0 on the +same feed (Alice's). + +Now Alice *authorizes* Bob to write to the HyperDB. Internally, this means Alice +writes a special message to her feed saying that Bob's feed (identified by his +public key) should be read and replicated in by other participants. Her feed +becomes + +``` +// Feed + +0 (/foo/bar = 'baz') +1 (/foo/2 = '{ "some": "json" }') +2 ('' = '') + + +// Graph + +Alice: 0 <--- 1 <--- 2 +``` + +Authorization is formatted internally in a special way so that it isn't +interpreted as a key/value pair. + +Now Bob writes a value to his feed, and then Alice and Bob sync. The result is: + +``` +// Feed + +//// Alice +0 (/foo/bar = 'baz') +1 (/foo/2 = '{ "some": "json" }') +2 ('' = '') + +//// Bob +0 (/a/b = '12') + + +// Graph + +Alice: 0 <--- 1 <--- 2 +Bob : 0 +``` + +Notice that none of Alice's entries refer to Bob's, and vice versa. This is +because neither has written any entries to their feeds since the two became +aware of each other (authorized & replicated each other's feeds). + +Right now there are two "heads" of the graph: Alice's feed at seq 2, and Bob's +feed at seq 0. + +Next, Alice writes a new value, and her latest entry will refer to Bob's: + +``` +// Feed + +//// Alice +0 (/foo/bar = 'baz') +1 (/foo/2 = '{ "some": "json" }') +2 ('' = '') +3 (/foo/hup = 'beep') + +//// Bob +0 (/a/b = '12') + + +// Graph + +Alice: 0 <--- 1 <--- 2 <--/ 3 +Bob : 0 <-------------------/ +``` + +Because Alice's latest feed entry refers to Bob's latest feed entry, there is +now only one "head" in the database. That means there is enough information in +Alice's seq=3 entry to find any other key in the database. In the last example, +there were two heads (Alice's seq=2 and Bob's seq=0); both of which would need +to be read internally in order to locate any key in the database. + +Now there is only one "head": Alice's feed at seq 3. + + +## Authorization + +The set of hypercores are *authorized* in that the original author of the first +hypercore in a hyperdb must explicitly denote in their append-only log that the +public key of a new hypercore is permitted to edit the database. Any authorized +member may authorize more members. There is no revocation or other author +management elements currently. + + +## Vector clock + +Each node stores a [vector clock](https://en.wikipedia.org/wiki/Vector_clock) of +the last known sequence number from each feed it knows about. This is what forms +the DAG structure. + +A vector clock on a node of, say, `[0, 2, 5]` means: + +- when this node was written, the largest seq # in my local fed is 0 +- when this node was written, the largest seq # in the second feed I have is 2 +- when this node was written, the largest seq # in the third feed I have is 5 + +For example, Bob's vector clock for Alice's seq=3 entry above would be `[0, 3]` +since he knows of her latest entry (seq=3) and his own (seq=0). + +The vector clock is used for correctly traversing history. This is necessary for +the `db#heads` API as well as `db#createHistoryStream`. + + +# Examples +[examples]: #examples + +TODO: + + +# Security and Privacy Concerns +[privacy]: #privacy + +TODO: + + +# Drawbacks +[drawbacks]: #drawbacks + +TODO: Why should we *not* do this? + + +# Rationale and alternatives +[alternatives]: #alternatives + +TODO: + +- Why is this design the best in the space of possible designs? +- What other designs have been considered and what is the rationale for not choosing them? +- What is the impact of not doing this? + + +# Unresolved questions +[unresolved]: #unresolved-questions + +TODO: + +- What parts of the design do you expect to resolve through the DEP consensus process before this gets merged? +- What parts of the design do you expect to resolve through implementation and code review, or are left to independent library or application developers? +- What related issues do you consider out of scope for this DEP that could be addressed in the future independently of the solution that comes out of this DEP? + + +# Changelog +[changelog]: #changelog + +As of March 2018, Mathias Buus (@mafintosh) is leading development of a hyperdb +nodejs module on [github](https://github.com/mafintosh/hyperdb), which includes +multi-writer features and is the basis for this DEP. + +- 2017-12-06: @noffle publishes `ARCHITECTURE.md` overview in the + [hyperdb github repo][arch_md] +- 2018-03-XX: First partial draft submitted for review + +[arch_md]: https://github.com/mafintosh/hyperdb/blob/master/ARCHITECTURE.md -- cgit v1.2.3