Title: **DEP-0000: Multi-Writer**

Short Name: `0000-multiwriter`

Type: Standard

Status: Undefined (as of 2018-03-XX)

Github PR: [Draft](https://github.com/datprotocol/DEPs/pull/10)

Authors: [Bryan Newbold](https://github.com/bnewbold), [Stephen Whitmore](https://github.com/noffle), [Mathias Buus](https://github.com/mafintosh)


# Summary
[summary]: #summary

Multi-Writer is a set of schema, API, and feature extensions to allow multiple
agents (users, devices, or software) to write to the same hyperdb database. By
building on top of this abstraction layer, future versions of hyperdrive and
Dat will gain these features.

Mechanisms for distributed consistency and granting trust are specified here;
the need for merge conflict algorithms and secure key distribution is
mentioned, but specific solutions are not specified.

This DEP forms the second half of the hyperdb specification; the first half
covered only the key/value database aspects of hyperdb.


# Motivation
[motivation]: #motivation

The hypercore/Dat ecosystem currently lacks solutions for two fundamental use
cases:

- individual users should be able to modify distributed archives under their
  control from multiple devices, at a minimum to prevent loss of control of
  content if a single device (containing secret keys) is lost
- contributions from and collaboration between multiple users on a single
  archive or database should be possible, with appropriate trust and access
  control semantics

Access to a single secret key is currently required to make any change to a
hypercore feed, and it is broadly considered best practice not to distribute
secret keys between multiple users or multiple devices. In fact, the current
hypercore implementation has no mechanism to resolve disputes or recover if
multiple agents use the same secret key to append to the same feed.

Solutions to these two use cases are seen as essential for many current and
future Dat ecosystem applications.


# Concepts, Behavior, and Usage
[usage-documentation]: #usage-documentation

The multi-writer features of hyperdb are implemented by storing and
replicating the contributions of each writer in a separate hypercore feed.
This specification concerns itself with the details of how changes from
multiple feeds (which may be written and replicated concurrently or
asynchronously) are securely combined to present a unified key/value
interface. The following related concerns are explicitly left to application
developers to design and implement:

- secure key distribution and authentication (eg, if a friend should be given
  write access to a hyperdb database, how is that friend's feed key found and
  verified?)
- merge conflict resolution (using the provided API), potentially using
  application-layer semantics

Before we go any further, a few definitions:

*Feed*: A hypercore feed: an append-only log of *Entries*, which can be
arbitrary data blobs.

*Database*: in this context, a hyperdb key/value database, built from several
Feeds (two Feeds per Writer).

*Writer*: a user (or user-controlled device or software agent) that has a
distinct feed with a public/private key pair, and thus the ability to append
hypercore entries and "write" changes to their version of the database.

*Original Writer*: the writer who created a given hyperdb database in the
form of the *Original Feed*. The public key of the original feed is the one
used to reference the database as a collection of feeds (eg, for the purpose
of discovery).

At a high level, multi-writer hyperdb works by having existing authorized
writers (starting with the original writer) authorize new writers by
appending metadata to their own feed which points to the new writers' feeds
(by public key). Each entry in each writer's feed contains "clock" metadata
that records the known state of the entire database (all writers), as seen
from the perspective of that writer at the time they created the entry. This
metadata (a "[vector clock][vc]") can be used by other writers to resolve (or
at least identify) conflicting content in the database. The technical term
for this type of system is a "conflict-free replicated data type"
([CRDT][crdt]).

[vc]: https://en.wikipedia.org/wiki/Vector_clock
[crdt]: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
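To make this flow concrete, here is a minimal sketch using the nodejs
`hyperdb` module mentioned in the changelog. The file paths and the
`alice`/`bob` names are hypothetical, and error handling is elided; this is
an illustration, not a normative part of the specification:

```
var hyperdb = require('hyperdb')

// Alice creates a new database; her first feed is the "original feed",
// and its public key (db.key) is how the database is referenced
var alice = hyperdb('./alice.db', { valueEncoding: 'utf-8' })

alice.on('ready', function () {
  // Bob opens the same database by Alice's public key, which creates
  // his own local feed (not yet authorized to write)
  var bob = hyperdb('./bob.db', alice.key, { valueEncoding: 'utf-8' })

  bob.on('ready', function () {
    // Alice appends metadata to her own feed authorizing Bob's feed
    alice.authorize(bob.local.key, function (err) {
      if (err) throw err
      // Replicate the two instances; afterward, Bob's writes are part
      // of the database from Alice's perspective as well
      var stream = alice.replicate()
      stream.pipe(bob.replicate()).pipe(stream)
    })
  })
})
```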
## Core API
[api]: #api

A "node" is a data structure representing a database entry, including the
`key`, the `value`, and the feed that the entry is committed to.

`db.get(key)` (as described in the [hyperdb DEP][dep-hyperdb]) returns an
array of nodes. If the consistent state of a key is unambiguous, the array
will contain only that one node. If there is a conflict (ambiguity), multiple
nodes will be returned.

If multiple nodes are returned from a `get`, the writer should attempt to
merge the values (or choose one over the other) and write the result to the
database with `db.put(key, value)`. Client libraries can make this process
more ergonomic by accepting a helper function to select between multiple
nodes. Libraries can also offer an option to either directly return the value
of a single node (instead of the node itself), or raise an error; this is
likely to be more ergonomic for applications which do not intend to support
multiple writers per database.

`db.authorize(key)` will write metadata to the local feed authorizing the new
feed (corresponding to `key`) to be included in the database.

`db.authorized(key)` (returning a boolean) indicates whether the given `key`
is an authorized writer of the hyperdb database.

[dep-hyperdb]: https://github.com/datprotocol/DEPs/blob/master/proposals/0004-hyperdb.md
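Continuing with a `db` handle as in the sketch above, conflict handling might
look like the following. The merge rule shown (taking the lexicographically
greater value) is an arbitrary application-level choice used only for
illustration, not something hyperdb prescribes:

```
// If two writers set /foo/bar concurrently, get returns multiple nodes
db.get('/foo/bar', function (err, nodes) {
  if (err) throw err

  if (nodes.length === 1) {
    console.log('consistent value:', nodes[0].value)
    return
  }

  // Conflict: choose a winner (here, the lexicographically greater
  // value) and write it back; the put records the merged result and
  // resolves the ambiguity for subsequent reads
  var merged = nodes.map(function (node) { return node.value }).sort().pop()
  db.put('/foo/bar', merged, function (err) {
    if (err) throw err
    console.log('merged value:', merged)
  })
})
```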
## Scaling
[scaling]: #scaling

There is some overhead associated with each "writer" added to the database,
impacting the number of files on disk, memory use, and the computational cost
of some lookup operations. The design should easily accommodate dozens of
writers, and should scale to 1,000 writers without too much additional
overhead. Note that a large number of writers also implies a larger number
and rate of append operations, and additional network connections, which may
cause scaling issues on their own. More real-world experience and
benchmarking is needed in this area.


# Implementation Details
[reference-documentation]: #reference-documentation

The complete protobuf schemas for the hyperdb "Entry" and "InflatedEntry"
message types (as specified in the hyperdb DEP) are:

```
message Entry {
  required string key = 1;
  optional bytes value = 2;
  optional bool deleted = 3;
  required bytes trie = 4;
  repeated uint64 clock = 5;
  optional uint64 inflate = 6;
}

message InflatedEntry {
  message Feed {
    required bytes key = 1;
  }

  required string key = 1;
  optional bytes value = 2;
  optional bool deleted = 3;
  required bytes trie = 4;
  repeated uint64 clock = 5;
  optional uint64 inflate = 6;
  repeated Feed feeds = 7;
  optional bytes contentFeed = 8;
}
```

The fields of interest for multi-writer are:

- `clock`: a "vector clock" recording observed state at the time of writing
  the entry. Included in every Entry and InflatedEntry.
- `inflate`: a back-pointer to the entry index of the most recent
  InflatedEntry (containing a feed metadata change). Included in every Entry
  and InflatedEntry.
- `feeds`: a list of public keys, one for each writer's feed. Only included
  in InflatedEntry, and only when the set of feeds has changed.

When serialized on disk in a SLEEP directory, the original feed is written
under `source/`. If the "local" feed is different from the original feed
(aka, local is not the "Original Writer"), it is written to `local/`. All
other feeds are written under directories prefixed `peers/`.

TODO: the above disk format is incorrect?


## Directed acyclic graph
[dag]: #dag

The combination of all operations performed on a hyperdb by all of its
members forms a DAG (*directed acyclic graph*). Each write to the database
(setting a key to a value) includes information pointing backward at all of
the known "heads" in the graph.

To illustrate what this means, let's say Alice starts a new hyperdb and
writes 2 values to it:

```
// Feed

0 (/foo/bar = 'baz')
1 (/foo/2 = '{ "some": "json" }')

// Graph

Alice: 0 <--- 1
```

Where sequence number 1 (the second entry) refers to sequence number 0 on the
same feed (Alice's).

Now Alice *authorizes* Bob to write to the hyperdb. Internally, this means
Alice writes a special message to her feed saying that Bob's feed (identified
by his public key) should be read and replicated by other participants. Her
feed becomes:

```
// Feed

0 (/foo/bar = 'baz')
1 (/foo/2 = '{ "some": "json" }')
2 ('' = '')

// Graph

Alice: 0 <--- 1 <--- 2
```

Authorization is formatted internally in a special way so that it isn't
interpreted as a key/value pair.

Now Bob writes a value to his feed, and then Alice and Bob sync. The result
is:

```
// Feed

//// Alice

0 (/foo/bar = 'baz')
1 (/foo/2 = '{ "some": "json" }')
2 ('' = '')

//// Bob

0 (/a/b = '12')

// Graph

Alice: 0 <--- 1 <--- 2
Bob  : 0
```

Notice that none of Alice's entries refer to Bob's, and vice versa. This is
because neither has written any entries to their feeds since the two became
aware of each other (authorized & replicated each other's feeds). Right now
there are two "heads" of the graph: Alice's feed at seq 2, and Bob's feed at
seq 0.

Next, Alice writes a new value, and her latest entry will refer to Bob's:

```
// Feed

//// Alice

0 (/foo/bar = 'baz')
1 (/foo/2 = '{ "some": "json" }')
2 ('' = '')
3 (/foo/hup = 'beep')

//// Bob

0 (/a/b = '12')

// Graph

Alice: 0 <--- 1 <--- 2 <--/ 3
Bob  : 0 <-------------------/
```

Because Alice's latest feed entry refers to Bob's latest feed entry, there is
now only one "head" in the database: Alice's feed at seq 3. That means there
is enough information in Alice's seq=3 entry to find any other key in the
database. In the previous example, there were two heads (Alice's seq=2 and
Bob's seq=0), both of which would need to be read internally in order to
locate any key in the database.


## Authorization

The set of hypercores is *authorized* in that the original author of the
first hypercore in a hyperdb must explicitly denote in their append-only log
that the public key of a new hypercore is permitted to edit the database. Any
authorized member may authorize more members. There are currently no
revocation or other author-management mechanisms.


## Vector clock

Each node stores a [vector clock](https://en.wikipedia.org/wiki/Vector_clock)
of the last known sequence number from each feed it knows about. This is what
forms the DAG structure.

A vector clock on a node of, say, `[0, 2, 5]` means:

- when this node was written, the largest seq # in my local feed was 0
- when this node was written, the largest seq # in the second feed I have was 2
- when this node was written, the largest seq # in the third feed I have was 5

For example, Bob's vector clock for Alice's seq=3 entry above would be
`[0, 3]`, since he knows of her latest entry (seq=3) and his own (seq=0).
Note that the order of clocks is not consistent across writers, only within
the same feed.

The vector clock is used for correctly traversing history. This is necessary
for the `db#heads` API as well as `db#createHistoryStream`.
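As an illustration of how graph state surfaces through these APIs, a sketch
against the nodejs implementation follows. The `feed` and `seq` fields on
returned nodes are implementation details of that module, not part of this
specification:

```
// Report the current heads of the graph; after Alice's seq=3 write
// above, this would print a single head
db.heads(function (err, heads) {
  if (err) throw err
  heads.forEach(function (node) {
    console.log('head: feed', node.feed, 'seq', node.seq)
  })
})

// Traverse history in an order consistent with the vector clocks
db.createHistoryStream().on('data', function (node) {
  console.log(node.key, '=', node.value)
})
```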
# Examples
[examples]: #examples

TODO:


# Security and Privacy Concerns
[privacy]: #privacy

TODO:


# Drawbacks
[drawbacks]: #drawbacks

Significant increase in implementation, application, and user experience
complexity. Two hard problems (merges and secure key distribution) are left
to application developers.


# Rationale and alternatives
[alternatives]: #alternatives

Design goals for hyperdb (including the multi-writer feature) included:

- ability to execute operations (get, put) with a sparse (partial)
  replication of the database, using as few additional network requests as
  possible
- minimal on-disk and on-wire overhead
- implementation on top of an append-only log (to build on top of hypercore)

TODO:

- Why is this design the best in the space of possible designs?
- What other designs have been considered, and what is the rationale for not
  choosing them?
- What is the impact of not doing this?


# Unresolved questions
[unresolved]: #unresolved-questions

What is the technical term for the specific CRDT we are using?
"Operation-based"?

If there are conflicts resulting in ambiguity over whether a key has been
deleted or has a new value, does `db.get(key)` return an array of
`[new_value, None]`? Answer: `get` always returns nodes (not just values), so
context is included. In the case of a deletion, the value within the node
will be `null`.

What is a reasonable large number of writers to have in a single database?
Write "Scaling" section.

The javascript hyperdb implementation has a new `deleted` flag and `Header`
prefix message. These fall under the scope of the hyperdb DEP; should they be
mentioned here?


# Changelog
[changelog]: #changelog

As of March 2018, Mathias Buus (@mafintosh) is leading development of a
hyperdb nodejs module on [github](https://github.com/mafintosh/hyperdb),
which includes multi-writer features and is the basis for this DEP. Jim Pick
(@jimpick) has been an active contributor, working out multi-writer details.

- 2017-12-06: @noffle publishes `ARCHITECTURE.md` overview in the
  [hyperdb github repo][arch_md]
- 2018-06-XX: First partial draft submitted for review

[arch_md]: https://github.com/mafintosh/hyperdb/blob/master/ARCHITECTURE.md