update unresolved, security, usage

author: Bryan Newbold <bnewbold@robocracy.org> 2018-06-09 22:56:32 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-06-09 22:56:32 -0700
commit: 714563f4e49add95f13f42e8dfb4e34c9b964fd0 (patch)
tree: 5baa151b3c4cb69c9e7e58477da22252fc4634d8
parent: 89c41746bc8c3a43651ede387b9ca39fc2ba2a2d (diff)
download: dat-deps-714563f4e49add95f13f42e8dfb4e34c9b964fd0.tar.gz
dat-deps-714563f4e49add95f13f42e8dfb4e34c9b964fd0.zip
1 files changed, 57 insertions, 33 deletions
diff --git a/proposals/0000-multiwriter.md b/proposals/0000-multiwriter.md
index b83606a..07116a2 100644
--- a/proposals/0000-multiwriter.md
+++ b/proposals/0000-multiwriter.md
@@ -113,7 +113,10 @@ A "node" is a data structure representing a database entry, including the
 `db.get(key)` (as described in the [hyperdb DEP][dep-hyperdb])
 returns an array of nodes. If it is unambiguous what the consistent state of a key is,
 the array will have only that one value. If there is a conflict (ambiguity),
-multiple nodes will be returned.
+multiple nodes will be returned. If a key has been deleted from the database, a
+node with the `deleted` flag will be returned; note that such "tombstone" nodes
+can still have a `value` field, which may contain application-specific metadata
+(such as a timestamp).
 
 If multiple nodes are returned from a `get`, the writer should attempt to merge
 the values (or chose one over the other) and write the result to the database
@@ -125,11 +128,14 @@ be more ergonomic for applications which do not intend to support multiple
 writers per database.
 
 `db.authorize(key)` will write metadata to the local feed authorizing the new
-feed (corresponding to `key`) to be included in the database.
+feed (corresponding to `key`) to be included in the database. Once authorized,
+a feed may further authorize additional feeds (recursively).
 
 `db.authorized(key)` (returning a boolean) indicates whether the given `key` is
 an authorized writer to the hyperdb database.
 
+At the time of this DEP there is no mechanism for revoking authorization.
+
 [dep-hyperdb]: https://github.com/datprotocol/DEPs/blob/master/proposals/0004-hyperdb.md
 
 
@@ -192,9 +198,8 @@ The fields of interest for multi-writer are:
 When serialized on disk in a SLEEP directory, the original feed is written
 under `source/`. If the "local" feed is different from the original feed (aka,
 local is not the "Original Writer"), it is written to `local/`. All other feeds
-are written under directories prefixed `./peers/<feed-discovery-key>/`.
+are written under directories prefixed `peers/<feed-discovery-key>/`.
 
-TODO: the above disk format is incorrect?
 
 ## Directed acyclic graph
 [dag]: #dag
@@ -300,14 +305,6 @@ to be read internally in order to locate any key in the database.
 Now there is only one "head": Alice's feed at seq 3.
 
 
-## Authorization
-
-The set of hypercores are *authorized* in that the original author of the first
-hypercore in a hyperdb must explicitly denote in their append-only log that the
-public key of a new hypercore is permitted to edit the database. Any authorized
-member may authorize more members. There is no revocation or other author
-management elements currently.
-
 
 ## Vector clock
 
@@ -338,17 +335,50 @@ TODO:
 # Security and Privacy Concerns
 [privacy]: #privacy
 
-TODO:
+As noted above, there is no existing mechanism for removing authorization for a
+feed once added, and an authorized feed may recursively authorize additional
+feeds. There is also no mechanism to restrict the scope of an authorized feed's
+actions (eg, limit to only a specific path prefix). This leaves application
+designers and users with few tools to control trust or access ("all or
+nothing"). Care must be taken in particular if self-mutating software is being
+distributed via hyperdb, or when action may be taken automatically based on the
+most recent content of a database (eg, bots or even third-party tools may
+publish publicly, or even take real-world action like controlling an electrical
+relay).
+
+There is no mechanism to remove malicious history (or any history for that
+matter); if an authorized (but hostile) writer appends a huge number of key
+operations (bloating hyperdb metadata size), or posts offensive or illegal
+content to a database, there is no way to permanently remove the data without
+creating an new database.
+
+The read semantics of hyperdb are unchanged from hypercore: an actor does not
+need to be "authorized" (for writing) to read the full history of a database,
+they only need the public key.
 
 
 # Drawbacks
 [drawbacks]: #drawbacks
 
-Significant increase in implementation, application, and user experience
-complexity.
-
-Two hard problems (merges and secure key distribution) are left to application
-developers.
+Mutli-writer capability incurs a non-trivial increase in library, application,
+and user experience complexity. For many applications, collaboration is an
+essential feature, and the complexity is easily justified. To minimize
+complexity for applications which do not need multi-writer features,
+implementation authors should consider configuration modes which hide the
+complexity of unused features. For example, by having an option to returning a
+single node for a `get()` (and throw an error if there is a conflict), or a
+flag to throw an error if a database unexpectedly contains more than a single
+feed.
+
+Two tasks (conflict merges and secure key distribution) are left to application
+developers. Both of these are Hard Problems. The current design mitigates the
+former by reducing the number of merge conflicts that need to be handled by an
+application (aka, only the non-trivial ones need to be handled), and
+implementation authors are encouraged to provide an ergonomic API for writing
+conflict resolvers. The second problem (secure key distribution) is out of
+scope for this DEP. It is hoped that at least one pattern or library will
+emerge from the Dat ecosystem such that each application author doesn't need to
+invent a solution from scratch.
 
 
 # Rationale and alternatives
@@ -361,29 +391,23 @@ Design goals for hyperdb (including the multi-writer feature) included:
 - minimal on-disk and on-wire overhead
 - implemented on top of an append-only log (to build on top of hypercore)
 
-TODO:
+If a solution for core use cases like collaboration and multi-device
+synchronization is not provided at a low level (as this DEP provides), each
+application will need to invent a solution at a higher level, incuring
+duplicated effort and a higher risk of bugs.
 
-- Why is this design the best in the space of possible designs?
-- What other designs have been considered and what is the rationale for not choosing them?
-- What is the impact of not doing this?
+As an alternative to CRDTs, Operational Transformation (OT) has a reputation
+for being more difficult to understand and implement.
 
 
 # Unresolved questions
 [unresolved]: #unresolved-questions
 
 What is the technical term for the specific CRDT we are using?
-"Operation-based"?
-
-If there are conflicts resulting in ambiguity of whether a key has been deleted
-or has a new value, does `db.get(key)` return an array of `[new_value, None]`?
-Answer: `get` always returns nodes (not just values), so context is included. In the case of a deletion, a the value within the node will be `null`.
-
-What is a reasonable large number of writers to have in a single database?
-Write "Scaling" section.
+"Operation-based" or "State-based"?
 
-The javascript hyperdb implementation has a new `deleted` flag and `Header`
-prefix message. These fall under the scope of the `hyperdb` DEP; should they be
-mentioned here?
+What is the actual on-disk layout (folder structure), if not what is documented
+here?
 
 
 # Changelog
author	Bryan Newbold <bnewbold@robocracy.org>	2018-06-09 22:56:32 -0700
committer	Bryan Newbold <bnewbold@robocracy.org>	2018-06-09 22:56:32 -0700
commit	714563f4e49add95f13f42e8dfb4e34c9b964fd0 (patch)
tree	5baa151b3c4cb69c9e7e58477da22252fc4634d8
parent	89c41746bc8c3a43651ede387b9ca39fc2ba2a2d (diff)
download	dat-deps-714563f4e49add95f13f42e8dfb4e34c9b964fd0.tar.gz dat-deps-714563f4e49add95f13f42e8dfb4e34c9b964fd0.zip