path: root/proposals/0003-hyperdb.md
author    Bryan Newbold <bnewbold@robocracy.org>  2018-05-06 20:12:05 -0700
committer Bryan Newbold <bnewbold@robocracy.org>  2018-05-06 20:12:40 -0700
commit    1e5933b2f940f66962de576ef4f06ed9f4f30507 (patch)
tree      67223af9b44835628f7d8dd028606763b52b8a7e /proposals/0003-hyperdb.md
parent    23fa355409a9d13da30e33886ad3de9ff2f093ef (diff)
fix numbering race condition
... quick, before anybody notices! :) DEP-0003 (HTTP Pinning Service API) was merged weeks ago, but I hadn't merged to my local repo.
Diffstat (limited to 'proposals/0003-hyperdb.md')
-rw-r--r--  proposals/0003-hyperdb.md  760
1 file changed, 0 insertions(+), 760 deletions(-)
diff --git a/proposals/0003-hyperdb.md b/proposals/0003-hyperdb.md
deleted file mode 100644
index 1158e4e..0000000
--- a/proposals/0003-hyperdb.md
+++ /dev/null
@@ -1,760 +0,0 @@
-
-Title: **DEP-0003: Hyperdb**
-
-Short Name: `0003-hyperdb`
-
-Type: Standard
-
-Status: Draft (as of 2018-05-06)
-
-Github PR: [Draft](https://github.com/datprotocol/DEPs/pull/3)
-
-Authors:
-[Bryan Newbold](https://github.com/bnewbold),
-[Stephen Whitmore](https://github.com/noffle),
-[Mathias Buus](https://github.com/mafintosh)
-
-
-# Summary
-[summary]: #summary
-
-Hyperdb is an abstraction layer providing a general purpose distributed
-key/value store over the hypercore protocol. It is an iteration on the
-hyperdrive directory tree implementation, building on top of the hypercore
-append-only log abstraction layer. Keys are path-like strings (e.g.,
-`/food/fruit/kiwi`), and values are arbitrary binary blobs (generally under a
-megabyte).
-
-Hyperdrive (used by the Dat application) is expected to be re-implemented on
-top of hyperdb for improved performance with many files (e.g., millions). The
-hyperdrive API should be largely unchanged, but the `metadata` format will be
-backwards-incompatible.
-
-
-# Motivation
-[motivation]: #motivation
-
-Hyperdb is expected to drastically improve performance of dat clients when
-working with archives containing tens of thousands of files in single
-directories. This is a real-world bottleneck for several current users, with
-basic local actions such as adding a directory taking an unacceptably long time
-to complete.
-
-A secondary benefit is to refactor the [trie][trie]-structured key/value API
-out of hyperdrive, allowing third party code to build applications directly on
-this abstraction layer.
-
-[trie]: https://en.wikipedia.org/wiki/Trie
-
-
-# Usage Documentation
-[usage-documentation]: #usage-documentation
-
-*This section describes Hyperdb's interface and behavior in the abstract for
-application programmers. It is not intended to be exact documentation of any
-particular implementation (including the reference Javascript module).*
-
-Hyperdb is structured to be used much like a traditional hierarchical
-filesystem. A value can be written and read at locations like `/foo/bar/baz`,
-and the API supports querying or tracking values at subpaths: watching for
-changes on `/foo/bar` will report changes to both `/foo/bar/baz` and
-`/foo/bar/19`.
-
-Lower-level details of the hypercore append-only log, disk serialization, and
-networked synchronization features that Hyperdb builds on top of are not
-described in detail here; see the [DEP repository][deps]. Multi-writer
-hypercore semantics are also not discussed in this DEP.
-
-[deps]: https://github.com/datprotocol/DEPs
-
-A Hyperdb database instance can be represented by a single hypercore feed (or
-several feeds in a multi-writer context), and is named, referenced, and
-discovered using the public and discovery keys of the hypercore feed (or the
-original feed if there are several). In a single-writer configuration, only a
-single node (holding the secret key) can mutate the database (e.g., via `put` or
-`delete` actions).
-
-**Keys** can be any UTF-8 string. Path segments are separated by the forward
-slash character (`/`). Repeated slashes (`//`) are disallowed. Leading and
-trailing `/` are optional in application code: `/hello` and `hello` are
-equivalent. A key can serve as both a "path segment" and a key at the same
-time; e.g., `/a/b/c` and `/a/b` can both be keys in the same database.
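A sketch of these key rules in Python (`normalize_key` is a hypothetical helper for illustration, not part of any published hyperdb API):

```python
def normalize_key(key):
    # Repeated slashes are disallowed; leading/trailing '/' are optional,
    # so '/hello' and 'hello' refer to the same key.
    if '//' in key:
        raise ValueError('repeated slashes (//) are disallowed')
    return key.strip('/')
```

With this, `normalize_key('/hello')` and `normalize_key('hello')` both yield `'hello'`.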
-
-**Values** can be any binary blob, including empty (of zero length). For
-example, values could be UTF-8 encoded strings, JSON encoded objects, protobuf
-messages, or a raw `uint64` integer (of either endian-ness). Length is the only
-form of type or metadata stored about the value; deserialization and validation
-are left to library and application developers.
-
-
-## Core API Semantics
-[core_api]: #core_api
-
-A `db` is instantiated by opening an existing hypercore with hyperdb content
-(read-only, or optionally read-write if the secret key is available), or
-creating a new one. A handle could represent any specific revision in history,
-or the "latest" revision.
-
-`db.put(key, value)`: inserts `value` (arbitrary bytes) under the path `key`.
-Requires read-write access. Returns an error (e.g., via callback) if there was a
-problem.
-
-`db.get(key)`: returns the most recent value written to `key`. Reading a
-non-existent `key` is an error. Read-only.
-
-`db.delete(key)`: Removes the key from the database. Deleting a non-existent
-key is an error. Requires read-write access.
-
-`db.list(prefix)`: returns a flat (not nested) list of all keys currently in
-the database under the given prefix. Prefixes operate on a path-segment basis:
-`/ab` is not a valid prefix for key `/abcd`, but is valid for `/ab/cd`. If the
-prefix does not exist, returns an empty list. The order of returned keys is
-implementation (or configuration) specific. Default listing is recursive
-(implementations may have a flag to control this behavior). Read-only.
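The segment-wise prefix rule can be sketched as follows (`is_path_prefix` is an illustrative helper, not a hyperdb API):

```python
def is_path_prefix(prefix, key):
    # Prefixes operate on whole path segments: '/ab' is a valid prefix
    # of '/ab/cd' but not of '/abcd'.
    p = [s for s in prefix.strip('/').split('/') if s]
    k = [s for s in key.strip('/').split('/') if s]
    return k[:len(p)] == p
```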
-
-If the hypercore underlying a hyperdb is only partially replicated, behavior is
-implementation-specific. For example, a `get()` call could block until the
-relevant value is replicated, or the implementation could return an error.
-
-An example pseudo-code session working with a database might be:
-
- db.put('/life/animal/mammal/kitten', '{"cuteness": 500.3}')
- db.put('/life/plant/bush/banana', '{"delicious": 103.4}')
- db.delete('/life/plant/bush/banana')
- db.put('/life/plant/tree/banana', '{"delicious": 103.4}')
- db.get('/life/animal/mammal/kitten')
- => {"cuteness": 500.3}
- db.list('/life/')
- => ['/life/animal/mammal/kitten', '/life/plant/tree/banana']
-
-
-# Reference Documentation
-[reference-documentation]: #reference-documentation
-
-A hyperdb hypercore feed typically consists of a sequence of protobuf-encoded
-messages of "Entry" type. Higher-level protocols may make exceptions to this,
-for example by prepending an application-specific metadata message as the first
-entry in the feed. There is sometimes a second "content" feed associated with
-the primary hyperdb key/value feed, to store data that does not fit in the
-(limited) `value` size constraint.
-
-The sequence of entries includes an incremental index: the most recent entry in
-the feed contains metadata pointers that can be followed to efficiently look up
-any key in the database without needing to linear scan the entire history or
-generate an independent index data structure. Implementations are, of course,
-free to maintain their own index if they prefer.
-
-The Entry protobuf message schema is:
-
- message Entry {
- message Feed {
- required bytes key = 1;
- }
-
- required string key = 1;
- optional bytes value = 2;
- required bytes trie = 3;
- repeated uint64 clock = 4;
- optional uint64 inflate = 5;
- repeated Feed feeds = 6;
- optional bytes contentFeed = 7;
- }
-
-Some fields are specific to the multi-writer features described in their own
-DEP and are mentioned only in passing here. The fields are:
-
-- `key`: UTF-8 key that this node describes. Leading and trailing forward
- slashes (`/`) are always stripped before storing in protobuf.
-- `value`: arbitrary byte array. An unset `value` field indicates that this
-  Entry records a deletion of the key; this is distinct from specifying an
-  empty (zero-length) value.
-- `trie`: a structured array of pointers to other Entry entries in the feed,
- used for navigating the tree of keys.
-- `clock`: reserved for use in the forthcoming `multi-writer` standard. An
- empty list is the safe (and expected) value for `clock` in single-writer use
- cases.
-- `inflate`: a "pointer" (reference to a feed index number) to the most recent
- entry in the feed that has the optional `feeds` and `contentFeed` fields set.
- Not defined if `feeds` and `contentFeed` are defined in the same message.
-- `feeds`: reserved for use with `multi-writer`. The safe single-writer value is
- to use the current feed's hypercore public key.
-- `contentFeed`: for applications which require a parallel "content" hypercore
- feed for larger data, this field can be used to store the 32-byte public key
- for that feed. This field should have a single value for the entire history
- of the feed (aka, it is not mutable).
-
-For the case of a single-writer feed, not using multi-writer features, it is
-sufficient to write a single Entry message as the first entry in the hypercore
-feed, with `feeds` containing a single entry (a pointer to the current feed
-itself), and `contentFeed` optionally set to a pointer to a paired content
-feed.
-
-If *either* `feeds` *or* `contentFeed` is defined in an entry, *both* fields
-must be defined, so the new message can be referred to via `inflate`. In this
-case the entry is referred to as an "inflated entry".
-
-
-## Path Hashing
-[path_hashing]: #path_hashing
-
-Every key path has a component-wise, fixed-size hash representation that is used
-by the trie. The concatenation of all path hashes yields a "path hash array"
-for the entire key. Note that analogously to a hash map data structure, there
-can be more than one key (string) with the same key hash in the same database
-with no problems: the hash points to a linked-list "bucket" of Entries, which
-can be iterated over linearly to find the correct value.
-
-The path hash is represented by an array of bytes. Elements are 2-bit encoded
-(values 0, 1, 2, 3), except for an optional terminating element which has value
-4. Each path element consists of 32 values, representing a 64-bit hash of that
-path element. For example, the key `/tree/willow` has two path segments (`tree`
-and `willow`), and will be represented by a 65 element path hash array (two 32
-element hashes plus a terminator).
-
-The hash algorithm used is `SipHash-2-4`, with an 8-byte output and
-16-byte key; the input is the UTF-8 encoded path string segment, without any
-`/` separators or terminating null bytes. An implementation of this hash
-algorithm is included in the libsodium library in the form of the
-`crypto_shorthash()` function. A 16-byte "secret" key is required; for this use
-case we use all zeros.
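For reference, SipHash-2-4 is simple enough to sketch in pure Python. This is an illustrative implementation (the names `siphash24`, `_rotl`, and `sipround` are this sketch's own, not libsodium's), intended only to make the hashing step concrete:

```python
MASK = (1 << 64) - 1  # all arithmetic is modulo 2^64

def _rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK

def siphash24(key, data):
    """SipHash-2-4: 16-byte key, 8-byte little-endian digest."""
    k0 = int.from_bytes(key[:8], 'little')
    k1 = int.from_bytes(key[8:], 'little')
    v0, v1 = k0 ^ 0x736f6d6570736575, k1 ^ 0x646f72616e646f6d
    v2, v3 = k0 ^ 0x6c7967656e657261, k1 ^ 0x7465646279746573

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
        v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)

    # Input is processed as 8-byte little-endian words; the final word is
    # zero-padded and carries len(data) mod 256 in its top byte.
    padded = data + b'\x00' * (7 - len(data) % 8) + bytes([len(data) % 256])
    for i in range(0, len(padded), 8):
        m = int.from_bytes(padded[i:i + 8], 'little')
        v3 ^= m
        sipround(); sipround()   # c = 2 compression rounds
        v0 ^= m
    v2 ^= 0xff
    for _ in range(4):           # d = 4 finalization rounds
        sipround()
    return (v0 ^ v1 ^ v2 ^ v3).to_bytes(8, 'little')

# Hyperdb usage: all-zeros 16-byte key, UTF-8 path segment as input
tree_hash = siphash24(bytes(16), b'tree')
```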
-
-When converting the 8-byte hash to an array of 2-bit elements, proceed
-byte-by-byte, and for each byte take the two lowest-value bits (aka, `hash &
-0x3`) as array index `0`, the next two bits (aka, `hash & 0xC`) as array index
-`1`, etc. When concatenating path hashes into a longer array, the first
-("left-most") path element hash will correspond to array indexes 0 through 31;
-the terminator (`4`) will have the highest array index.
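The bit-unpacking rule can be written out explicitly; `expand` is an illustrative helper name for this sketch:

```python
def expand(hash_bytes):
    # For each byte, emit the lowest two bits first (b & 0x3), then
    # (b >> 2) & 0x3, and so on up through the top two bits.
    return [(b >> shift) & 0x3 for b in hash_bytes for shift in (0, 2, 4, 6)]

# e.g. the `willow` example hash bytes from this section:
willow_path = expand([0x72, 0x30, 0x34, 0x39, 0x35, 0xA8, 0x21, 0x44])
```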
-
-For example, consider the key `/tree/willow`. `tree` has a hash `[0xAC, 0xDC,
-0x05, 0x6C, 0x63, 0x9D, 0x87, 0xCA]`, which converts into the array:
-
-```
-[ 0, 3, 2, 2, 0, 3, 1, 3, 1, 1, 0, 0, 0, 3, 2, 1, 3, 0, 2, 1, 1, 3, 1, 2, 3, 1, 0, 2, 2, 2, 0, 3 ]
-```
-
-
-`willow` has a 64-bit hash `[0x72, 0x30, 0x34, 0x39, 0x35, 0xA8, 0x21, 0x44]`,
-which converts into the array:
-
-```
-[ 2, 0, 3, 1, 0, 0, 3, 0, 0, 1, 3, 0, 1, 2, 3, 0, 1, 1, 3, 0, 0, 2, 2, 2, 1, 0, 2, 0, 0, 1, 0, 1 ]
-```
-
-These combine into the unified byte array with 65 elements:
-
-```
-[ 0, 3, 2, 2, 0, 3, 1, 3, 1, 1, 0, 0, 0, 3, 2, 1, 3, 0, 2, 1, 1, 3, 1, 2, 3, 1, 0, 2, 2, 2, 0, 3,
- 2, 0, 3, 1, 0, 0, 3, 0, 0, 1, 3, 0, 1, 2, 3, 0, 1, 1, 3, 0, 0, 2, 2, 2, 1, 0, 2, 0, 0, 1, 0, 1,
- 4 ]
-```
-
-As another example, the key `/a/b/c` converts into the 97-byte hash array:
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2,
- 0, 1, 2, 3, 2, 2, 2, 0, 3, 1, 1, 3, 0, 3, 1, 3, 0, 1, 0, 1, 3, 2, 0, 2, 2, 3, 2, 2, 3, 3, 2, 3,
- 0, 1, 1, 0, 1, 2, 3, 2, 2, 2, 0, 0, 3, 1, 2, 1, 3, 3, 3, 3, 3, 3, 0, 3, 3, 2, 3, 2, 3, 0, 1, 0,
- 4 ]
-```
-
-Note that "hash collisions" are rare with this hashing scheme, but are likely
-to occur in large databases (millions of keys), and collisions have been
-generated as a proof of concept. Implementations should take care to properly
-handle collisions by verifying keys and following bucket pointers (see the next
-section).
-
-An example hash collision (useful for testing; thanks to Github user
-`dcposch`):
-
- /mpomeiehc
- /idgcmnmna
-
-<!--
-
-Generation code (javascript) for the above:
-
- var sodium = require('sodium-universal')
- var toBuffer = require('to-buffer')
-
- var KEY = Buffer.alloc(16)
- var OUT = Buffer.alloc(8)
-
- sodium.crypto_shorthash(OUT, toBuffer('tree'), KEY)
- console.log("tree: ", OUT)
- console.log(hash('tree', true))
-
- sodium.crypto_shorthash(OUT, toBuffer('willow'), KEY)
- console.log("willow: ", OUT)
- console.log(hash('willow', true))
-
- sodium.crypto_shorthash(OUT, toBuffer('a'), KEY)
- console.log("a: ", OUT)
- console.log(hash('a', true))
-
-Then, to manually "expand" arrays in python3:
-
- hash_array = [0x72, 0x30, 0x34, 0x39, 0x35, 0xA8, 0x21, 0x44]
- path = []
- tmp = [(x & 0x3, (x >> 2) & 0x3, (x >> 4) & 0x3, (x >> 6) & 0x3) for x in hash_array]
- [path.extend(e) for e in tmp]
- path
-
--->
-
-
-## Incremental Index Trie
-[trie_index]: #trie_index
-
-Each node stores a *prefix [trie](https://en.wikipedia.org/wiki/Trie)* that
-can be used to look up other keys, or to list all keys with a given prefix.
-This is stored under the `trie` field of the Entry protobuf message.
-
-The trie effectively mirrors the path hash array. Each element in the `trie`
-is called a "bucket". Each non-empty bucket points to the newest Entries which
-have an identical path up to that specific prefix location; because the trie
-has 4 "values" at each node, there can be pointers to up to 3 other values at a
-given element in the trie array. Buckets can be empty if there are no nodes
-with path hashes that differ for the first time at the given bucket (aka, there
-are no "branches" at this node in the trie). Only non-null elements will be
-transmitted or stored on disk.
-
-The data structure of the trie is a sparse array of pointers to other Entry
-entries. Each pointer indicates a feed index and an entry index, and is
-associated with a 2-bit value; for the non-multi-writer case, the feed index is
-always 0, so we consider only the entry index.
-
-For a `trie` with `N` buckets, each may have zero or more pointers. Typically
-there are a maximum of 3 pointers per bucket, corresponding to the 4 possible
-values minus the current Entry's value, but in the event of hash collisions (in
-the path array space), there may be multiple pointers in the same bucket
-corresponding to the same value.
-
-To lookup a key in the database, the recipe is to:
-
-1. Calculate the path hash array for the key you are looking for.
-2. Select the most-recent ("latest") Entry for the feed.
-3. Compare path hash arrays. If the paths match exactly, compare keys; if they
-   match, you have found the Entry you were looking for! Check whether the
-   `value` is defined or not; if not, this Entry represents that the key was
-   deleted from the database.
-4. If the paths match, but not the keys, look for a pointer in the last `trie`
- array index, and iterate from step #3 with the new Entry.
-5. If the paths don't entirely match, find the first index at which the two
- arrays differ, and look up the corresponding element in this Entry's `trie`
- array. If the element is empty, or doesn't have a pointer corresponding to
- your 2-bit value, then your key does not exist in this hyperdb.
-6. If the trie element is not empty, then follow that pointer to select the
- next Entry. Recursively repeat this process from step #3; you will be
- descending the `trie` in a search, and will either terminate in the Entry you
- are looking for, or find that the key is not defined in this hyperdb.
-
-Similarly, to write a key to the database:
-
-1. Calculate the path hash array for the key, and start with an empty `trie` of
- the same length; you will write to the `trie` array from the current index,
- which starts at 0.
-2. Select the most-recent ("latest") Entry for the feed.
-3. Compare path hash arrays. If the paths match exactly, and the keys match, then
-   you are overwriting the current Entry, and can copy the "remainder" of its
-   `trie` up to your current `trie` index.
-4. If the paths match, but not the keys, you are adding a new key to an
- existing hash bucket. Copy the `trie` and extend it to the full length. Add
- a pointer to the Entry with the same hash at the final array index.
-5. If the paths don't entirely match, find the first index at which the two
- arrays differ. Copy all `trie` elements (empty or not) into the new `trie`
- for indices between the "current index" and the "differing index".
-6. Next look up the corresponding element in this Entry's `trie` array at the
- differing index. If this element is empty, then you have found the most
- similar Entry. Write a pointer to this node to the `trie` at
- the differing index, and you are done (all remaining `trie` elements are
- empty, and can be omitted).
-7. If the differing trie element has a pointer (is not empty), then follow that
- pointer to select the next `Entry`. Recursively repeat this process from step
- #3.
-
-To delete a value, follow the same procedure as adding a key, but write the
-`Entry` without a `value` (in protobuf, this is distinct from having a `value`
-field with zero bytes). Deletion nodes may persist in the database forever.
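The lookup and write recipes above can be modeled end-to-end with a small in-memory sketch. This is an illustrative pseudo-implementation, not the hyperdb module's API: it covers a single feed only, substitutes MD5 for SipHash-2-4 so it runs without dependencies, punts on hash collisions, and reads the spec as storing each bucket pointer under the pointed-to entry's 2-bit value:

```python
import hashlib

TERMINATOR = 4

def seg_hash(segment):
    # Stand-in for SipHash-2-4 with a zero key (MD5 here, for a
    # dependency-free sketch); expand 8 bytes into 32 2-bit values.
    digest = hashlib.md5(segment.encode('utf-8')).digest()[:8]
    return [(b >> s) & 0x3 for b in digest for s in (0, 2, 4, 6)]

def path_array(key):
    arr = [v for seg in key.strip('/').split('/') for v in seg_hash(seg)]
    return arr + [TERMINATOR]

class Entry:
    def __init__(self, seq, key, value, path, trie):
        self.seq, self.key, self.value = seq, key, value
        self.path, self.trie = path, trie

class ToyHyperdb:
    """Single-feed, in-memory model of the lookup/write recipes."""
    def __init__(self):
        self.feed = []

    def _first_diff(self, a, b, start=0):
        return next((j for j in range(start, len(a))
                     if j >= len(b) or a[j] != b[j]), None)

    def put(self, key, value):
        p = path_array(key)
        trie = [dict() for _ in p]           # bucket: 2-bit value -> entry seq
        if self.feed:
            cur, i = self.feed[-1], 0
            while True:
                d = self._first_diff(p, cur.path, i)
                if d is None:
                    if cur.key != key:
                        raise NotImplementedError('hash collision')
                    for j in range(i, len(p)):      # overwrite: copy the
                        trie[j].update(cur.trie[j])  # remainder of cur's trie
                    break
                for j in range(i, d):               # copy buckets up to the
                    trie[j].update(cur.trie[j])      # differing index
                # at d: point back at cur, keep its other-value pointers
                trie[d] = {v: s for v, s in cur.trie[d].items() if v != p[d]}
                trie[d][cur.path[d]] = cur.seq
                nxt = cur.trie[d].get(p[d])          # descend if a branch exists
                if nxt is None:
                    break
                cur, i = self.feed[nxt], d + 1
        self.feed.append(Entry(len(self.feed), key, value, p, trie))

    def get(self, key):
        if not self.feed:
            return None
        p, cur = path_array(key), self.feed[-1]
        while True:
            d = self._first_diff(p, cur.path)
            if d is None:
                return cur.value if cur.key == key else None
            nxt = cur.trie[d].get(p[d])
            if nxt is None:
                return None                          # key not in the database
            cur = self.feed[nxt]

    def delete(self, key):
        self.put(key, None)                          # deletion entry: no value
```

A session mirroring the pseudo-code example from the usage section would then be `db.put('/life/animal/mammal/kitten', ...)`, `db.get(...)`, and so on, with `delete` appending an entry whose `value` is unset.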
-
-
-## Binary Trie Encoding
-[trie_encoding]: #trie_encoding
-
-The following scheme is used to encode trie data structures (sparse, indexed
-arrays of pointers to entries) into a variable-length bytestring as the `trie`
-field of an Entry protobuf message.
-
-Consider a trie array with `N` buckets and `M` non-empty buckets (`0 <= M <=
-N`). In the encoded field, there will be `M` concatenated bytestrings of the
-form:
-
-- trie index (varint), followed by...
-- bucket bitfield (packed in a varint), encoding which of the 5 values (4
- values if the index is not modulo 32) at this node of the trie have pointers,
- followed by one or more...
-- pointer sets, each referencing an entry at (feed index, entry index):
-  - feed index (varint, with an extra "more pointers at this value" low bit,
-    encoded as `feed_index << 1 | more_bit`)
- - entry index (varint)
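The varints here are standard protobuf-style unsigned LEB128. A minimal encoder (illustrative sketch, not hyperdb's own code):

```python
def varint(n):
    # Unsigned LEB128: 7 bits per byte, least-significant group first,
    # high bit set on every byte except the last.
    out = bytearray()
    while True:
        b = n & 0x7f
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)
```

For example, `varint(98)` is the single byte `0x62` (as in the worked example below), while `varint(300)` takes two bytes, `0xAC 0x02`.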
-
-In the common case for a small/sparse hyperdb, there will be a small number of
-non-empty buckets (small `M`), and usually a single `(feed index, entry index)`
-pointer for those non-empty buckets. For a very large/dense hyperdb (millions
-of key/value pairs), there will be many non-empty buckets (`M` approaching
-`N`), and buckets may have up to the full 4 pointer sets. Even with millions of
-entries, hash collisions will be very rare; in those cases there will be
-multiple pointers in the same pointer set.
-
-Consider an entry with path hash:
-
-```
-[ 1, 1, 0, 0, 3, 1, 2, 3, 3, 1, 1, 1, 2, 2, 1, 1, 1, 0, 2, 3, 3, 0, 1, 2, 1, 1, 2, 3, 0, 0, 2, 1,
- 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 3, 1, 0, 0, 2, 3, 0, 1, 3, 2, 0, 3, 2, 0, 1, 0, 3, 2, 0, 2, 1, 1,
- 4 ]
-```
-
-and trie:
-
-```
-[ , { val: 1, feed: 0, index: 1 } ]
-```
-
-In this case `N` is 64 (or you could count as 2 if you ignore trailing empty
-entries) and `M` is 1. There will be a single bytestring chunk:
-
-- index varint is `1` (second element in the trie array)
-- bitfield is `0b0010` (varint 2): there is only a pointer set for value 1 (the second value)
-- there is a single pointer in the pointer set, which is (`feed=0 << 1 | 0`,
- `index=1`), or (varint 2, varint 1)
-
-Combined, the `trie` bytestring will be:
-
-```
-[0x01, 0x02, 0x02, 0x02]
-```
-
-For a more complex example, consider the same path hash, but the trie:
-
-```
-[ , { val: 1, feed: 0, index: 1; val: 2, feed: 5, index: 3; val: 3, feed: 6, index: 98 }, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
-  , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
-  { val: 4, feed: 0, index: 23; val: 4, feed: 1, index: 17 } ]
-```
-
-Now `M` is 2. The first bytestring chunk will be:
-
-- index varint is `1` (second element in the trie array)
-- bitfield is `0b01110` (varint 9): there are three pointer sets
-- first pointer set is (`feed=0 << 1 | 0`, `index=1`) or (varint 2, varint 1)
-- second pointer set is (`feed=5 << 1 | 0`, `index=3`) or (varint 10, varint 3)
-- third pointer set is (`feed=6 << 1 | 0`, `index=98`) or (varint 12, varint 98)
-
-the second bytestring chunk would be:
-
-- index varint is `64` (65th element in the trie array; the terminating value)
-- bitfield is `0b10000` (varint 1); there is a single pointer set... but it
- contains a hash collision, so there are two pointers
-- first pointer is (`feed=0 << 1 | 1`, `index=23`) or (varint 1, varint=23);
- note the `more` bit is set high!
-- second pointer is (`feed=1 << 1 | 0`, `index=17`) or (varint 2, varint 17);
- note the `more` bit is low, as usual. In the extremely unlikely case of
- multiple collisions there could have been more pointers with `more` high
- preceding this one.
-
-The overall bytestring would be:
-
-```
-[0x01, 0x09, 0x02, 0x01, 0x0A, 0x03, 0x0C, 0x62, 0x40, 0x10, 0x01, 0x17, 0x02, 0x11]
-```
-
-
-# Examples
-[examples]: #examples
-
-
-## Simple Put and Get
-
-Starting with an empty hyperdb `db`, if we `db.put('/a/b', '24')` we expect to
-see a single `Entry` at index 0:
-
-```
-{ key: 'a/b',
- value: '24',
- trie:
- [ ] }
-```
-
-For reference, the path hash array for this key (index 0) is:
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2,
- 0, 1, 2, 3, 2, 2, 2, 0, 3, 1, 1, 3, 0, 3, 1, 3, 0, 1, 0, 1, 3, 2, 0, 2, 2, 3, 2, 2, 3, 3, 2, 3,
- 4 ]
-```
-
-Note that the first 64 elements of the path hash array match those of the
-`/a/b/c` example from the [path hashing][path_hashing] section, because the
-first two path components are the same. Since this is the first entry, the
-entry index is 0.
-
-Now we `db.put('/a/c', 'hello')` and expect a second Entry:
-
-```
-{ key: 'a/c',
- value: 'hello',
- trie:
- [ , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
-    , , { val: 2, feed: 0, index: 0 } ] }
-```
-
-The path hash array for this key (index 1) is:
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2,
- 0, 1, 1, 0, 1, 2, 3, 2, 2, 2, 0, 0, 3, 1, 2, 1, 3, 3, 3, 3, 3, 3, 0, 3, 3, 2, 3, 2, 3, 0, 1, 0,
- 4 ]
-```
-
-The first 32 elements of the path hash array are common with the first Entry
-(they share the common prefix `/a`).
-
-`trie` is defined, but mostly sparse. The first 32 elements of common prefix
-match the first Entry, and then two additional hash elements (`[0, 1]`) happen
-to match as well; there is not a differing element until index 34 (zero-indexed).
-At this index there is a reference pointing to the first Entry. An additional 29
-trailing null entries have been trimmed to reduce metadata overhead.
-
-Next we insert a third node with `db.put('/x/y', 'other')`, and get a third Entry:
-
-```
-{ key: 'x/y',
- value: 'other',
- trie:
- [ , { val: 1, feed: 0, index: 1 } ],
-```
-
-The path hash array for this key (index 2) is:
-
-```
-[ 1, 1, 0, 0, 3, 1, 2, 3, 3, 1, 1, 1, 2, 2, 1, 1, 1, 0, 2, 3, 3, 0, 1, 2, 1, 1, 2, 3, 0, 0, 2, 1,
- 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 3, 1, 0, 0, 2, 3, 0, 1, 3, 2, 0, 3, 2, 0, 1, 0, 3, 2, 0, 2, 1, 1,
- 4 ]
-```
-
-Consider the lookup process for `db.get('/a/b')` (which we expect to
-successfully return `'24'`, as written in the first Entry). First we calculate
-the path hash array for the key `a/b`, which will be the same as the first
-Entry's. Then we
-take the "latest" Entry, with entry index 2. We compare the path hash arrays,
-starting at the first element, and find the first difference at index 1 (`1 ==
-1`, then `1 != 2`). We look at index 1 in the current Entry's `trie` and find a
-pointer to entry index 1, so we fetch that Entry and recurse. Comparing path
-hash arrays, we now get all the way to index 34 before there is a difference.
-We again look in the `trie`, find a pointer to entry index 0, and fetch the
-first Entry and recurse. Now the path elements match exactly; we have found the
-Entry we are looking for, and it has an existent `value`, so we return the
-`value`.
-
-Consider a lookup for `db.get('/a/z')`; this key does not exist, so we expect
-to return with "key not found". We calculate the path hash array for this key:
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2,
- 1, 2, 3, 0, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1, 0, 1, 0, 3, 3, 2, 0, 3, 3, 1, 1, 0, 2, 1, 0, 1, 1, 2,
- 4 ]
-```
-
-Similar to the first lookup, we start with entry index 2 and follow the pointer to
-entry index 1. This time, when we compare path hash arrays, the first differing
-entry is at array index `32`. There is no `trie` entry at this index, which
-tells us that the key does not exist in the database.
-
-## Listing a Prefix
-
-Continuing with the state of the database above, we call `db.list('/a')` to
-list all keys with the prefix `/a`.
-
-We generate a path hash array for the key `/a`, without the terminating symbol
-(`4`):
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2 ]
-```
-
-Using the same process as a `get()` lookup, we find the first Entry that
-entirely matches this prefix, which will be entry index 1. If we had failed to
-find any Entry with a complete prefix match, then we would return an empty list
-of matching keys.
-
-Starting with the first prefix-matching node, we save that key as a match
-(unless the Entry is a deletion), then select all `trie` pointers with an index
-higher than the prefix length, and recursively inspect all pointed-to Entries.
-
-## Deleting a Key
-
-Continuing with the state of the database above, we call `db.delete('/a/c')` to
-remove that key from the database.
-
-The process is almost entirely the same as inserting a new Entry at that key,
-except that the `value` field is undefined. The new Entry (at entry index 3)
-is:
-
-```
-{ key: 'a/c',
- value: ,
- trie: [ , { val: 1, feed: 0, index: 2 }, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
- , , { val: 1, feed: 0, index: 0 } ] }
-```
-
-The path hash array for this Entry (key) is:
-
-```
-[ 1, 2, 0, 1, 2, 0, 2, 2, 3, 0, 1, 2, 1, 3, 0, 3, 0, 0, 2, 1, 0, 2, 0, 0, 2, 0, 0, 3, 2, 1, 1, 2,
- 0, 1, 1, 0, 1, 2, 3, 2, 2, 2, 0, 0, 3, 1, 2, 1, 3, 3, 3, 3, 3, 3, 0, 3, 3, 2, 3, 2, 3, 0, 1, 0,
- 4 ]
-```
-
-
-# Drawbacks
-[drawbacks]: #drawbacks
-
-A backwards-incompatible change will have negative effects on the broader dat
-ecosystem: clients will need to support both protocol versions for some time
-(increasing maintenance burden), future clients may not inter-operate with old
-archives, etc. These downsides can be partially avoided by a careful roll-out.
-
-For the specific use case of Dat archives, hyperdb will slightly increase
-metadata size (and thus disk and network consumption) for archives with few
-files.
-
-
-# Overhead and Scaling
-[overhead]: #overhead
-
-The metadata overhead for a single database entry varies based on the size of
-the database. In a "heavy" case, considering a two-path-segment key with an
-entirely saturated `trie` and `uint32`-sized feed and entry index pointers, and
-ignoring multi-writer fields:
-
-- `trie`: 4 * 2 * 64 bytes = 512 bytes
-- total: 512 bytes
-
-In a "light" case, with few `trie` entries and single-byte varint feed and
-entry index pointers:
-
-- `trie`: 2 * 2 * 4 bytes = 16 bytes
-- total: 16 bytes
-
-For a database with most keys having N path segments, the cost of a `get()`
-scales with the number of entries M as `O(log(M))` with best case 1 lookup and
-worst case `4 * 32 * N = 128 * N` lookups (for a saturated `trie`).
-
-The cost of a `put()` or `delete()` is proportional to the cost of a `get()`.
-
-The cost of a `list()` is linear (`O(M)`) in the number of matching entries,
-plus the cost of a single `get()`.
-
-The total metadata overhead for a database with M entries scales with `O(M
-* log(M))`.
-
-
-# Privacy and Security Considerations
-[privacy]: #privacy
-
-The basic key/value semantics of hyperdb (as discussed in this DEP, not
-considering multi-writer changes) are not known to introduce new privacy issues
-when compared with, e.g., storing binary values at key-like paths using the
-current ("legacy") hyperdrive system.
-
-A malicious writer could cause trouble for readers, even readers who do not
-trust the application-level contents of a feed. Implementations which may be
-exposed to arbitrary feeds from unknown sources on the internet should take
-care to handle the following scenarios:
-
-- A malicious writer may be able to produce very frequent key path hash
-  collisions, which could degrade lookups to linear performance.
-- A malicious writer could send broken trie structures that contain pointer
-  loops, duplicate pointers, or other invalid contents.
-- A malicious writer could write arbitrary data in value fields in an attempt
-  to exploit de-serialization bugs.
-- A malicious writer could leverage non-printing unicode characters to create
-  database entries with user-indistinguishable names (keys).
-
-
-# Rationale and alternatives
-[alternatives]: #alternatives
-
-A major motivator for hyperdb is to improve scaling performance with tens of
-thousands through millions of files per directory in the existing hyperdrive
-implementation. The current implementation requires the most recent node in a
-directory to point to all other nodes in the directory. Even with pointer
-compression, this requires on the order of `O(N^2)` bytes; the hyperdb
-implementation scales with `O(N log(N))`.
-
-The hyperdb specification (this document) is complicated by the inclusion of
-new protobuf fields to support "multi-writer" features which are not described
-here. The motivation for including these fields now is to make only a single
-backwards-incompatible schema change, and then to make a second, software-only
-change in the future to enable support for these features. Schema and data format
-changes are considered significantly more "expensive" for the community and
-software ecosystem compared to software-only changes. Attempts have been made
-in this specification to indicate the safe "single-writer-only" values to use
-for these fields.
-
-
-# Dat migration logistics
-[migration]: #migration
-
-Hyperdb is not backwards compatible with the existing hyperdrive metadata,
-meaning dat clients may need to support both versions during a transition
-period. This applies both to archives saved to disk (e.g., in SLEEP) and to
-archives received and published to peers over the network.
-
-No changes to the Dat network wire protocol itself are necessary, only changes
-to content passed over the protocol. The Dat `content` feed, containing raw
-file data, is not impacted by hyperdb, only the contents of the `metadata`
-feed.
-
-Upgrading a Dat (hyperdrive) archive to hyperdb will necessitate creating a new
-feed from scratch, meaning new public/private key pairs, and that public key
-URL links will need to change.
-
-Further logistical details are left to the forthcoming Multi-Writer DEP.
-
-
-# Unresolved questions
-[unresolved]: #unresolved-questions
-
-Need to think through the deletion process with respect to listing a path prefix:
-will previously deleted nodes be occluded, or potentially show up in list
-results? This should be reviewed (by a non-author of this document) before being
-accepted as a Draft.
-
-Can the deletion process (currently leaving "tombstone" entries in the `trie`
-forever) be improved, such that these entries don't need to be iterated over?
-mafintosh mentioned this might be in the works. Does this DEP need to "leave
-room" for those changes, or should we call out the potential for future change?
-Probably not, should only describe existing solutions. This can be resolved
-after Draft.
-
-There are implied "reasonable" limits on the size (in bytes) of both keys and
-values, but they are not formally specified. Protobuf messages have a hard
-specified limit of 2 GByte (due to 32-bit signed arithmetic), and most
-implementations impose a (configurable) 64 MByte limit. Should this DEP impose
-specific limits on key and value sizes? Would be good to decide before Draft
-status.
-
-Apart from leaving fields in the protobuf message specification, multi-writer
-concerns are explicitly out of scope for this DEP.
-
-
-# Changelog
-[changelog]: #changelog
-
-As of March 2018, Mathias Buus (@mafintosh) is leading development of a hyperdb
-nodejs module on [github](https://github.com/mafintosh/hyperdb), which is the
-basis for this DEP.
-
-- 2017-12-06: Stephen Whitmore (@noffle) publishes `ARCHITECTURE.md` overview
- in the [hyperdb github repo][arch_md]
-- 2018-03-04: First draft for review
-- 2018-03-15: Hyperdb v3.0.0 is released
-- 2018-04-18: This DEP submitted for Draft review.
-- 2018-05-06: Merged as Draft after WG approval.
-
-[arch_md]: https://github.com/mafintosh/hyperdb/blob/master/ARCHITECTURE.md