author | Max Ogden <max@maxogden.com> | 2016-07-07 10:34:55 -0700 |
---|---|---|
committer | Max Ogden <max@maxogden.com> | 2016-07-07 10:35:12 -0700 |
commit | ce76a0a63413c5dba5c1bf6b2c25fd02202ec35e (patch) | |
tree | 47e1d4ff2572ff089eeca54c278eafbe0dabe507 | |
parent | 7cb7c667d14354146ef6fc23636d1faf4baaf376 (diff) | |
download | dat-docs-ce76a0a63413c5dba5c1bf6b2c25fd02202ec35e.tar.gz dat-docs-ce76a0a63413c5dba5c1bf6b2c25fd02202ec35e.zip |
start on sleep v2 spec
-rw-r--r-- | contents.json | 2
-rw-r--r-- | meta.dat.md | 101
-rw-r--r-- | papers/dat-paper.md | 121
-rw-r--r-- | sleep.md | 163
4 files changed, 281 insertions, 106 deletions
diff --git a/contents.json b/contents.json
index 3238dfe..76ec577 100644
--- a/contents.json
+++ b/contents.json
@@ -8,7 +8,7 @@
   },
   "Specification": {
     "hyperdrive": "hyperdrive.md",
-    "meta.dat": "meta.dat.md"
+    "sleep": "sleep.md"
   },
   "References": {
     "API": "api.md",
diff --git a/meta.dat.md b/meta.dat.md
deleted file mode 100644
index 989879f..0000000
--- a/meta.dat.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# meta.dat
-
-Dat uses a simple metadata file called `meta.dat`. The purpose of this file is to store the fingerprints of the files in a Dat repository. If you create a `meta.dat` file for a set of files, you can host it on a static HTTP server along with the files and Dat clients will be able to download and verify your files, even if you aren't running a Dat server!
-
-# File format
-
-```
-<Header><Entries Index...><Entries...>
-```
-
-The format is a header followed by an index of many entries. Entry order is based on the indexing determined by the [Flat In-Order Tree](hyperdrive.md#flat-in-order-trees) algorithm we use in Dat. After the entry index, a concatenated list of entries follows.
-
-### Header format
-
-```
-<varint header-length><header protobuf>
-```
-
-The header protobuf has this schema:
-
-``` protobuf
-message Header {
-  required bytes datLink = 1;
-  required uint64 entries = 2;
-  optional bool isSigned = 3;
-  optional string hashType = 4 [default = "sha256"];
-  optional uint32 hashLength = 5 [default = 32];
-  optional string signatureType = 6 [default = "ed25519"];
-  optional uint32 signatureLength = 7 [default = 64];
-}
-```
-
-### Entry index format
-
-For non-signed entries:
-
-```
-<8-byte-chunk-end><chunk-hash>
-```
-
-The 8-byte-chunk-end is an unsigned big endian 64 bit integer that should be the absolute position in the file for the **end of the chunk**.
-
-For signed entries in live feeds (only applies to even numbered nodes, i.e. leaf nodes):
-
-```
-<8-byte-chunk-end><chunk-signature><chunk-hash>
-```
-
-For any odd nodes, in either a live or a non-live feed, the non-signed entry format will be used.
-
-## Example
-
-Given a tree like this you might want to look up in a `meta.dat` file the metadata for a specific node:
-
-```
-0─┐
-  1─┐
-2─┘ │
-    3
-4─┐ │
-  5─┘
-6─┘
-```
-
-If you wanted to look up the metadata for 3, you could read the third (or any!) entry from meta.dat:
-
-First you have to read the varint at the beginning of the file so you know how big the header is:
-
-``` js
-var varint = require('varint') // https://github.com/chrisdickinson/varint
-var headerLength = varint.decode(firstChunkOfFile)
-```
-
-Now you can read the header from the file:
-
-``` js
-var headerOffset = varint.encodingLength(headerLength)
-var headerEndOffset = headerOffset + headerLength
-var headerBytes = firstChunkOfFile.slice(headerOffset, headerEndOffset)
-```
-
-To decode the header use the protobuf schema. We can use the [protocol-buffers](https://github.com/mafintosh/protocol-buffers) module to do that.
-
-``` js
-var messages = require('protocol-buffers')(fs.readFileSync('meta.dat.proto'))
-var header = messages.Header.decode(headerBytes)
-```
-
-Now we have all the configuration required to calculate an entry offset.
-
-``` js
-var entryNumber = 42
-var entryOffset = headerEndOffset + entryNumber * (8 + header.hashLength)
-```
-
-If you have a signed feed, you have to take into account the extra space required for the signatures in the even nodes.
-
-``` js
-var entryOffset = headerEndOffset + entryNumber * (8 + header.hashLength)
-  + Math.ceil(entryNumber / 2) * header.signatureLength
-```
diff --git a/papers/dat-paper.md b/papers/dat-paper.md
index 7ad3930..c2c5fe3 100644
--- a/papers/dat-paper.md
+++ b/papers/dat-paper.md
@@ -68,7 +68,7 @@ Dat is a file sharing protocol that does not assume a dataset is static or that
 - 2. **Content Integrity** - Data and publisher integrity is verified through use of signed hashes of the content.
 - 3. **Parallel Replication** - Subsets of the data can be accessed from multiple peers simultaneously, improving transfer speeds.
 - 4. **Streaming Updates** - Datasets can be updated and distributed in real time to downstream peers.
-- 5. **Secure Metadata** - Dat employs a capability system whereby anyone with a Dat link can connect to the swarm, but the link itself is a secure hash that is difficult to guess.
+- 5. **Private Metadata** - Dat employs a capability system whereby anyone with a Dat link can connect to the swarm, but the link itself is a secure hash that is difficult to guess.
 
 ## 3.1 Mirroring
@@ -122,13 +122,17 @@ In a scenario where two peers A and B want to connect, and both know the central
 ## 3.2 Content Integrity
 
-Content integrity means being able to verify the data you received is the exact same version of the data that you expected. This is important in a distributed system as this mechanism will catch incorrect data sent by bad peers. It also has implications for reproducibility as it lets you refer to a specific version of the dataset you want.
+Content integrity means being able to verify the data you received is the exact same version of the data that you expected. This is important in a distributed system as this mechanism will catch incorrect data sent by bad peers. It also has implications for reproducibility as it lets you refer to a specific version of a dataset.
 
-A common issue in data analysis is when data changes but the link to the data remains the same. For example, one day a file called data.zip might change, but a simple HTTP link to the file does not include a hash of the content, so clients that only have the HTTP link have no way to check if the file changed. Looking up a file by the hash of its content is called content addressability, and lets users not only verify that the data they receive is the version of the data they want, but also lets people cite specific versions of the data by referring to a specific hash.
+A common issue in data analysis is when data changes but the link to the data remains the same. For example, one day a file called data.zip might change, but a typical HTTP link to the file does not include a hash of the content, or provide a way to get updated metadata, so clients that only have the HTTP link have no way to check if the file changed without downloading the entire file again. Referring to a file by the hash of its content is called content addressability, and lets users not only verify that the data they receive is the version of the data they want, but also lets people cite specific versions of the data by referring to a specific hash.
+
+### Hypercore and Hyperdrive
 
 Data storage and content integrity in Dat are implemented in a module called Hypercore. Given a stream of binary data, Hypercore splits the stream into chunks using Rabin fingerprints, hashes each chunk, and arranges the hashes in a specific type of Merkle tree that allows for certain replication properties. In addition to providing a content addressing system, Hypercore also provides a network protocol for exchanging chunks with peers.
 
-Hypercore is agnostic to the format of the input data; it operates on any stream of binary data. For the Dat use case of synchronizing datasets we wrote and use a file system module on top of Hypercore called Hyperdrive. There are other abstractions you can write on top of Hypercore instead of Hyperdrive/Dat such as Merkle DAGs but these are outside the scope of this paper.
+Hypercore is agnostic to the format of the input data; it operates on any stream of binary data. For the Dat use case of synchronizing datasets we wrote and use a file system module on top of Hypercore called Hyperdrive.
+We have a layered abstraction so that anyone who wishes can use Hypercore directly to have full control over how they model their data. Hyperdrive works well when your data can be represented as files on a filesystem, which is our main use case with Dat.
+
+### Registers
 
 Central to the design of Hypercore is the notion of a register. This is a binary append-only feed (Kappa architecture) whose contents are cryptographically hashed and signed and therefore can be trusted. Hypercore lets you create many registers, and replicates them when synchronizing with another peer.
@@ -180,8 +184,117 @@ var feed = [{
 }]
 ```
 
+Registers can also be signed with a private key, allowing anyone with the corresponding public key to verify that new entries to the register were created by a holder of the private key. More on this in section 3.4.
+
 ## 3.3 Parallel Replication
 
+Hypercore provides a replication protocol so two peers can communicate over a stateless messaging channel to discover and exchange data. Messages are encoded using Protocol Buffers. The protocol has nine message types:
+
+### Open
+
+This should be the first message sent and is also the only message without a type. It looks like this:
+
+``` protobuf
+message Open {
+  required bytes feed = 1;
+  required bytes nonce = 2;
+}
+```
+
+The `feed` should be set to the discovery key as specified above. The `nonce` should be set to 24 bytes of high entropy random data. When running in encrypted mode this is the only message sent unencrypted.
+
+### `0` Handshake
+
+This message is sent after sending an open message so it will be encrypted and we won't expose our peer id to a third party.
+
+``` protobuf
+message Handshake {
+  required bytes id = 1;
+  repeated string extensions = 2;
+}
+```
+
+### `1` Have
+
+Have messages give the other peer information about which blocks of data you have. It has type `1`.
+
+``` protobuf
+message Have {
+  required uint64 start = 1;
+  optional uint64 end = 2;
+  optional bytes bitfield = 3;
+}
+```
+
+You can use `start` and `end` to represent a range of data block bin numbers. If using a bitfield it should be encoded using a run length encoding described above. It is a good idea to send a have message as soon as possible if you have blocks to share to reduce latency.
+
+### `2` Want
+
+You can send a want message to give the other peer information about which blocks of data you want to have. It has type `2`.
+
+``` protobuf
+message Want {
+  required uint64 start = 1;
+  optional uint64 end = 2;
+}
+```
+
+You should only send the want message if you are interested in a section of the feed that the other peer has not told you about.
+
+### `3` Request
+
+Send this message to request a block of data. You can request a block by block index or byte offset. If you are only interested in the hash of a block you can set the hash property to true. The nodes property can be set to a tree digest of the tree nodes you already have for this block or byte range. A request message has type `3`.
+
+``` protobuf
+message Request {
+  optional uint64 block = 1;
+  optional uint64 bytes = 2;
+  optional bool hash = 3;
+  optional uint64 nodes = 4;
+}
+```
+
+### `4` Data
+
+Send a block of data to the other peer. You can use this message to reply to a request or optimistically send other blocks of data to the other client. It has type `4`.
+
+``` protobuf
+message Data {
+  message Node {
+    required uint64 index = 1;
+    required uint64 size = 2;
+    required bytes hash = 3;
+  }
+
+  required uint64 block = 1;
+  optional bytes value = 2;
+  repeated Node nodes = 3;
+  optional bytes signature = 4;
+}
+```
+
+### `5` Cancel
+
+Cancel a previously sent request. It has type `5`.
+
+``` protobuf
+message Cancel {
+  optional uint64 block = 1;
+  optional uint64 bytes = 2;
+}
+```
+
+### `6` Pause
+
+An empty message that tells the other peer that they should stop requesting new blocks of data. It has type `6`.
+
+### `7` Resume
+
+An empty message that tells the other peer that they can continue requesting new blocks of data. It has type `7`.
+
 ## 3.4 Streaming Updates
 
 ## 3.5 Secure Metadata
diff --git a/sleep.md b/sleep.md
new file mode 100644
index 0000000..518931c
--- /dev/null
+++ b/sleep.md
@@ -0,0 +1,163 @@
+# SLEEP Data Format
+
+### Syncable Lightweight Event Emitting Persistence
+### Version 2.0
+
+SLEEP is a metadata format that allows data to be accessed randomly, cryptographically verified, and updated live. It has two parts: a file format that contains content addressed file metadata in a representation specifically designed to allow partial streaming access to individual chunks of data, and a 53 character identifier scheme that can be used to address the entire data set.
+
+The file format is a single file designed to be easy to distribute, and we also specify a way to expose SLEEP over REST.
+
+The SLEEP format can be used in a similar way to how MD5 checksums are used over HTTP today, to verify the integrity of data downloads. Whereas MD5 or SHA are usually checksums of the whole data set, meaning consumers have to download all of the available data before they are able to verify the integrity of any of it, SLEEP allows a set of data to be split into many small pieces, each one getting its own cryptographically secure checksum. This allows consumers to download subsets of the metadata and data, in whatever order they prefer, while still being able to verify the integrity of each piece of data as it is accessed.
+
+## Registers
+
+SLEEP is designed around the concept of a register, an append only list that you can trust. The contents of a register are cryptographically fingerprinted and an aggregate checksum can be used to verify the contents of the register have not been tampered with. There are various ways to calculate these aggregate checksums but the data in a register is a binary append only feed, i.e. an infinite array of buffers that can only be updated by appending new buffers at the end.
+
+In SLEEP we construct a Merkle tree where the leaf nodes are the buffers in the register, and the rest of the non-leaf nodes in the tree are derived Merkle hashes.
+
+So, given a register with four values:
+
+```
+1. a
+2. b
+3. c
+4. d
+```
+
+We would construct a Merkle tree where the leaf nodes are our four values:
+
+```
+a -> hash(a) + hash(hash(a) + hash(b))
+b -> hash(b) + hash(hash(hash(a) + hash(b)) + hash(hash(c) + hash(d)))
+c -> hash(c) + hash(hash(c) + hash(d))
+d -> hash(d)
+```
+
+To be able to refer to a specific node in the tree we use an in-order node traversal to assign integers to the nodes:
+
+```
+0
+  1
+2
+    3
+4
+  5
+6
+```
+
+With our trees, in-order node numbering has the property that leaf nodes are always even and non-leaf nodes are always odd.
+
+The three derived odd values (1, 3, 5) are the Merkle hashes: node 1 is the hash of nodes 0 and 2, node 5 is the hash of nodes 4 and 6, and node 3, the root, is the hash of nodes 1 and 5.
+
+In our use case, the hashes of the actual content are always even numbers, at the wide end of the tree. So the above tree has four original values (a, b, c, d) that become the even numbers 0, 2, 4 and 6.
+
+The leaf data is not fixed size, but the hashes and offsets are fixed size. Node sizes are stored to allow random access to byte ranges with minimal roundtrips; if you don't care about large metadata it's ok to skip this optimization.
+
+# File format
+
+```
+<Header><Entries Index...><Entries...>
+```
+
+The format is a header followed by an index of many entries. Entry order is based on the indexing determined by the [Flat In-Order Tree](hyperdrive.md#flat-in-order-trees) algorithm we use in Dat. After the entry index, a concatenated list of entries follows.
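The flat in-order numbering can be computed with a few bit tricks. The sketch below is not part of the spec (Dat's own implementation lives in the `flat-tree` module by mafintosh); it shows the idea for the seven node tree above:

``` js
// Depth of a node = number of trailing 1 bits in its binary representation.
// Leaves (even numbers) have depth 0.
function depth (i) {
  var d = 0
  while (i & 1) {
    d++
    i >>= 1
  }
  return d
}

// Parent of a node: clear the bit above the node's depth, set the bit at it.
function parent (i) {
  var d = depth(i)
  return (i & ~(1 << (d + 1))) | (1 << d)
}

// Children of a non-leaf node sit half a subtree to either side.
function children (i) {
  var d = depth(i)
  if (d === 0) return null // leaf nodes have no children
  var half = 1 << (d - 1)
  return [i - half, i + half]
}
```

For the tree above, `parent(0)` and `parent(2)` are both `1`, `parent(1)` and `parent(5)` are `3` (the root), and `children(5)` is `[4, 6]`.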
+
+### Header format
+
+```
+<varint header-length><header protobuf>
+```
+
+The header protobuf has this schema:
+
+``` protobuf
+message Header {
+  required bytes datLink = 1;
+  required uint64 entries = 2;
+  optional bool isSigned = 3;
+  optional string hashType = 4 [default = "sha256"];
+  optional uint32 hashLength = 5 [default = 32];
+  optional string signatureType = 6 [default = "ed25519"];
+  optional uint32 signatureLength = 7 [default = 64];
+}
+```
+
+### Entry index format
+
+For non-signed entries:
+
+```
+<8-byte-chunk-end><chunk-hash>
+```
+
+The 8-byte-chunk-end is an unsigned big endian 64 bit integer that should be the absolute position in the file for the **end of the chunk**.
+
+For signed entries in live feeds (only applies to even numbered nodes, i.e. leaf nodes):
+
+```
+<8-byte-chunk-end><chunk-signature><chunk-hash>
+```
+
+For any odd nodes, in either a live or a non-live feed, the non-signed entry format will be used.
+
+## Example
+
+Given a tree like this you might want to look up in a `meta.dat` file the metadata for a specific node:
+
+```
+0─┐
+  1─┐
+2─┘ │
+    3
+4─┐ │
+  5─┘
+6─┘
+```
+
+If you wanted to look up the metadata for 3, you could read the third (or any!) entry from meta.dat:
+
+First you have to read the varint at the beginning of the file so you know how big the header is:
+
+``` js
+var varint = require('varint') // https://github.com/chrisdickinson/varint
+var headerLength = varint.decode(firstChunkOfFile)
+```
+
+Now you can read the header from the file:
+
+``` js
+var headerOffset = varint.encodingLength(headerLength)
+var headerEndOffset = headerOffset + headerLength
+var headerBytes = firstChunkOfFile.slice(headerOffset, headerEndOffset)
+```
+
+To decode the header use the protobuf schema. We can use the [protocol-buffers](https://github.com/mafintosh/protocol-buffers) module to do that.
+
+``` js
+var messages = require('protocol-buffers')(fs.readFileSync('meta.dat.proto'))
+var header = messages.Header.decode(headerBytes)
+```
+
+Now we have all the configuration required to calculate an entry offset.
+
+``` js
+var entryNumber = 42
+var entryOffset = headerEndOffset + entryNumber * (8 + header.hashLength)
+```
+
+If you have a signed feed, you have to take into account the extra space required for the signatures in the even nodes.
+
+``` js
+var entryOffset = headerEndOffset + entryNumber * (8 + header.hashLength)
+  + Math.ceil(entryNumber / 2) * header.signatureLength
+```