diff options
author | Max Ogden <max@maxogden.com> | 2016-08-05 12:36:08 -0700 |
---|---|---|
committer | Max Ogden <max@maxogden.com> | 2016-08-05 12:36:08 -0700 |
commit | 416108575f8207d687c52ca079117bcc5cfdb515 (patch) | |
tree | a6b6a2d5624b8edcbb6e4542714785f91e581649 /docs | |
parent | 6879daedad2d53e32d1d9d7bf8b1c74574bc28ef (diff) | |
parent | ffb465bb3bcebb62c5e63cc3ed80d70274848deb (diff) | |
download | dat-docs-416108575f8207d687c52ca079117bcc5cfdb515.tar.gz dat-docs-416108575f8207d687c52ca079117bcc5cfdb515.zip |
Merge branch 'master' of https://github.com/datproject/docs
Diffstat (limited to 'docs')
-rw-r--r-- | docs/api.md | 29 | ||||
-rw-r--r-- | docs/contents.json | 26 | ||||
-rw-r--r-- | docs/diy-dat.md | 41 | ||||
-rw-r--r-- | docs/ecosystem.md | 12 | ||||
-rw-r--r-- | docs/how-dat-works.md | 71 | ||||
-rw-r--r-- | docs/hyperdrive_spec.md | 387 | ||||
-rw-r--r-- | docs/readme.md | 6 | ||||
-rw-r--r-- | docs/sleep.md | 234 | ||||
-rw-r--r-- | docs/welcome.md | 27 |
9 files changed, 833 insertions, 0 deletions
diff --git a/docs/api.md b/docs/api.md new file mode 100644 index 0000000..68a7c3f --- /dev/null +++ b/docs/api.md @@ -0,0 +1,29 @@ +## 1.0 Architecture Design + + + * dat: command-line api + * dat-desk: desktop application + * hyperdrive: storage layer + * discovery-swarm: dat network swarm discovery mechanism + +## dat + +Command-line interface for dat + +#### `dat share DIR` + +Create a new dat link for the contents of the given directory. Prints a URL, which is a unique public key feed. This public key feed can be appended to. + +###### Options + + * `--append=URL`: Adds the new URL to the public key feed. + * `--static`: Ensures that the URL cannot be appended to. + +#### `dat URL DIR` + +Downloads the link to the given directory, and then exits. + +###### Options + + * `--seed`: Downloads the link to the given directory and opens up a server that seeds it to the dat peer network. + * `--list`: Fetches the metadata for the link and prints out the file list in the console. diff --git a/docs/contents.json b/docs/contents.json new file mode 100644 index 0000000..83f5d73 --- /dev/null +++ b/docs/contents.json @@ -0,0 +1,26 @@ +{ + "Introduction": { + "Welcome to Dat": "welcome.md", + "How Dat Works": "how-dat-works.md" + }, + "Specification": { + "hyperdrive spec": "hyperdrive_spec.md", + "sleep": "sleep.md" + }, + "References": { + "API": "api.md", + "DIY Dat": "diy-dat.md" + }, + "Modules": { + "Overview": "ecosystem.md", + "Interface": { + "Dat Command Line": "dat.md", + "dat.land": "dat.land.md", + "Dat Desktop": "dat-desktop.md" + }, + "Core": { + "Hyperdrive": "hyperdrive.md", + "Hypercore": "hypercore.md" + } + } +} diff --git a/docs/diy-dat.md b/docs/diy-dat.md new file mode 100644 index 0000000..fb16fc1 --- /dev/null +++ b/docs/diy-dat.md @@ -0,0 +1,41 @@ +# DIY Dat + +This document shows how to write your own compatible `dat` client using node modules. + +The three essential node modules are called [hyperdrive](https://npmjs.org/hyperdrive), [hyperdrive-archive-swarm](https://npmjs.org/hyperdrive-archive-swarm) and [level](https://npmjs.org/level). Hyperdrive does file synchronization and versioning, hyperdrive-archive-swarm does peer discovery over local networks and the Internet, and level provides a local LevelDB for storing metadata. More details are available in [How Dat Works](how-dat-works.md). The [dat](https://npmjs.org/dat) module itself is just some code that combines these modules and wraps them in a command-line API. + +Here's the minimal code needed to download data from a dat: + +```js +// run this like: node thisfile.js 4c325f7874b4070blahblahetc +// the dat link someone sent us, we want to download the data from it +var link = new Buffer(process.argv[2], 'hex') + +var Hyperdrive = require('hyperdrive') +var Swarm = require('hyperdrive-archive-swarm') +var level = require('level') +var raf = require('random-access-file') +var each = require('stream-each') + +var db = level('./dat.db') +var drive = Hyperdrive(db) +var archive = drive.createArchive(link, { + file: function (name) { + return raf(path.join(self.dir, name)) + } +}) +var swarm = Swarm(archive) + +archive.open(function (err) { + if (err) return console.error(err) + each(archive.list({live: archive.live}), function (data, next) { + var startBytes = self.stats.bytesDown + archive.download(data, function (err) { + if (err) return console.error(err) + console.log('file downloaded', data.relname) + next() + }) + }, done) +}) + +``` diff --git a/docs/ecosystem.md b/docs/ecosystem.md new file mode 100644 index 0000000..673c513 --- /dev/null +++ b/docs/ecosystem.md @@ -0,0 +1,12 @@ +If you want to go deeper and see the implementations we are using in the [Dat command-line tool](https://github.com/maxogden/dat), here you go: + +- [dat](https://www.npmjs.com/package/dat) - the main command line tool that uses all of the below +- [discovery-channel](https://www.npmjs.com/package/discovery-channel) - discover data sources +- [discovery-swarm](https://www.npmjs.com/package/discovery-swarm) - discover and connect to sources +- [hyperdrive](https://www.npmjs.com/package/hyperdrive) - The file sharing network dat uses to distribute files and data. A technical specification / discussion on how hyperdrive works is [available here](https://github.com/mafintosh/hyperdrive/blob/master/SPECIFICATION.md) +- [hypercore](https://www.npmjs.com/package/hypercore) - exchange low-level binary blocks with many sources +- [bittorrent-dht](https://www.npmjs.com/package/bittorrent-dht) - use the Kademlia Mainline DHT to discover sources +- [dns-discovery](https://www.npmjs.com/package/dns-discovery) - use DNS name servers and Multicast DNS to discover sources +- [utp-native](https://www.npmjs.com/package/utp-native) - UTP protocol implementation +- [rabin](https://www.npmjs.com/package/rabin) - Rabin fingerprinter stream +- [merkle-tree-stream](https://www.npmjs.com/package/merkle-tree-stream) - Used to construct Merkle trees from chunks diff --git a/docs/how-dat-works.md b/docs/how-dat-works.md new file mode 100644 index 0000000..c4899af --- /dev/null +++ b/docs/how-dat-works.md @@ -0,0 +1,71 @@ +# How Dat Works + +Note this is about Dat 1.0 and later. For historical info about earlier incarnations of Dat (Alpha, Beta) check out [this post](http://dat-data.com/blog/2016-01-19-brief-history-of-dat). + +When someone starts downloading data with the [Dat command-line tool](https://github.com/maxogden/dat), here's what happens: + +## Phase 1: Source discovery + +Dat links look like this: `dat.land/c3fcbcdcf03360529b47df32ccfb9bc1d7f64aaaa41cca43ca9ac7f6778db8da`. The domain, dat.land, is there so if someone opens the link in a browser we can provide them with download instructions, and as an easy way for people to visually distinguish and remember Dat links. Dat itself doesn't actually use the dat.land part, it just needs the last part of the link which is a fingerprint of the data that is being shared. The first thing that happens when you go to download data using one of these links is you ask various discovery networks if they can tell you where to find sources that have a copy of the data you need. + +Source discovery means finding the IP and port of all the known data sources online that have a copy of that data you are looking for. You can then connect to them and begin exchanging data. By introducing this discovery phase we are able to create a network where data can be discovered even if the original data source disappears. + +The discovery protocols we use are [DNS name servers](https://en.wikipedia.org/wiki/Name_server), [Multicast DNS](https://en.wikipedia.org/wiki/Multicast_DNS) and the [Kademlia Mainline Distributed Hash Table](https://en.wikipedia.org/wiki/Mainline_DHT) (DHT). Each one has pros and cons, so we combine all three to increase the speed and reliability of discovering data sources. + +We run a [custom DNS server](https://www.npmjs.com/package/dns-discovery) that Dat clients use (in addition to specifying their own if they need to), as well as a [DHT bootstrap](https://github.com/bittorrent/bootstrap-dht) server. These discovery servers are the only centralized infrastructure we need for Dat to work over the Internet, but they are redundant, interchangeable, never see the actual data being shared, and anyone can run their own and Dat will still work even if they all go down. If this happens discovery will just be manual (e.g. manually sharing IP/ports). Every data source that has a copy of the data also advertises themselves across these discovery networks. + +The discovery logic itself is handled by a module that we wrote called [discovery-channel](http://npmjs.org/discovery-channel), which wraps other modules we wrote to implement DNS and DHT logic into a single interface. We can give the Dat link we want to download to discovery-channel and we will get back all the sources it finds across the various discovery networks. + +## Phase 2: Source connections + +Up until this point we have just done searches to find who has the data we need. Now that we know who should talk to, we have to connect to them. We use either [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) or [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) sockets for the actual peer to peer connections. UTP is nice because it is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). We then layer on our own file sharing protocol on top, called [Hypercore](https://github.com/mafintosh/hypercore). We also are working on WebRTC support so we can incorporate Browser and Electron clients for some really open web use cases. + +When we get the IP and port for a potential source we try to connect using all available protocols (currently TCP and sometimes UTP) and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good sources, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. + +If we get a lot of potential sources we pick a handful at random to try and connect to and keep the rest around as additional sources to use later in case we decide we need more sources. A lot of these are parameters that we can tune for different scenarios later, but have started with some best guesses as defaults. + +The connection logic is implemented in a module called [discovery-swarm](https://www.npmjs.com/package/discovery-swarm). This builds on discovery-channel and adds connection establishment, management and statistics. You can see stats like how many sources are currently connected, how many good and bad behaving sources you've talked to, and it automatically handles connecting and reconnecting to sources for you. Our UTP support is implemented in the module [utp-native](https://www.npmjs.com/package/utp-native). + +## Phase 3: Data exchange + +So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. + +The short version of how Hyperdrive works is: It breaks file contents up in to pieces, hashes each piece and then constructs a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) out of all of the pieces. This ultimately gives us the Dat link, which is the top level hash of the Merkle tree. + +Here's the long version: + +Hyperdrive shares and synchronizes a set of files, similar to rsync or Dropbox. For each file in the drive we use a technique called Rabin fingerprinting to break the file up into pieces. Rabin fingerprints are a specific strategy for what is called Content Defined Chunking. Here's an example: + +![cdc diagram](https://raw.githubusercontent.com/datproject/docs/master/assets/cdc.png) + +We have configured our Rabin chunker to produce chunks that are around 16KB on average. So if you share a folder containing a single 1MB JPG you will get around 64 chunks. + +After feeding the file contents through the chunker, we take the chunks and calculate the SHA256 hash of each one. We then arrange these hashes into a special data structure we developed that we call the Flat In-Order Merkle Tree. + +### Flat In-Order Merkle Tree + +``` + 3 + 1 5 +0 2 4 6 +``` + +Want to go lower level? Check out [How Hypercore Works](hyperdrive.md#how-hypercore-works) + +When two peers connect to each other and begin speaking the Hyperdrive protocol they can efficiently determine if they have chunks the other one wants, and begin exchanging those chunks directly. Hyperdrive gives us the flexibility to have random access to any portion of a file while still verifying the other side isnt sending us bad data. We can also download different sections of files in parallel across all of the sources simultaneously, which increases overall download speed dramatically. + +## Phase 4: Data archiving + +So now that you've discovered, connected, and downloaded a copy of some data you can stick around for a while and serve up copies of the data to others who come along and want to download it. + +The first phase, source discovery, is actually an ongoing process. When you first search for data sources you only get the sources available at the time you did your search, so we make sure to perform discovery searches as often is practically possible to make sure new sources can be found and connected to. + +Every user of Dat is a source as long as they have 1 or more chunks of data. Just like with other decentralized file sharing protocols you will notice Dat may start uploading data before it finishes downloading. + +If the original source who shared the data goes offline it's OK, as long as other sources are available. As part of the mission as a not-for-profit we will be working with various institutions to ensure there are always sources available to accept new copies of data and stay online to serve those copies for important datasets such as scientific research data, open government data etc. + +Because Dat is built on a foundation of strong cryptographic data integrity and content addressable storage it gives us the possibility of implementing some really interesting version control techniques in the future. In that scenario archival data sources could choose to offer more disk space and archive every version of a Dat repository, whereas normal Dat users might only download and share one version that they happen to be interested in. + +## Implementations + +This covered a lot of ground. If you want to go deeper and see the implementations we are using in the [Dat command-line tool](https://github.com/maxogden/dat), go to the [Dependencies](ecosystem) page diff --git a/docs/hyperdrive_spec.md b/docs/hyperdrive_spec.md new file mode 100644 index 0000000..64298cd --- /dev/null +++ b/docs/hyperdrive_spec.md @@ -0,0 +1,387 @@ +# Hyperdrive + Hypercore Specification + +## DRAFT Version 1 + +Hyperdrive is the peer-to-peer data distribution protocol that powers Dat. It consists of two parts. First there is hypercore which is the core protocol and swarm that handles distributing append-only logs of any binary data. The second part is hyperdrive which adds a filesystem specific protocol on top of hypercore. + +## Hypercore + +The goal of hypercore is to distribute append-only logs across a network of peers. Peers download parts of the logs from other peers and can choose to only download the parts of a log they care about. Logs can contain arbitrary binary data payloads. + +A core goal is to be as simple and pragmatic as possible. This allows for easier implementations of clients which is an often overlooked property when implementing distributed systems. First class browser support is also an important goal as p2p data sharing in browsers is becoming more viable every day as WebRTC matures. + +It also tries to be modular and export responsibilities to external modules whenever possible. Peer discovery is a good example of this as it handled by 3rd party modules that wasn't written with hyperdrive in mind. A benefit of this is a much smaller core implementation that can focus on smaller and simpler problems. + +Prioritized synchronization of parts of a feed is also at the heart of hyperdrive as this allows for fast streaming with low latency of data such as structured datasets (wikipedia, genomic datasets), linux containers, audio, videos, and much more. To allow for low latency streaming another goal is also to keep verifiable block sizes as small as possible - even with huge data feeds. + +The protocol itself draws heavy inspiration from existing file sharing systems such as BitTorrent and [PPSP](https://datatracker.ietf.org/doc/rfc7574/?include_text=1) + +## How Hypercore works + +### Flat In-Order Trees + +A Flat In-Order Tree is a simple way represent a binary tree as a list. It also allows you to identify every node of a binary tree with a numeric index. Both of these properties makes it useful in distributed applications to simplify wire protocols that uses tree structures. + +Flat trees are described in [PPSP RFC 7574 as "Bin numbers"](https://datatracker.ietf.org/doc/rfc7574/?include_text=1) and a node version is available through the [flat-tree](https://github.com/mafintosh/flat-tree) module. + +A sample flat tree spanning 4 blocks of data looks like this: + +``` +0 + 1 +2 + 3 +4 + 5 +6 +``` + +The even numbered entries represent data blocks (leaf nodes) and odd numbered entries represent parent nodes that have two children. + +The depth of an tree node can be calculated by counting the number of trailing 1s a node has in binary notation. + +``` +5 in binary = 101 (one trailing 1) +3 in binary = 011 (two trailing 1s) +4 in binary = 100 (zero trailing 1s) +``` + +1 is the parent of (0, 2), 5 is the parent of (4, 6), and 3 is the parent of (1, 5). + +If the number of leaf nodes is a power of 2 the flat tree will only have a single root. + +Otherwise it'll have more than one. As an example here is a tree with 6 leafs: + +``` +0 + 1 +2 + 3 +4 + 5 +6 + +8 + 9 +10 +``` + +The roots spanning all the above leafs are 3 an 9. Throughout this document we'll use following tree terminology: + +* `parent` - a node that has two children (odd numbered) +* `leaf` - a node with no children (even numbered) +* `sibling` - the other node with whom a node has a mutual parent +* `uncle` - a parent's sibling + +## Merkle Trees + +A merkle tree is a binary tree where every leaf is a hash of a data block and every parent is the hash of both of its children. + +Merkle trees are useful for ensuring the integrity of content. + +Let's look at an example. Assume we have 4 data blocks, `(a, b, c, d)` and let `h(x)` be a hash function (the hyperdrive stack uses sha256 per default). + +Using flat-tree notation the merkle tree spanning these data blocks looks like this: + +``` +0 = h(a) + 1 = h(0 + 2) +2 = h(b) + 3 = h(1 + 5) +4 = h(c) + 5 = h(4 + 6) +6 = h(d) +``` + +An interesting property of merkle trees is that the node 3 hashes the entire data set. Therefore we only need to trust node 3 to verify all data. However as we learned above there will only be a single root if there is a power of two data blocks. + +Again lets expand our data set to contain 6 items `(a, b, c, d, e, f)`: + +``` +0 = h(a) + 1 = h(0 + 2) +2 = h(b) + 3 = h(1 + 5) +4 = h(c) + 5 = h(4 + 6) +6 = h(d) + +8 = h(e) + 9 = h(8 + 10) +10 = h(f) +``` + +To ensure always have only a single root we'll simply hash all the roots together again. At most there will be `log2(number of data blocks)`. + +In addition to hashing the roots we'll also include a bin endian uint64 binary representation of the corresponding node index. + +Using the two above examples the final hashes would be: + +``` +hash1 = h(uint64be(#3) + 3) +hash2 = h(uint64be(#9) + 9 + uint64be(#3) + 3) +``` + +Each of these hashes can be used to fully verify each of the trees. Let's look at another example. Assume we trust `hash1` and another person wants to send block `0` to us. To verify block `0` the other person would also have to send the sibling hash and uncles until it reaches a root and the other missing root hashes. For the first tree that would mean hashes `(2, 5)`. + +Using these hashes we can reproduce `hash1` in the following way: + +``` +0 = h(block received) + 1 = h(0 + 2) +2 = (hash received) + 3 = h(1 + 5) + 5 = (hash received) +``` + +If `h(uint64be(#3) + 3) == hash1` then we know that data we received from the other person is correct. They sent us `a` and the corresponding hashes. + +Since we only need uncle hashes to verify the block the amount of hashes we need is at worst `log2(number-of-blocks)` and the roots of the merkle trees which has the same complexity. + +A merkle tree generator is available on npm through the [merkle-tree-stream](https://github.com/mafintosh/merkle-tree-stream) module. + +## Merkle Tree Deduplication + +Merkle trees have another great property. They make it easy to deduplicate content that is similar. + +Assume we have two similar datasets: + +``` +(a, b, c, d, e) +(a, b, c, d, f) +``` + +These two datasets are the same except their last element is different. When generating merkle trees for the two data sets you'd get two different root hashes out. + +However if we look a the flat-tree notation for the two trees: + +``` +0 + 1 +2 + 3 +4 + 5 +6 + +8 +``` + +We'll notice that the hash stored at 3 will be the same for both trees since the first four blocks are the same. Since we also send uncle hashes when sending a block of data we'll receive the hash for 3 when we request any block. If we maintain a simple index that maps a hash into the range of data it covers we can detect that we already have the data spanning 3 and we won't need to re-download that from another person. + +``` +1 -> (a, b) +3 -> (a, b, c, d) +5 -> (c, d) +``` + +This means that two datasets share a similar sequence of data the merkle tree helps you detect that. + +## Signed Merkle Trees + +As described above the top hash of a merkle tree is the hash of all its content. This has both advantages and disadvanteges. + +An advantage is that you can always reproduce a merkle tree simply by having the data contents of a merkle tree. + +A disadvantage is every time you add content to your data set your merkle tree hash changes and you'll need to re-distribute the new hash. + +Using a bit of cryptography however we can make our merkle tree appendable. First generate a cryptographic key pair that can be used to sign data using [ed25519](https://ed25519.cr.yp.to/) keys, as they are compact in size (32 byte public keys). A key pair (public key, secret key) can be used to sign data. Signing data means that if you trust a public key and you receive data and a signature for that data you can verify that a signature was generated with the corresponding secret key. + +How does this relate to merkle trees? Instead of distributing the hash of a merkle tree we can distribute our public key instead. We then use our secret key to continously sign the merkle trees of our data set every time we append to it. + +Assume we have a data set with only a single item in it `(a)` and a key pair `(secret, public)`: + +``` +(a) +``` + +We generate a merkle tree for this data set which will have the roots `0` and sign the hash of these roots (see the merkle tree section) with our secret key. + +If we want to send `a` to another person and they trust our public key we simply send `a` and the uncles needed to generate the roots plus our signature. + +If we append a new item to our data set we simply do the same thing: + +``` +(a, b) +``` + +Notice that all new signatures verify the entire dataset since they all sign a merkle tree that spans all data. This serves two purposes. First of all it makes sure that the dataset publisher cannot change old data. It also ensures that the publisher cannot share different versions of the same dataset to different persons without the other people noticing it (at some point they'll get a signature for the same node index that has different hashes if they talk to multiple people). + +This technique has the added benefit that you can always convert a signed merkle tree to a normal unsigned one if you wish (or turn an unsigned tree into a signed tree). + +In general you should send as wide as possible signed tree back when using signed merkle trees as that lowers the amount of signatures the other person needs to verify which has a positive performance impact for some platforms. It will also allow other users to more quickly detect if a tree has duplicated content. + +## Block Tree Digest + +When asking for a block of data we want to reduce the amount of duplicate hashes that are sent back. + +In the merkle tree example for from earlier we ended up sending two hashes `(2, 5)` to verify block `0`. + +``` +// If we trust 3 then 2 and 5 are needed to verify 0 + +0 + 1 +2 + 3 +4 + 5 +6 +``` + +Now if we ask for block `1` afterwards (`2` in flat tree notation) the other person doesn't need to send us any new hashes since we already received the hash for `2` when fetching block `0`. + +If we only use non-signed merkle trees the other person can easily calculate which hashes we already have if we tell them which blocks we've got. + +This however isn't always possible if we use a signed merkle tree since the roots are changing. In general it also useful to be able to communicate that you have some hashes already without disclosing all the blocks you have. + +To communicate which hashes we have just have to communicate two things: which uncles we have and whether or not we have any parent node that can verify the tree. + +Looking at the above tree that means if we want to fetch block `0` we need to communicate whether of not we already have the uncles `(2, 5)` and the parent `3`. This information can be compressed into very small bit vector using the following scheme. + +Let the trailing bit donate whether or not the leading bit is a parent and not a uncle. Let the previous trailing bits denote wheather or not we have the next uncle. + +For example for block `0` the following bit vector `1011` is decoded the following way + +``` +// for block 0 + +101(1) <-- tell us that the last bit is a parent and not an uncle +10(1)1 <-- we already have the first uncle, 2 so don't send us that +1(0)11 <-- we don't have the next uncle, 5 +(1)000 <-- the final bit so this is parent. we have the next parent, 3 +``` + +So using this digest the person can easily figure out that they only need to send us one hash, `5`, for us to verify block `0`. + +The bit vector `1` (only contains a single one) means that we already have all the hashes we need so just send us the block. + +These digests are very compact in size, only `(log2(number-of-blocks) + 2) / 8` bytes needed in the worst case. For example if you are sharing one trillion blocks of data the digest would be `(log2(1000000000000) + 2) / 8 ~= 6` bytes long. + +### Bitfield Run length Encoding + +(talk about rle) + +### Basic Privacy + +(talk about the privacy features + discovery key here) + +## Hypercore Feeds + +(talk about how we use the above concepts to create a feed of data) + +## Hypercore Replication Protocol + +Lets assume two peers have the identifier for a hypercore feed. This could either be the hash of the merkle tree roots described above or a public key if they want to share a signed merkle tree. The two peers wants to exchange the data verified by this tree. Lets assume the two peers have somehow connected to each other. + +Hypercore uses a message based protocol to exchange data. All messages sent are encoded to binary values using Protocol Buffers. Protocol Buffers are a widely supported schema based encoding support. A Protocol Buffers implementation is available on npm through the [protocol-buffers](https://github.com/mafintosh/protocol-buffers) module. + +These are the types of messages the peers send to each other + +#### Open + +This should be the first message sent and is also the only message without a type. It looks like this + +``` protobuf +message Open { + required bytes feed = 1; + required bytes nonce = 2; +} +``` + +The `feed` should be set to the discovery key of the Merkle Tree as specified above. The `nonce` should be set to 24 bytes of high entropy random data. When running in encrypted mode this is the only message sent unencrypted. + +When you are done using a channel send an empty message to indicate end-of-channel. + +#### `0` Handshake + +The message contains the protocol handshake. It has type `0`. + +``` protobuf +message Handshake { + required bytes id = 1; + repeated string extensions = 2; +} +``` + +You should send this message after sending an open message. By sending it after an open message it will be encrypted and we wont expose our peer id to a third party. The current protocol version is 0. + +#### `1` Have + +You can send a have message to give the other peer information about which blocks of data you have. It has type `1`. + +``` protobuf +message Have { + required uint64 start = 1; + optional uint64 end = 2; + optional bytes bitfield = 3; +} +``` + +If using a bitfield it should be encoded using a run length encoding described above. It is a good idea to send a have message soon as possible if you have blocks to share to reduce latency. + +#### `2` Want + +You can send a have message to give the other peer information about which blocks of data you want to have. It has type `2`. + +``` protobuf +message Want { + required uint64 start = 1; + optional uint64 end = 2; +} +``` + +You should only send the want message if you are interested in a section of the feed that the other peer has not told you about. + +#### `3` Request + +Send this message to request a block of data. You can request a block by block index or byte offset. If you are only interested +in the hash of a block you can set the hash property to true. The nodes property can be set to a tree digest of the tree nodes you already +have for this block or byte range. A request message has type `3`. + +``` protobuf +message Request { + optional uint64 block = 1; + optional uint64 bytes = 2; + optional bool hash = 3; + optional uint64 nodes = 4; +} +``` + +#### `4` Data + +Send a block of data to the other peer. You can use this message to reply to a request or optimistically send other blocks of data to the other client. It has type `4`. + +``` protobuf +message Data { + message Node { + required uint64 index = 1; + required uint64 size = 2; + required bytes hash = 3; + } + + required uint64 block = 1; + optional bytes value = 2; + repeated Node nodes = 3; + optional bytes signature = 4; +} +```` + +#### `5` Cancel + +Cancel a previous sent request. It has type `5`. + +``` protobuf +message Cancel { + optional uint64 block = 1; + optional uint64 bytes = 2; +} +``` + +#### `6` Pause + +An empty message that tells the other peer that they should stop requesting new blocks of data. It has type `6`. + +#### `7` Resume + +An empty message that tells the other peer that they can continue requesting new blocks of data. It has type `7`. diff --git a/docs/readme.md b/docs/readme.md new file mode 100644 index 0000000..6597fc3 --- /dev/null +++ b/docs/readme.md @@ -0,0 +1,6 @@ +## Writing Documentation + +1. Add file to this folder +2. Write docs +3. Add to table of contents in `contents.json` +4. Build & Deploy (see main readme) diff --git a/docs/sleep.md b/docs/sleep.md new file mode 100644 index 0000000..08811e9 --- /dev/null +++ b/docs/sleep.md @@ -0,0 +1,234 @@ +# SLEEP Data Format + +### Syncable Lightweight Event Emitting Persistence +### Version 2.0 + +SLEEP is a metadata format that allows a set of files to be accessed randomly, cryptographically verified, and dynamically updated. A SLEEP file contains content addressed file metadata in a representation specifically designed to allow partial streaming access to individual chunks of data. SLEEP files can be shared as a single downloadable file for easy distribution and we also specify a way to expose SLEEP over REST. + +The SLEEP format can be used in a similar way to how MD5 checksums are used over HTTP today, to verify the integrity of data downloads. Whereas MD5 or SHA are usually checksums of the whole data set, meaning consumers have to download the entire all available data before they are able to verify the integrity of any of it, SLEEP allows a set of data to be split in to many small pieces, each one getting it's own cryptographically secure checksum. This allows consumers to download subsets metadata and data, in whatever order they prefer, but allowing them to verify the integrity of each piece of data as it is accessed. It also includes cryptographic signatures allowing users to verify that data they received was created using a holder of a specific private key. + +## Registers + +SLEEP is designed around the concept of a register, an append only list that you can trust. The contents of a register are cryptographically fingerprinted and an aggregate checksum can be used to verify the contents of the register have not been tampered with. There are various ways to calculate these aggregate checksums but the data in a register is a binary append only feed, e.g. an list of buffers that can only be updated by placing new buffers at the end of the list. + +SLEEP also provides an index that allows each piece of data in a register to be accessed randomly. In order to look up a specific piece of data in the register, you only need a small subset of the metadata in order to find it, making SLEEP suitable for live streaming or sparse download use cases. + +The register index is a Merkle tree where the leaf nodes are the hashes of the buffers in the register, and the rest of the nodes in the tree are derived Merkle hashes. A Merkle tree is defined as a tree where leaf nodes are a hash of some piece of data, and the rest of the nodes are the result of a hash of the concatenation of that nodes children. + +So, given a register with four values: + +``` +1. a +2. b +3. c +4. d +``` + +To construct the register itself you concatenate all buffers, in this case resulting in 'abcd'. + +The register index is constructed by creating a Merkle tree where the leaf nodes are the hash of our four values, and the rest of the nodes are the hash of the nodes two children hashes concatenated together. + +``` +hash(a) + > hash(hash(a) + hash(b)) +hash(b) + > hash(hash(hash(a) + hash(b)) + hash(hash(c) + hash(d))) +hash(c) + > hash(hash(c) + hash(d)) +hash(d) +``` + +To be able to refer to a specific node in the tree we use an in-order node traversal to assign integers to the nodes: + +``` +0 + 1 +2 + 3 +4 + 5 +6 +``` + +In-order node numbering has the property with our trees that leaf nodes are always even and non-leaf nodes are always odd. This can be used as a quick way to identify whether a node is a leaf or not. + +Every serialized node in the tree is one of two fixed widths, leaf nodes are all the same size and non-leaf nodes are the same size. When serializing the tree you simply write the nodes in order and concatenate them. Then to access a node by its in-order position you simply multiply the node length by the position to get the byte offset. + +All leaf nodes contain these two pieces of information: + +- The sha256 hash of the data described by this node +- The absolute byte offset to the end of the region of data described by the node + +All non-leaf nodes contain these three pieces of information: + +- The sha256 hash of the concatenation of the two children hashes +- The cryptographic signature of the hash +- The span of bytes that the the nodes children cover + +When initializing a register an asymmetric Ed25519 keypair is derived. The private key is never shared. The public key is used as the URL for the register. When signing hashes in the tree the public key is used to generate an EdDSA signature. For the example register above, 'abcd', the register index (in JSON) would be: + +```js +var keys = { + public: 'cc0cf6eeb82ca946ca60265ce0863fb2b3e3075ae25cba14d162ef20e3f9f223', + private: '87399f90815db81e687efe4fd9fc60af336f4d9ae560fda106f94cb7a92a8804cc0cf6eeb82ca946ca60265ce0863fb2b3e3075ae25cba14d162ef20e3f9f223' +} + +var index = { + // sha256 of children[0].hash + children[1].hash + hash: '0440c655d63fec5c02cffd5d9b42d146aca03b255102b9b44b51c6a919b31351', + signature: '1713dfbaf4a7f288003394b72ec486aa4fa1a837aa0b08662b3a14b63381b84c2e6965e2638fb5375ae2e92b47c2ab8718ec1914778518fcb3c0563eb2c09604', + span: 4, + children: [ + { + // echo -n "$(echo -n "a" | shasum -a 256)$(echo -n "b" | shasum -a 256)" | shasum -a 256 + hash: '9ad4d5608a7a40db60c35f255fad821b762a82de168b4f4ed477d5d899b11796', + signature: '2714b99e305ce46aa6d24eb2888cf0cbde33ad4a8bcd08705b59882837bf1e482f8dcab2ae94c2359914b1fe92831bfc73af99f1c6b1f5eba47efc4efa32de0d', + span: 2, + children: [ + { + // echo -n "a" | shasum -a 256 + hash: 'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb', + endByte: 1 + }, + { + // echo -n "b" | shasum -a 256 + hash: '3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d', + endByte: 2 + } + ] + }, + { + // echo -n "$(echo -n "c" | shasum -a 256)$(echo -n "d" | shasum -a 256)" | shasum -a 256 + hash: '09114d1a8a78b5d091e492c524ad7f8e941f403db0a6d3d52d36f17b9a86ce1c', + signature: '6ac5e25206f69f22612e9b58c14f9ae6738233a57ab7f6e10c1384c4e074f6c8c606edbd95a9c099a0120947866079e3d13ef66dd7d5ed1756a89a5e9032a20d', + span: 2, + children: [ + { + // echo -n "c" | shasum -a 256 + hash: '2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6', + endByte: 3 + }, + { + // echo -n "d" | shasum -a 256 + hash: '18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4', + endByte: 4 + } + ] + } + ] +} +``` + +The above representation of the tree is in JSON. However due to the properties of the in-order node indexes we can represent the same data in a flat index while still allowing traversals. + +# File format + +SLEEP files should be named `sleep.dat` and have the following format: + +``` +<Header><Register Index...><Register Data...> +``` + +The format is a header followed by the register index. Order of the index is determined by an in-order node traversal. After the register index, the actual register entry data follows. The header length is variable width, prefixed with a varint. The Register Index is composed of fixed width metadata entries. The Register Data is composed of concatenated non-fixed width data pieces. + +### Header format + +``` +<varint header-length><header protobuf> +``` + +The header protobuf has this schema: + +``` protobuf +message Header { + required bytes datLink = 1; + required uint64 entryCount = 2; + optional bool isSigned = 3; + optional string hashType = 4 [default = "sha256"]; + optional uint32 hashLength = 5 [default = 32]; + optional string signatureType = 6 [default = "ed25519"]; + optional uint32 signatureLength = 7 [default = 64]; +} +``` + +### Register Index format + +For non-signed even (leaf) nodes: + +``` +<8-byte-span-length><data-hash> +``` + +The 8-byte-span-length is an unsigned big endian 64 bit integer that should be number of cumulative bytes encompassed by all of the leaf nodes underneath the current node. + +For signed even (leaf) nodes: + +``` +<8-byte-span-length><data-hash-signature><data-hash> +``` + +For odd (non-leaf) nodes: + +``` +<8-byte-end-offset><data-hash> +``` + +The 8-byte-end-offset is an unsigned big endian 64 bit integer that should be the absolute position in the file for the **end** of the piece data described by this node. + +### Register Data + +The last section of the file is the actual data pieces, unmodified and concatenated together in sequential order. + +For the example tree above, the Register Data section would simply be `abcd`. + +## Example + +Given a tree like this you might want to look up in a `meta.dat` file the metadata for a specific node: + +``` +0─┐ + 1─┐ +2─┘ │ + 3 +4─┐ │ + 5─┘ +6─┘ +``` + +If you wanted to look up the metadata for 3, you could read the third (or any!) entry from sleep.dat: + +First you have to read the varint at the beginning of the file so you know how big the header is: + +``` js +var varint = require('varint') // https://github.com/chrisdickinson/varint +var headerLength = varint.decode(firstChunkOfFile) +``` + +Now you can read the header from the file + +``` js +var headerOffset = varint.encodingLength(headerLength) +var headerEndOffset = headerOffset + headerLength +var headerBytes = firstChunkOfFile.slice(headerOffset, headerEndOffset) +``` + +To decode the header use the protobuf schema. We can use the [protocol-buffers](https://github.com/mafintosh/protocol-buffers) module to do that. + +``` js +var messages = require('protocol-buffers')(fs.readFileSync('meta.dat.proto')) +var header = messages.Header.decode(headerBytes) +``` + +Now we have all the configuration required to calculate an entry offset. + +``` js +var entryNumber = 42 +var entryOffset = headerEndOffset + entryNumber * (8 + header.hashLength) +``` + +If you have a signed feed, you have to take into account the extra space required for the signatures in the even nodes. + +``` js +var entryOffset = headerLength + entryNumber * (8 + header.hashLength) + + Math.floor(entryNumber / 2) * header.signatureLength +``` diff --git a/docs/welcome.md b/docs/welcome.md new file mode 100644 index 0000000..51d7e5e --- /dev/null +++ b/docs/welcome.md @@ -0,0 +1,27 @@ +# dat + +Dat is a decentralized data tool for distributing data small and large. + +[![#dat IRC channel on freenode](https://img.shields.io/badge/irc%20channel-%23dat%20on%20freenode-blue.svg)](http://webchat.freenode.net/?channels=dat) +[![datproject/discussions](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/datproject/discussions?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) +[![docs](https://img.shields.io/badge/Dat%20Project-Docs-green.svg)](http://docs.dat-data.com) + +## About Dat + +Documentation for the Dat project is available at [docs.dat-data.com](http://docs.dat-data.com). + +### Key features: + + * **Live sync** folders by sharing files as they are added to the folder. + * **Distribute large files** without copying data to a central server by connecting directly to peers. + * **Intelligently sync** by deduplicating data between versions. + * **Verify data integrity** using strong cryptographic hashes. + * **Work everywhere**, including in the [browser](https://github.com/datproject/dat.land) and on the [desktop](https://github.com/juliangruber/dat-desktop). + +Dat embraces the Unix philosophy: a modular design with composable parts. All of the pieces can be replaced with alternative implementations as long as they implement the abstract API. + +### Ways to Use Dat + + * [Dat CLI](https://github.com/maxogden/dat): command line tool + * [Dat Desktop](https://github.com/juliangruber/dat-desktop/): desktop application + * [dat.land](https://github.com/datproject/dat.land): website application
\ No newline at end of file |