From 20e67d4c4ffed610481b80ff559dcf7326e1771c Mon Sep 17 00:00:00 2001 From: Max Ogden Date: Mon, 4 Apr 2016 15:19:28 -0700 Subject: update how-dat-works --- how-dat-works.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'how-dat-works.md') diff --git a/how-dat-works.md b/how-dat-works.md index 8f46cdb..2c4f849 100644 --- a/how-dat-works.md +++ b/how-dat-works.md @@ -8,7 +8,7 @@ When someone starts downloading data with the [Dat command-line tool](https://gi Dat links look like this: `dat.land/c3fcbcdcf03360529b47df32ccfb9bc1d7f64aaaa41cca43ca9ac7f6778db8da`. The domain, dat.land, is there so if someone opens the link in a browser we can provide them with download instructions, and as an easy way for people to visually distinguish and remember Dat links. Dat itself doesn't actually use the dat.land part, it just needs the last part of the link which is a fingerprint of the data that is being shared. The first thing that happens when you go to download data using one of these links is you ask various discovery networks if they can tell you where to find sources that have a copy of the data you need. -The discovery step itself is a simple query: you supply a Dat link, and receive back the IP and port of all the known data sources online that have a copy of that data you are looking for. You can then connect to them and begin exchanging data. By introducing this discovery phase we are able to create a network where data can be discovered even if the original data source disappears. +Source discovery means finding the IP and port of all the known data sources online that have a copy of that data you are looking for. You can then connect to them and begin exchanging data. By introducing this discovery phase we are able to create a network where data can be discovered even if the original data source disappears. The discovery protocols we use are [DNS name servers](https://en.wikipedia.org/wiki/Name_server), [Multicast DNS](https://en.wikipedia.org/wiki/Multicast_DNS) and the [Kademlia Mainline Distributed Hash Table](https://en.wikipedia.org/wiki/Mainline_DHT) (DHT). Each one has pros and cons, so we combine all three to increase the speed and reliability of discovering data sources. @@ -18,17 +18,17 @@ The discovery logic itself is handled by a module that we wrote called [discover ## Phase 2: Source connections -Up until this point we have just done searches to find who has the data we need. Now that we know who should talk to, we have to connect to them. We currently use TCP sockets are our primary transport protocol, and layer on our own file sharing protocol on top. We also have experimental support for [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) which is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). We also are working on WebRTC support so we can incorporate Browser and Electron clients for some really open web use cases. +Up until this point we have just done searches to find who has the data we need. Now that we know who should talk to, we have to connect to them. We use either [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) or [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) sockets for the actual peer to peer connections. UTP is nice because it is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). We then layer on our own file sharing protocol on top, called [Hypercore](https://github.com/mafintosh/hypercore). We also are working on WebRTC support so we can incorporate Browser and Electron clients for some really open web use cases. When we get the IP and port for a potential source we try to connect using all available protocols (currently TCP and sometimes UTP) and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good source, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. If we get a lot of potential sources we pick a handful at random to try and connect to and keep the rest around as additional sources to use later in case we decide we need more sources. A lot of these are parameters that we can tune for different scenarios later, but have started with some best guesses as defaults. -The connection logic is implemented in a module called [discovery-swarm](https://www.npmjs.com/package/discovery-swarm). This builds on discovery-channel and adds connection establishment, management and statistics. You can see stats like how many sources are currently connected, how many good and bad behaving sources you've talked to, and it automatically handles connecting and reconnecting to sources for you. Our experimental UTP support is implemented in the module [utp-native](https://www.npmjs.com/package/utp-native), which you can manually install if you want to try it out with Dat. +The connection logic is implemented in a module called [discovery-swarm](https://www.npmjs.com/package/discovery-swarm). This builds on discovery-channel and adds connection establishment, management and statistics. You can see stats like how many sources are currently connected, how many good and bad behaving sources you've talked to, and it automatically handles connecting and reconnecting to sources for you. Our UTP support is implemented in the module [utp-native](https://www.npmjs.com/package/utp-native). ## Phase 3: Data exchange -So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. You can read a much longer description of how hyperdrive works in the [Hyperdrive Specification](https://github.com/mafintosh/hyperdrive/blob/master/SPECIFICATION.md). +So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. You can read a much longer description of how hyperdrive works in the [Hyperdrive Specification](hyperdrive.md). The short version of how Hyperdrive works is that it breaks file contents up in to pieces, hashes each piece and then constructs a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) out of all of the pieces. This ultimately gives us the Dat link, which is the top level hash of the Merkle tree. -- cgit v1.2.3 From 35d4251dead3335525244c237af81b7aec67f0b2 Mon Sep 17 00:00:00 2001 From: Max Ogden Date: Tue, 5 Apr 2016 12:48:33 -0700 Subject: test githubs gpg signed commits --- how-dat-works.md | 22 +++++++++++++++++++--- hyperdrive.md | 4 ++-- 2 files changed, 21 insertions(+), 5 deletions(-) (limited to 'how-dat-works.md') diff --git a/how-dat-works.md b/how-dat-works.md index 2c4f849..71dba53 100644 --- a/how-dat-works.md +++ b/how-dat-works.md @@ -28,14 +28,30 @@ The connection logic is implemented in a module called [discovery-swarm](https:/ ## Phase 3: Data exchange -So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. You can read a much longer description of how hyperdrive works in the [Hyperdrive Specification](hyperdrive.md). +So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. -The short version of how Hyperdrive works is that it breaks file contents up in to pieces, hashes each piece and then constructs a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) out of all of the pieces. This ultimately gives us the Dat link, which is the top level hash of the Merkle tree. +The short version of how Hyperdrive works is: It breaks file contents up in to pieces, hashes each piece and then constructs a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) out of all of the pieces. This ultimately gives us the Dat link, which is the top level hash of the Merkle tree. -We use a technique called Rabin fingerprinting to break files up into pieces. Rabin fingerprints are a specific strategy for what is called Content Defined Chunking. Here's an example: +Here's the long version: + +Hyperdrive shares and synchronizes a set of files, similar to rsync or Dropbox. For each file in the drive we use a technique called Rabin fingerprinting to break the file up into pieces. Rabin fingerprints are a specific strategy for what is called Content Defined Chunking. Here's an example: ![cdc diagram](meta/cdc.png) +We have configured our Rabin chunker to produce chunks that are around 16KB on average. So if you share a folder containing a single 1MB JPG you will get around 64 chunks. The idea of CDC is that your chunks only change if + +After feeding the file contents through the chunker, we take the chunks and calculate the SHA256 hash of each one. We then arrange these hashes into a special data structure we developed that we call the Flat In-Order Merkle Tree. + +### Flat In-Order Merkle Tree + +``` + 3 + 1 5 +0 2 4 6 +``` + +Want to go lower level? Check out [How Hypercore Works?](hyperdrive.md#how-hypercore-works) + When two peers connect to each other and begin speaking the Hyperdrive protocol they can efficiently determine if they have chunks the other one wants, and begin exchanging those chunks directly. Hyperdrive gives us the flexibility to have random access to any portion of a file while still verifying the other side isnt sending us bad data. We can also download different sections of files in parallel across all of the sources simultaneously, which increases overall download speed dramatically. ## Phase 4: Data archiving diff --git a/hyperdrive.md b/hyperdrive.md index 999a03c..1636d77 100644 --- a/hyperdrive.md +++ b/hyperdrive.md @@ -18,9 +18,9 @@ The protocol itself draws heavy inspiration from existing file sharing systems s ## How Hypercore works -### Flat Trees +### Flat In-Order Trees -A flat tree is a simple way represent a binary tree as a list. It also allows you to identify every node of a binary tree with a numeric index. Both of these properties makes it useful in distributed applications to simplify wire protocols that uses tree structures. +A Flat In-Order Tree is a simple way represent a binary tree as a list. It also allows you to identify every node of a binary tree with a numeric index. Both of these properties makes it useful in distributed applications to simplify wire protocols that uses tree structures. Flat trees are described in [PPSP RFC 7574 as "Bin numbers"](https://datatracker.ietf.org/doc/rfc7574/?include_text=1) and a node version is available through the [flat-tree](https://github.com/mafintosh/flat-tree) module. -- cgit v1.2.3 From 2d12228213baa9af81f4e4b21c59d2597b9a39d8 Mon Sep 17 00:00:00 2001 From: Max Ogden Date: Tue, 5 Apr 2016 12:49:23 -0700 Subject: try again --- how-dat-works.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'how-dat-works.md') diff --git a/how-dat-works.md b/how-dat-works.md index 71dba53..0e76667 100644 --- a/how-dat-works.md +++ b/how-dat-works.md @@ -38,7 +38,7 @@ Hyperdrive shares and synchronizes a set of files, similar to rsync or Dropbox. ![cdc diagram](meta/cdc.png) -We have configured our Rabin chunker to produce chunks that are around 16KB on average. So if you share a folder containing a single 1MB JPG you will get around 64 chunks. The idea of CDC is that your chunks only change if +We have configured our Rabin chunker to produce chunks that are around 16KB on average. So if you share a folder containing a single 1MB JPG you will get around 64 chunks. After feeding the file contents through the chunker, we take the chunks and calculate the SHA256 hash of each one. We then arrange these hashes into a special data structure we developed that we call the Flat In-Order Merkle Tree. -- cgit v1.2.3 From 22c75baea185200cb0032adc76f0f33448a3f2aa Mon Sep 17 00:00:00 2001 From: Max Ogden Date: Sun, 8 May 2016 13:53:37 +0200 Subject: more sections of paper --- how-dat-works.md | 2 +- paper.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 65 insertions(+), 8 deletions(-) (limited to 'how-dat-works.md') diff --git a/how-dat-works.md b/how-dat-works.md index 0e76667..9921f62 100644 --- a/how-dat-works.md +++ b/how-dat-works.md @@ -20,7 +20,7 @@ The discovery logic itself is handled by a module that we wrote called [discover Up until this point we have just done searches to find who has the data we need. Now that we know who should talk to, we have to connect to them. We use either [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) or [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) sockets for the actual peer to peer connections. UTP is nice because it is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). We then layer on our own file sharing protocol on top, called [Hypercore](https://github.com/mafintosh/hypercore). We also are working on WebRTC support so we can incorporate Browser and Electron clients for some really open web use cases. -When we get the IP and port for a potential source we try to connect using all available protocols (currently TCP and sometimes UTP) and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good source, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. +When we get the IP and port for a potential source we try to connect using all available protocols (currently TCP and sometimes UTP) and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good sources, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. If we get a lot of potential sources we pick a handful at random to try and connect to and keep the rest around as additional sources to use later in case we decide we need more sources. A lot of these are parameters that we can tune for different scenarios later, but have started with some best guesses as defaults. diff --git a/paper.md b/paper.md index 9365b09..455bfc2 100644 --- a/paper.md +++ b/paper.md @@ -9,7 +9,7 @@ max@maxogden.com ## ABSTRACT -Dat is a swarm based version control system designed for sharing large datasets over networks such that their contents can be accessed randomly, be updated incrementally, and have the integrity of their contents be trusted. Every Dat user is simultaneously a server and a client exchanging pieces of data with other peers in a swarm on demand. As data is added to a Dat repository updated files are split into pieces based on Rabin fingerprinting and deduplicated against known pieces to avoid retransmission of data. File contents are automatically verified using secure hashes meaning you do not need to trust other nodes. +Dat is a swarm based version control system designed for sharing large datasets over networks such that their contents can be accessed randomly, be updated incrementally, and have the integrity of their contents be trusted. Every Dat user is simultaneously a server and a client exchanging pieces of data with other peers in a swarm on demand. As data is added to a Dat repository updated files are split into pieces based on Rabin fingerprinting and deduplicated against known pieces to avoid retransmission of data. File contents are automatically verified using secure hashes meaning you do not need to trust other nodes. ## 1. INTRODUCTION @@ -41,7 +41,7 @@ Content defined chunking has the benefit of being shift resistant, meaning if yo BitTorrent implements a swarm based file sharing protocol for static datasets. Data is split into fixed sized chunks, hashed, and then that hash is used to discover peers that have the same data. An advantage of using BitTorrent for dataset transfers is that download speeds can be saturated. Since the file is split into pieces, and peers can efficiently discover which pieces each of the peers they are connected to have, it means one peer can download non-overlapping regions of the dataset from many peers at the same time in parallel, maximizing network throughput. -Fixed sized chunking has drawbacks for data that changes (see LBFS above). Additionally, BitTorrent always divides data into 1024 pieces, meaning large datasets could have a very large chunk size which impacts random access performance (e.g. for streaming video over BitTorrent). +Fixed sized chunking has drawbacks for data that changes (see LBFS above). Additionally, BitTorrent assumes all metadata will be transferred up front, and most clients divide data into 1024 pieces, meaning large datasets could have a very large chunk size which impacts random access performance (e.g. for streaming video over BitTorrent). ### Kademlia Distributed Hash Table @@ -69,11 +69,68 @@ WebTorrent implements the BitTorrent protocol in JavaScript using WebRTC as the IPFS also builds on many of the concepts from this section and presents a new platform similar in scope to the Web that has content integrity, peer to peer file sharing, version history and data permanence baked in as a sort of upgrade to the current Web. Whereas Dat is one application of these ideas that is specifically focused on sharing datasets but is agnostic to what platform it is built on, IPFS goes lower level and abstracts network sockets and naming systems so that any application built on the Web can alternatively be built on IPFS to inherit it's properties, as long as their hyperlinks can be expressed as content addressed addresses to the IPFS global Merkle DAG. -The research behind IPFS has coalesced many of these ideas into a more accessible format. +The research behind IPFS has coalesced many of these ideas into a more accessible format. We are still exploring how to best implement the Dat protocol on top of the IPFS platform. ## 3. DESIGN -- mirroring -- reproducibility -- parallel downloading -- incremental updates +Dat is a file sharing protocol that does not assume a dataset is static or that the entire dataset will be downloaded. The protocol is agnostic to the underlying transport, e.g. you could implement Dat over carrier pigeon. The key properties of the Dat design are explained in this section. + +- 1. **Mirroring** - All participants in the network simultaneously share and consume data. +- 2. **Content Integrity** - Data and publisher integrity is verified through use of signed content addressable hashes +- 3. **Parallel transfer** - Subsets of the data can be accessed from multiple peers simultaneously, improving transfer speeds +- 4. **Streaming updates** - Datasets can be updated and distributed in real time to downstream peers +- 5. **Secure Metadata** - Dat employs a capability system whereby anyone with a Dat link can connect to the swarm, but the link itself is a secure hash that is nearly impossible to guess + +## 3.1 Mirroring + +Dat is a peer to peer protocol designed to exchange pieces of a dataset amongst a swarm of peers. When a peer acquires their first piece of data in the dataset, they are now a partial mirror for the dataset. If someone else contacts them and needs the piece they have, they can share it. This can happen simultaneously while the peer is still downloading the pieces they want. + +### 3.1.1 Source Discovery + +An important aspect of mirring is source discovery, the techniques that peers use to find each other. Source discovery means finding the IP and port of data sources online that have a copy of that data you are looking for. You can then connect to them and begin exchanging data using the Dat file exchange protocol, Hypercore. By using source discovery techniques we are able to create a network where data can be discovered even if the original data source disappears. + +Source discovery can happen over many kinds of networks, as long as you can model the following actions: + +- `join(key, [port])` - Begin performing regular lookups on an interval for `key`. Specify `port` if you want to announce that you share `key` as well. +- `leave(key, [port])` - Stop looking for `key`. Specify `port` to stop announcing that you share `key` as well. +- `foundpeer(key, ip, port)` - Called when a peer is found by a lookup + +In the Dat implementation we implement the above actions on top of three types of discovery networks: + +- DNS name servers - An Internet standard mechanism for resolving keys to addresses +- Multicast DNS - Useful for discovering peers on local networks +- Kademlia Mainline Distributed Hash Table - Zero point of failure, increases probability of Dat working even if DNS servers are unreachable + +Additional discovery networks can be implemented as needed. We chose the above three as a starting point to have a complementary mix of strategies to increase the probability of source discovery. + +Our implementation of peer discovery is called discovery-channel. We also run a [custom DNS server](https://www.npmjs.com/package/dns-discovery) that Dat clients use (in addition to specifying their own if they need to), as well as a [DHT bootstrap](https://github.com/bittorrent/bootstrap-dht) server. These discovery servers are the only centralized infrastructure we need for Dat to work over the Internet, but they are redundant, interchangeable, never see the actual data being shared, anyone can run their own and Dat will still work even if they all are unavailable. If this happens discovery will just be manual (e.g. manually sharing IP/ports). Every data source that has a copy of the data also advertises themselves across these discovery networks. + +### 3.1.2 Peer Connections + +Up until this point we have just done searches to find who has the data we need. Now that we know who should talk to, we have to connect to them. Once we have a duplex binary connection to a peer we then layer on our own file sharing protocol on top, called [Hypercore](https://github.com/mafintosh/hypercore). + +In our implementation, we use either [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol), [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) or WebRTC sockets for the actual peer to peer connections. UTP is nice because it is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). WebRTC support makes Dat work in modern web browsers using peer to peer connections. + +When we get the IP and port for a potential source we try to connect using all available protocols and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good sources, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. + +If we get a lot of potential sources we pick a handful at random to try and connect to and keep the rest around as additional sources to use later in case we decide we need more sources. A lot of these are parameters that we can tune for different scenarios later, but have started with some best guesses as defaults. + +The connection logic is implemented in a module called [discovery-swarm](https://www.npmjs.com/package/discovery-swarm). This builds on discovery-channel and adds connection establishment, management and statistics. You can see stats like how many sources are currently connected, how many good and bad behaving sources you've talked to, and it automatically handles connecting and reconnecting to sources for you. Our UTP support is implemented in the module [utp-native](https://www.npmjs.com/package/utp-native). + +So now we have found data sources, have connected to them, but we havent yet figured out if they *actually* have the data we need. This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. This is explained in a later section. + +Peer connections types are outside the scope of the Dat protocol, but in the Dat implementation we make a best effort to make as successful connections using our default types as possible. This means employing peer to peer connection techniques like UDP hole punching. Our approach to hole punching is to use a central known server, in our case it is our DNS server, which is accessible on the public Internet. + +In a scenario where two peers A and B want to connect, and both know the central server: + +1. Peer A creates a local UDP socket and messages the central server that it is interested in connecting to people. +2. Central server messages Peer A back with a token that is a `hash(Peer A's remote IP + a local secret)`. The UDP packet contains the remote IP. +3. Peer A messages the central server with the token (this way you cannot spoof your IP and DDOS a remote peer) +4. Peer B does the same. +5. When the central server receives Peer B's message that it wants to connect to peers it forwards Peer B's message to Peer A and Peer A's message to Peer B. +6. Both peers now send a message to each other on their public IP and port. If UDP hole punching is supported by the routers of both peers, one of the messages should get through. +7. At this point we reuse the UDP socket to run UTP on top to get a streaming reliable interface. + +## 3.2 Content Integrity + +Content integrity means being able to verify the data you received is the exact same version of the data that you expected. This is important for reproducibility, as -- cgit v1.2.3