diff --git a/papers/dat-paper.txt b/papers/dat-paper.txt
deleted file mode 100644
index f0e71c2..0000000
--- a/papers/dat-paper.txt
+++ /dev/null
@@ -1,1282 +0,0 @@
-\documentclass[a4paper,twocolumn]{article}
-\usepackage{lmodern}
-\usepackage{amssymb,amsmath}
-\usepackage{ifxetex,ifluatex}
-\usepackage{fixltx2e} % provides \textsubscript
-\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
- \usepackage[T1]{fontenc}
- \usepackage[utf8]{inputenc}
-\else % if luatex or xelatex
- \ifxetex
- \usepackage{mathspec}
- \else
- \usepackage{fontspec}
- \fi
- \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
-\fi
-% use upquote if available, for straight quotes in verbatim environments
-\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
-% use microtype if available
-\IfFileExists{microtype.sty}{%
-\usepackage{microtype}
-\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
-}{}
-\usepackage[unicode=true]{hyperref}
-\hypersetup{
- pdftitle={Dat - Distributed Dataset Synchronization And Versioning},
- pdfauthor={Maxwell Ogden, Karissa McKelvey, Mathias Buus Madsen, Code for Science},
- pdfborder={0 0 0},
- breaklinks=true}
-\urlstyle{same} % don't use monospace font for urls
-\IfFileExists{parskip.sty}{%
-\usepackage{parskip}
-}{% else
-\setlength{\parindent}{0pt}
-\setlength{\parskip}{6pt plus 2pt minus 1pt}
-}
-\setlength{\emergencystretch}{3em} % prevent overfull lines
-\providecommand{\tightlist}{%
- \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
-\setcounter{secnumdepth}{0}
-% Redefines (sub)paragraphs to behave more like sections
-\ifx\paragraph\undefined\else
-\let\oldparagraph\paragraph
-\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
-\fi
-\ifx\subparagraph\undefined\else
-\let\oldsubparagraph\subparagraph
-\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
-\fi
-
-% set default figure placement to htbp
-\makeatletter
-\def\fps@figure{htbp}
-\makeatother
-
-
-\title{Dat - Distributed Dataset Synchronization And Versioning}
-\author{Maxwell Ogden, Karissa McKelvey, Mathias Buus Madsen, Code for Science}
-\date{May 2017}
-
-\begin{document}
-\maketitle
-
-\section{Abstract}\label{abstract}
-
-Dat is a protocol designed for syncing folders of data, even if they are
-large or changing constantly. Dat uses a cryptographically secure
-register of changes to prove that the requested data version is
-distributed. A byte range of any file's version can be efficiently
-streamed from a Dat repository over a network connection. Consumers can
-choose to fully or partially replicate the contents of a remote Dat
-repository, and can also subscribe to live changes. To ensure writer and
-reader privacy, Dat uses public key cryptography to encrypt network
-traffic. A group of Dat clients can connect to each other to form a
-public or private decentralized network to exchange data between each
-other. A reference implementation is provided in JavaScript.
-
-\section{1. Background}\label{background}
-
-Many datasets are shared online today using HTTP and FTP, which lack
-built-in support for version control or content addressing of data. This
-results in link rot and content drift as files are moved, updated or
-deleted, leading to an alarming rate of disappearing data references in
-areas such as
-\href{http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253}{published
-scientific literature}.
-
-Cloud storage services like S3 ensure availability of data, but they
-have a centralized hub-and-spoke networking model and are therefore
-limited by their bandwidth, meaning popular files can become very
-expensive to share. Services like Dropbox and Google Drive provide
-version control and synchronization on top of cloud storage, which
-fixes many issues with broken links, but they rely on proprietary code
-and services and require users to store their data on centralized
-cloud infrastructure, with implications for cost, transfer speeds,
-vendor lock-in and user privacy.
-
-Distributed file sharing tools can become faster as files become more
-popular, removing the bandwidth bottleneck and making file distribution
-cheaper. They also use link resolution and discovery systems which can
-prevent broken links meaning if the original source goes offline other
-backup sources can be automatically discovered. However, these file
-sharing tools today are not supported by Web browsers, do not have good
-privacy guarantees, and do not provide a mechanism for updating files
-without redistributing a new dataset, which could mean entirely
-re-downloading data you already have.
-
-\section{2. Dat}\label{dat}
-
-Dat is a dataset synchronization protocol that does not assume a dataset
-is static or that the entire dataset will be downloaded. The main
-reference implementation is available from npm as
-\texttt{npm\ install\ dat\ -g}.
-
-The protocol is agnostic to the underlying transport e.g.~you could
-implement Dat over carrier pigeon. Data is stored in a format called
-SLEEP (Ogden and Buus 2017), described in its own paper. The key
-properties of the Dat design are explained in this section.
-
-\begin{itemize}
-\tightlist
-\item
- 2.1 \textbf{Content Integrity} - Data and publisher integrity is
- verified through use of signed hashes of the content.
-\item
- 2.2 \textbf{Decentralized Mirroring} - Users sharing the same Dat
- automatically discover each other and exchange data in a swarm.
-\item
- 2.3 \textbf{Network Privacy} - Dat provides certain privacy guarantees
- including end-to-end encryption.
-\item
- 2.4 \textbf{Incremental Versioning} - Datasets can be efficiently
- synced, even in real time, to other peers.
-\item
- 2.5 \textbf{Random Access} - Huge file hierarchies can be efficiently
- traversed remotely.
-\end{itemize}
-
-\subsection{2.1 Content Integrity}\label{content-integrity}
-
-Content integrity means being able to verify the data you received is
-the exact same version of the data that you expected. This is important
-in a distributed system as this mechanism will catch incorrect data sent
-by bad peers. It also has implications for reproducibility as it lets
-you refer to a specific version of a dataset.
-
-Link rot, when links online stop resolving, and content drift, when data
-changes but the link to the data remains the same, are two common issues
-in data analysis. For example, one day a file called data.zip might
-change, but a typical HTTP link to the file does not include a hash of
-the content, or provide a way to get updated metadata, so clients that
-only have the HTTP link have no way to check if the file changed without
-downloading the entire file again. Referring to a file by the hash of
-its content is called content addressability, and lets users not only
-verify that the data they receive is the version of the data they want,
-but also lets people cite specific versions of the data by referring to
-a specific hash.
-
-Dat uses BLAKE2b (Aumasson et al. 2013) cryptographically secure hashes
-to address content. Hashes are arranged in a Merkle tree (Mykletun,
-Narasimha, and Tsudik 2003), a tree where each non-leaf node is the hash
-of all child nodes. Leaf nodes contain pieces of the dataset. Due to the
-properties of secure cryptographic hashes the top hash can only be
-produced if all data below it matches exactly. If two trees have
-matching top hashes then you know that all other nodes in the tree must
-match as well, and you can conclude that your dataset is synchronized.
-Trees are chosen as the primary data structure in Dat as they have a
-number of properties that allow for efficient access to subsets of the
-metadata, which allows Dat to work efficiently over a network
-connection.
-
-\subsubsection{Dat Links}\label{dat-links}
-
-Dat links are Ed25519 (Bernstein et al. 2012) public keys which have a
-length of 32 bytes (64 characters when Hex encoded). You can represent
-your Dat link in the following ways and Dat clients will be able to
-understand them:
-
-\begin{itemize}
-\tightlist
-\item
- The standalone public key:
-\end{itemize}
-
-\texttt{8e1c7189b1b2dbb5c4ec2693787884771201da9...}
-
-\begin{itemize}
-\tightlist
-\item
- Using the dat:// protocol:
-\end{itemize}
-
-\texttt{dat://8e1c7189b1b2dbb5c4ec2693787884771...}
-
-\begin{itemize}
-\tightlist
-\item
- As part of an HTTP URL:
-\end{itemize}
-
-\texttt{https://datproject.org/8e1c7189b1b2dbb5...}
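Since all three representations embed the same 64-character hex key, a client can normalize them with a simple scan for a 64-hex-digit run. This is an illustrative sketch, not the reference implementation's parser, and the key used is a hypothetical placeholder.

```javascript
// Sketch: normalizing the three Dat link forms to the raw hex public key.
function parseDatLink(link) {
  const match = /([0-9a-f]{64})/i.exec(link);
  if (!match) throw new Error('no Dat key found in: ' + link);
  return match[1].toLowerCase();
}

const key = 'ab'.repeat(32); // hypothetical 64-character key for illustration

console.log(parseDatLink(key) === key);                             // true
console.log(parseDatLink('dat://' + key) === key);                  // true
console.log(parseDatLink('https://datproject.org/' + key) === key); // true
```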
-
-All messages in the Dat protocol are encrypted and signed using the
-public key during transport. This means that unless you know the public
-key (e.g.~unless the Dat link was shared with you) then you will not be
-able to discover or communicate with any member of the swarm for that
-Dat. Anyone with the public key can verify that messages (such as
-entries in a Dat Stream) were created by a holder of the private key.
-
-Every Dat repository has a corresponding private key which is kept in
-your home folder and never shared. Dat never exposes either the public
-or private key over the network. During the discovery phase the BLAKE2b
-hash of the public key is used as the discovery key. This means that the
-original key is impossible to discover (unless it was shared publicly
-through a separate channel) since only the hash of the key is exposed
-publicly.
-
-Dat does not provide an authentication mechanism at this time. Instead
-it provides a capability system. Anyone with the Dat link is currently
-considered able to discover and access data. Do not share your Dat links
-publicly if you do not want them to be accessed.
-
-\subsubsection{Hypercore and Hyperdrive}\label{hypercore-and-hyperdrive}
-
-The Dat storage, content integrity, and networking protocols are
-implemented in a module called
-\href{https://npmjs.org/hypercore}{Hypercore}. Hypercore is agnostic to
-the format of the input data; it operates on any stream of binary data.
-For the Dat use case of synchronizing datasets we use a file system
-module on top of Hypercore called
-\href{https://npmjs.org/hyperdrive}{Hyperdrive}.
-
-Dat has a layered abstraction so that users can use Hypercore directly
-to have full control over how they model their data. Hyperdrive works
-well when your data can be represented as files on a filesystem, which
-is the main use case with Dat.
-
-\subsubsection{Hypercore Registers}\label{hypercore-registers}
-
-Hypercore Registers are the core mechanism used in Dat. They are binary
-append-only streams whose contents are cryptographically hashed and
-signed and therefore can be verified by anyone with access to the public
-key of the writer. They are an implementation of the concept known as a
-register, a digital ledger you can trust.
-
-Dat uses two registers, \texttt{content} and \texttt{metadata}. The
-\texttt{content} register contains the files in your repository and
-\texttt{metadata} contains the metadata about the files including name,
-size, last modified time, etc. Dat replicates them both when
-synchronizing with another peer.
-
-When files are added to Dat, each file gets split up into some number of
-chunks, and the chunks are then arranged into a Merkle tree, which is
-used later for version control and replication processes.
-
-\subsection{2.2 Decentralized Mirroring}\label{decentralized-mirroring}
-
-Dat is a peer to peer protocol designed to exchange pieces of a dataset
-amongst a swarm of peers. As soon as a peer acquires their first piece
-of data in the dataset they can choose to become a partial mirror for
-the dataset. If someone else contacts them and needs the piece they
-have, they can choose to share it. This can happen simultaneously while
-the peer is still downloading the pieces they want from others.
-
-\subsubsection{Source Discovery}\label{source-discovery}
-
-An important aspect of mirroring is source discovery, the techniques
-that peers use to find each other. Source discovery means finding the IP
-and port of data sources online that have a copy of that data you are
-looking for. You can then connect to them and begin exchanging data. By
-using source discovery techniques Dat is able to create a network where
-data can be discovered even if the original data source disappears.
-
-Source discovery can happen over many kinds of networks, as long as you
-can model the following actions:
-
-\begin{itemize}
-\tightlist
-\item
- \texttt{join(key,\ {[}port{]})} - Begin performing regular lookups on
- an interval for \texttt{key}. Specify \texttt{port} if you want to
- announce that you share \texttt{key} as well.
-\item
- \texttt{leave(key,\ {[}port{]})} - Stop looking for \texttt{key}.
- Specify \texttt{port} to stop announcing that you share \texttt{key}
- as well.
-\item
- \texttt{foundpeer(key,\ ip,\ port)} - Called when a peer is found by a
- lookup.
-\end{itemize}
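The three actions above can be modeled against any lookup service. As a minimal sketch, here they are implemented over a single in-memory registry standing in for a real discovery network; all names are illustrative, not the reference implementation's API.

```javascript
// Sketch: join/leave/foundpeer over an in-memory registry.
const registry = new Map(); // key -> Set of "ip:port" announcers

function join(key, port, onPeer) {
  if (port !== undefined) {
    if (!registry.has(key)) registry.set(key, new Set());
    registry.get(key).add('127.0.0.1:' + port); // announce that we share key
  }
  // A real network would perform lookups on an interval; one lookup suffices here.
  for (const addr of registry.get(key) || []) {
    const [ip, peerPort] = addr.split(':');
    onPeer(key, ip, Number(peerPort)); // foundpeer(key, ip, port)
  }
}

function leave(key, port) {
  if (port !== undefined && registry.has(key)) {
    registry.get(key).delete('127.0.0.1:' + port); // stop announcing
  }
}

const found = [];
join('somekey', 8000, () => {});                      // first peer announces
join('somekey', 8001, (k, ip, p) => found.push(p));   // second peer announces + looks up
console.log(found); // the second peer discovers both announcers
```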
-
-In the Dat implementation we implement the above actions on top of three
-types of discovery networks:
-
-\begin{itemize}
-\tightlist
-\item
- DNS name servers - An Internet standard mechanism for resolving keys
- to addresses
-\item
- Multicast DNS - Useful for discovering peers on local networks
-\item
- Kademlia Mainline Distributed Hash Table - Less central points of
- failure, increases probability of Dat working even if DNS servers are
- unreachable
-\end{itemize}
-
-Additional discovery networks can be implemented as needed. We chose the
-above three as a starting point to have a complementary mix of
-strategies to increase the probability of source discovery. You can
-also specify a Dat via an HTTPS link, which runs the Dat protocol in
-``single-source'' mode: the above discovery networks are not used, and
-that one HTTPS server acts as the only peer.
-
-\subsubsection{Peer Connections}\label{peer-connections}
-
-After the discovery phase, Dat should have a list of potential data
-sources to try and contact. Dat uses either TCP, HTTP or
-\href{https://en.wikipedia.org/wiki/Micro_Transport_Protocol}{UTP}
-(Rossi et al. 2010). UTP uses LEDBAT which is designed to not take up
-all available bandwidth on a network (e.g.~so that other people sharing
-wifi can still use the Internet), and is still based on UDP so works
-with NAT traversal techniques like UDP hole punching. HTTP is supported
-for compatibility with static file servers and web browser clients. Note
-that these are the protocols we support in the reference Dat
-implementation, but the Dat protocol itself is transport agnostic.
-
-If an HTTP source is specified Dat will prefer that one over other
-sources. Otherwise when Dat gets the IP and port for a potential TCP or
-UTP source it tries to connect using both protocols. If one connects
-first, Dat aborts the other one. If none connect, Dat will try again
-until it decides that source is offline or unavailable and then stops
-trying to connect to it. Sources Dat is able to connect to go into a
-list of known good sources, so that if/when the Internet connection goes
-down Dat can use that list to reconnect to known good sources again
-quickly.
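The connect-with-both-and-abort-the-loser behavior can be sketched as a race between connection attempts. The transports here are stand-in promise-returning fakes, not the reference implementation's TCP/UTP sockets.

```javascript
// Sketch: race several connection attempts; keep the winner, abort the rest.
function connectRace(attempts) {
  return new Promise((resolve, reject) => {
    let settled = false;
    let failures = 0;
    for (const attempt of attempts) {
      attempt.connect().then((socket) => {
        if (settled) {
          socket.destroy(); // lost the race: abort this connection
        } else {
          settled = true;
          resolve(socket);
        }
      }, () => {
        if (++failures === attempts.length && !settled) {
          reject(new Error('source offline or unavailable'));
        }
      });
    }
  });
}

// Fake transports: "utp" connects first, the slower "tcp" socket is destroyed.
const log = [];
const fake = (name, ms) => ({
  connect: () => new Promise((res) =>
    setTimeout(() => res({ name, destroy: () => log.push(name + ' aborted') }), ms)),
});

connectRace([fake('tcp', 50), fake('utp', 10)]).then((socket) => {
  log.push(socket.name + ' connected');
});
```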
-
-If Dat gets a lot of potential sources it picks a handful at random to
-try and connect to and keeps the rest around as additional sources to
-use later in case it decides it needs more sources.
-
-Once a duplex binary connection to a remote source is open Dat then
-layers on the Hypercore protocol, a message-based replication protocol
-that allows two peers to communicate over a stateless channel to request
-and exchange data. You open separate replication channels with many
-peers at once which allows clients to parallelize data requests across
-the entire pool of peers they have established connections with.
-
-\subsection{2.3 Network Privacy}\label{network-privacy}
-
-On the Web today, with SSL, there is a guarantee that the traffic
-between your computer and the server is private. As long as you trust
-the server to not leak your logs, attackers who intercept your network
-traffic will not be able to read the HTTP traffic exchanged between you
-and the server. This is a fairly straightforward model as clients only
-have to trust a single server for some domain.
-
-There is an inherent tradeoff in peer to peer systems of source
-discovery vs.~user privacy. The more sources you contact and ask for
-some data, the more sources you trust to keep what you asked for
-private. Our goal is to have Dat be configurable in respect to this
-tradeoff to allow application developers to meet their own privacy
-guidelines.
-
-It is up to client programs to make design decisions around which
-discovery networks they trust. For example if a Dat client decides to
-use the BitTorrent DHT to discover peers, and they are searching for a
-publicly shared Dat key (e.g.~a key cited publicly in a published
-scientific paper) with known contents, then because of the privacy
-design of the BitTorrent DHT it becomes public knowledge what key that
-client is searching for.
-
-A client could choose to only use discovery networks with certain
-privacy guarantees. For example a client could only connect to an
-approved list of sources that they trust, similar to SSL. As long as
-they trust each source, the encryption built into the Dat network
-protocol will prevent the Dat key they are looking for from being
-leaked.
-
-\subsection{2.4 Incremental Versioning}\label{incremental-versioning}
-
-Given a stream of binary data, Dat splits the stream into chunks, hashes
-each chunk, and arranges the hashes in a specific type of Merkle tree
-that allows for certain replication properties.
-
-Dat is also able to fully or partially synchronize streams in a
-distributed setting even if the stream is being appended to. This is
-accomplished by using the messaging protocol to traverse the Merkle tree
-of remote sources and fetch a strategic set of nodes. Due to the
-low-level, message-oriented design of the replication protocol,
-different node traversal strategies can be implemented.
-
-There are two types of versioning performed automatically by Dat.
-Metadata is stored in a folder called \texttt{.dat} in the root folder
-of a repository, and data is stored as normal files in the root folder.
-
-\subsubsection{Metadata Versioning}\label{metadata-versioning}
-
-Dat tries as much as possible to act as a one-to-one mirror of the state
-of a folder and all its contents. When importing files, Dat uses a
-sorted, depth-first recursion to list all the files in the tree. For
-each file it finds, it grabs the filesystem metadata (filename, Stat
-object, etc.) and checks whether an entry for this filename with this
-exact metadata is already represented in the Dat repository metadata.
-If it matches exactly the newest version of the file metadata stored in
-Dat, the file is skipped (no change).
-
-If the metadata differs from the current existing one (or there are no
-entries for this filename at all in the history), then this new metadata
-entry will be appended as the new `latest' version for this file in the
-append-only SLEEP metadata content register (described below).
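The skip-or-append decision above can be sketched as a pure function over an append-only log. The entry shape (name, size, mtime) is illustrative, not the actual SLEEP encoding.

```javascript
// Sketch: skip unchanged files, append a new 'latest' entry otherwise.
function importFile(log, entry) {
  const latest = [...log].reverse().find((e) => e.name === entry.name);
  if (latest && latest.size === entry.size && latest.mtime === entry.mtime) {
    return 'skipped'; // metadata matches the newest stored version: no change
  }
  log.push(entry); // append-only: history is never rewritten
  return 'appended';
}

const log = [];
console.log(importFile(log, { name: 'bat.jpg', size: 100, mtime: 1 })); // appended
console.log(importFile(log, { name: 'bat.jpg', size: 100, mtime: 1 })); // skipped
console.log(importFile(log, { name: 'bat.jpg', size: 120, mtime: 2 })); // appended
console.log(log.length); // 2: the full version history for bat.jpg
```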
-
-\subsubsection{Content Versioning}\label{content-versioning}
-
-In addition to storing a historical record of filesystem metadata, the
-content of the files themselves are also capable of being stored in a
-version controlled manner. The default storage system used in Dat stores
-the files as files. This has the advantage of being very straightforward
-for users to understand, but the downside of not storing old versions of
-content by default.
-
-In contrast to other version control systems like Git, Dat by default
-only stores the current set of checked out files on disk in the
-repository folder, not old versions. It does store all previous metadata
-for old versions in \texttt{.dat}. Git for example stores all previous
-content versions and all previous metadata versions in the \texttt{.git}
-folder. Because Dat is designed for larger datasets, if it stored all
-previous file versions in \texttt{.dat}, then the \texttt{.dat} folder
-could easily fill up the user's hard drive inadvertently. Therefore Dat
-has multiple storage modes based on usage.
-
-Hypercore registers include an optional \texttt{data} file that stores
-all chunks of data. By default Dat uses only the \texttt{metadata.data}
-file, not the \texttt{content.data} file, and stores the current files
-as normal files on disk. If you want
-to run an `archival' node that keeps all previous versions, you can
-configure Dat to use the \texttt{content.data} file instead. For
-example, on a shared server with lots of storage you probably want to
-store all versions. However on a workstation machine that is only
-accessing a subset of one version, the default mode of storing all
-metadata plus the current set of downloaded files is acceptable, because
-you know the server has the full history.
-
-\subsubsection{Merkle Trees}\label{merkle-trees}
-
-Registers in Dat use a specific method of encoding a Merkle tree where
-hashes are positioned by a scheme called binary in-order interval
-numbering or just ``bin'' numbering. This is just a specific,
-deterministic way of laying out the nodes in a tree. For example a tree
-with 7 nodes will always be arranged like this:
-
-\begin{verbatim}
-0
- 1
-2
- 3
-4
- 5
-6
-\end{verbatim}
-
-In Dat, the hashes of the chunks of files are always even numbers, at
-the wide end of the tree. So the above tree has four original values
-that become the even numbers:
-
-\begin{verbatim}
-chunk0 -> 0
-chunk1 -> 2
-chunk2 -> 4
-chunk3 -> 6
-\end{verbatim}
-
-In the resulting Merkle tree, the even and odd nodes store different
-information:
-
-\begin{itemize}
-\tightlist
-\item
- Evens - List of data hashes {[}chunk0, chunk1, chunk2, \ldots{}{]}
-\item
- Odds - List of Merkle hashes (hashes of child even nodes) {[}hash0,
- hash1, hash2, \ldots{}{]}
-\end{itemize}
-
-These two lists get interleaved into a single register such that the
-indexes (position) in the register are the same as the bin numbers from
-the Merkle tree.
-
-All odd hashes are derived by hashing the two child nodes, e.g.~given
-hash0 is \texttt{hash(chunk0)} and hash2 is \texttt{hash(chunk1)}, hash1
-is \texttt{hash(hash0\ +\ hash2)}.
-
-For example a register with two data entries would look something like
-this (pseudocode):
-
-\begin{verbatim}
-0. hash(chunk0)
-1. hash(hash(chunk0) + hash(chunk1))
-2. hash(chunk1)
-\end{verbatim}
-
-It is possible for the in-order Merkle tree to have multiple roots at
-once. A root is defined as a parent node with a full set of child node
-slots filled below it.
-
-For example, this tree has two roots (1 and 4):
-
-\begin{verbatim}
-0
- 1
-2
-
-4
-\end{verbatim}
-
-This tree has one root (3):
-
-\begin{verbatim}
-0
- 1
-2
- 3
-4
- 5
-6
-\end{verbatim}
-
-This one has one root (1):
-
-\begin{verbatim}
-0
- 1
-2
-\end{verbatim}
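The roots for any number of leaves follow from the bin-numbering scheme: each largest-power-of-two run of leaves forms one full subtree, whose root sits at bin number \texttt{offset*2 + span - 1}. A sketch (illustrative, not the reference implementation's flat-tree module):

```javascript
// Sketch: compute the bin numbers of the roots for a given leaf count.
function roots(leafCount) {
  const out = [];
  let offset = 0; // measured in leaf slots
  let remaining = leafCount;
  while (remaining > 0) {
    let span = 1;
    while (span * 2 <= remaining) span *= 2; // largest full subtree that fits
    out.push(offset * 2 + span - 1);
    offset += span;
    remaining -= span;
  }
  return out;
}

console.log(roots(3)); // [ 1, 4 ]  -- the two-root tree above
console.log(roots(4)); // [ 3 ]     -- the one-root, 7-node tree
console.log(roots(2)); // [ 1 ]     -- the three-node tree
```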
-
-\subsubsection{Replication Example}\label{replication-example}
-
-This section describes in high level the replication flow of a Dat. Note
-that the low level details are available by reading the SLEEP section
-below. For the sake of illustrating how this works in practice in a
-networked replication scenario, consider a folder with two files:
-
-\begin{verbatim}
-bat.jpg
-cat.jpg
-\end{verbatim}
-
-To send these files to another machine using Dat, you would first add
-them to a Dat repository by splitting them into chunks and constructing
-SLEEP files representing the chunks and filesystem metadata.
-
-Let's assume \texttt{bat.jpg} and \texttt{cat.jpg} both produce three
-chunks, each around 64KB. Dat stores data in a representation called SLEEP,
-but here we will show a pseudo-representation for the purposes of
-illustrating the replication process. The six chunks get sorted into a
-list like this:
-
-\begin{verbatim}
-bat-1
-bat-2
-bat-3
-cat-1
-cat-2
-cat-3
-\end{verbatim}
-
-These chunks then each get hashed, and the hashes get arranged into a
-Merkle tree (the content register):
-
-\begin{verbatim}
-0 - hash(bat-1)
- 1 - hash(0 + 2)
-2 - hash(bat-2)
- 3 - hash(1 + 5)
-4 - hash(bat-3)
- 5 - hash(4 + 6)
-6 - hash(cat-1)
-8 - hash(cat-2)
- 9 - hash(8 + 10)
-10 - hash(cat-3)
-\end{verbatim}
-
-Next we calculate the root hashes of our tree, in this case 3 and 9. We
-then hash them together, and cryptographically sign the hash. This
-signed hash now can be used to verify all nodes in the tree, and the
-signature proves it was produced by us, the holder of the private key
-for this Dat.
-
-This tree is for the hashes of the contents of the photos. There is also
-a second Merkle tree that Dat generates that represents the list of
-files and their metadata and looks something like this (the metadata
-register):
-
-\begin{verbatim}
-0 - hash({contentRegister: '9e29d624...'})
- 1 - hash(0 + 2)
-2 - hash({"bat.jpg", first: 0, length: 3})
-4 - hash({"cat.jpg", first: 3, length: 3})
-\end{verbatim}
-
-The first entry in this feed is a special metadata entry that tells Dat
-the address of the second feed (the content register). Note that node 3
-is not included yet, because 3 is the hash of \texttt{1\ +\ 5}, but 5
-does not exist yet, so will be written at a later update.
-
-Now we're ready to send our metadata to the other peer. The first
-message is a \texttt{Register} message with the key that was shared for
-this Dat. Let's call ourselves Alice and the other peer Bob. Alice sends
-Bob a \texttt{Want} message that declares they want all nodes in the
-file list (the metadata register). Bob replies with a single
-\texttt{Have} message that indicates he has 3 nodes of data. Alice sends
-three \texttt{Request} messages, one for each leaf node
-(\texttt{0,\ 2,\ 4}). Bob sends back three \texttt{Data} messages. The
-first \texttt{Data} message contains the content register key, the hash
-of the sibling, in this case node \texttt{2}, the hash of the uncle root
-\texttt{4}, as well as a signature for the root hashes (in this case
-\texttt{1,\ 4}). Alice verifies the integrity of this first
-\texttt{Data} message by hashing the received content register metadata
-to produce the hash for node \texttt{0}. They then hash the hash
-\texttt{0} with the included hash \texttt{2} to reproduce hash
-\texttt{1}, and check their hash \texttt{1} together with the received
-hash \texttt{4} against the received signature to verify it is the same
-data. When the next \texttt{Data} message is received, a similar
-process is performed to verify the content.
-
-Now Alice has the full list of files in the Dat, but decides they only
-want to download \texttt{cat.jpg}. Alice knows they want chunks 3
-through 5 from the content register. First Alice sends another
-\texttt{Register} message with the content key to open a new replication
-channel over the connection. Then Alice sends three \texttt{Request}
-messages, one for each of chunks \texttt{3,\ 4,\ 5}. Bob sends back
-three \texttt{Data} messages with the data for each block, as well as
-the hashes needed to verify the content in a way similar to the process
-described above for the metadata feed.
-
-\subsection{2.5 Random Access}\label{random-access}
-
-Dat pursues the following access capabilities:
-
-\begin{itemize}
-\tightlist
-\item
- Support large file hierarchies (millions of files in a single
- repository).
-\item
- Support efficient traversal of the hierarchy (listing files in
- arbitrary folders efficiently).
-\item
- Store all changes to all files (metadata and/or content).
-\item
- List all changes made to any single file.
-\item
- View the state of all files relative to any point in time.
-\item
- Subscribe live to all changes (any file).
-\item
- Subscribe live to changes to files under a specific path.
-\item
- Efficiently access any byte range of any version of any file.
-\item
- Allow all of the above to happen remotely, only syncing the minimum
- metadata necessary to perform any action.
-\item
- Allow efficient comparison of remote and local repository state to
- request missing pieces during synchronization.
-\item
- Allow entire remote archive to be synchronized, or just some subset of
- files and/or versions.
-\end{itemize}
-
-Dat accomplishes these by storing all changes in Hypercore feeds, and
-by using metadata indexing strategies that allow certain queries to be
-answered efficiently by traversing those feeds. The protocol itself is
-specified in Section 3 (SLEEP), but a scenario-based summary follows
-here.
-
-\subsubsection{Scenario: Reading a file from a specific byte
-offset}\label{scenario-reading-a-file-from-a-specific-byte-offset}
-
-Alice has a dataset in Dat, Bob wants to access a 100MB CSV called
-\texttt{cat\_dna.csv} stored in the remote repository, but only wants to
-access the 10MB range of the CSV spanning from 30MB - 40MB.
-
-Bob has never communicated with Alice before, and is starting fresh with
-no knowledge of this Dat repository other than that he knows he wants
-\texttt{cat\_dna.csv} at a specific offset.
-
-First, Bob asks Alice through the Dat protocol for the metadata he needs
-to resolve \texttt{cat\_dna.csv} to the correct metadata feed entry that
-represents the file he wants. Note: In this scenario we assume Bob wants
-the latest version of \texttt{cat\_dna.csv}. It is also possible to do
-this for a specific older version.
-
-Bob first sends a \texttt{Request} message for the latest entry in the
-metadata feed. Alice responds. Bob looks at the \texttt{trie} value, and
-using the lookup algorithm described below sends another
-\texttt{Request} message for the metadata node that is closer to the
-filename he is looking for. This repeats until Alice sends Bob the
-matching metadata entry. This is the un-optimized resolution that uses
-\texttt{log(n)} round trips, though there are ways to optimize this by
-having Alice send additional sequence numbers to Bob that help him
-traverse in fewer round trips.
-
-In the metadata record Bob received for \texttt{cat\_dna.csv} there is
-the byte offset to the beginning of the file in the data feed. Bob adds
-his +30MB offset to this value and starts requesting pieces of data
-starting at that byte offset using the SLEEP protocol as described
-below.
-
-This method tries to allow any byte range of any file to be accessed
-without the need to synchronize the full metadata for all files up
-front.
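The offset arithmetic can be sketched as follows, assuming fixed-size chunks for illustration (real Dat chunk sizes can vary, which is why the metadata entry carries the file's starting byte offset). The function name and constants are illustrative.

```javascript
// Sketch: translate a byte range within a file into a range of chunk indexes,
// assuming fixed 64KB chunks.
const CHUNK_SIZE = 64 * 1024;

function chunkRange(fileStartByte, rangeStart, rangeLength) {
  const first = Math.floor((fileStartByte + rangeStart) / CHUNK_SIZE);
  const last = Math.floor((fileStartByte + rangeStart + rangeLength - 1) / CHUNK_SIZE);
  return { first, last };
}

// Bob wants bytes 30MB..40MB of cat_dna.csv; say the file begins at byte 0
// of the content feed.
const MB = 1024 * 1024;
const { first, last } = chunkRange(0, 30 * MB, 10 * MB);
console.log(first, last); // 480 639
```

Bob can then request only chunks 480 through 639 instead of the whole 100MB file.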
-
-\subsubsection{Scenario: Syncing live changes to files at a specific
-path}\label{scenario-syncing-live-changes-to-files-at-a-specific-path}
-
-TODO
-
-\subsubsection{Scenario: Syncing an entire
-archive}\label{scenario-syncing-an-entire-archive}
-
-TODO
-
-\section{3. Dat Network Protocol}\label{dat-network-protocol}
-
-The SLEEP format is designed to allow for sparse replication, meaning
-you can efficiently download only the metadata and data required to
-resolve a single byte region of a single file, which makes Dat suitable
-for a wide variety of streaming, real time and large dataset use cases.
-
-To take advantage of this, Dat includes a network protocol. It is
-message-based and stateless, making it possible to implement on a
-variety of network transport protocols including UDP and TCP. Both
-metadata and content registers in SLEEP share the exact same replication
-protocol.
-
-Individual messages are encoded using Protocol Buffers. There are ten
-message types, each described below along with its schema.
-
-\subsubsection{Wire Protocol}\label{wire-protocol}
-
-Over the wire messages are packed in the following lightweight container
-format
-
-\begin{verbatim}
-<varint - length of rest of message>
- <varint - header>
- <message>
-\end{verbatim}
-
-The \texttt{header} value is a single varint that has two pieces of
-information: the integer \texttt{type} that declares a 4-bit message
-type (used below), and a channel identifier, \texttt{0} for metadata and
-\texttt{1} for content.
-
-To generate this varint, you bitshift the 4-bit type integer onto the
-end of the channel identifier, e.g.
-\texttt{channel\ \textless{}\textless{}\ 4\ \textbar{}\ \textless{}4-bit-type\textgreater{}}.
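-As an illustration, the container framing and header packing can be
-sketched as follows (a minimal sketch, not the reference implementation;
-\texttt{frame\_message} is a hypothetical helper name):
-
```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        out.append(low | (0x80 if n else 0))
        if n == 0:
            return bytes(out)

def frame_message(channel, msg_type, payload):
    """Pack <varint length><varint header><message> for the wire."""
    assert 0 <= msg_type < 16, "message type must fit in 4 bits"
    header = encode_varint(channel << 4 | msg_type)
    return encode_varint(len(header) + len(payload)) + header + payload
```
-
-For example, a Feed message (type 0) on the metadata channel (0) with
-payload \texttt{p} is framed as \texttt{frame\_message(0,\ 0,\ p)}.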
-
-\subsubsection{Feed}\label{feed}
-
-Type 0. Should be the first message sent on a channel.
-
-\begin{itemize}
-\tightlist
-\item
- \texttt{discoveryKey} - A BLAKE2b keyed hash of the string `hypercore'
- using the public key of the metadata register as the key.
-\item
- \texttt{nonce} - 32 bytes of random binary data, used in our
- encryption scheme
-\end{itemize}
-
-\begin{verbatim}
-message Feed {
- required bytes discoveryKey = 1;
- optional bytes nonce = 2;
-}
-\end{verbatim}
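-The \texttt{discoveryKey} derivation described above can be sketched
-with Python's \texttt{hashlib}; the 32-byte digest size is an assumption
-in this sketch:
-
```python
import hashlib

def discovery_key(public_key: bytes) -> bytes:
    """BLAKE2b keyed hash of the string 'hypercore', keyed with the
    register's public key (digest size assumed to be 32 bytes)."""
    return hashlib.blake2b(b"hypercore", digest_size=32,
                           key=public_key).digest()
```
-
-This lets peers advertise and look up a feed on the network without
-revealing the public key itself.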
-
-\subsubsection{Handshake}\label{handshake}
-
-Type 1. Overall connection handshake. Should be sent just after the feed
-message on the first channel only (metadata).
-
-\begin{itemize}
-\tightlist
-\item
-  \texttt{id} - 32 bytes of random data used as an identifier for this
-  peer on the network, useful for checking if you are connected to
-  yourself or another peer more than once
-\item
- \texttt{live} - Whether or not you want to operate in live
- (continuous) replication mode or end after the initial sync
-\item
- \texttt{userData} - User-specific metadata encoded as a byte sequence
-\item
- \texttt{extensions} - List of extensions that are supported on this
- Feed
-\end{itemize}
-
-\begin{verbatim}
-message Handshake {
- optional bytes id = 1;
- optional bool live = 2;
- optional bytes userData = 3;
- repeated string extensions = 4;
-}
-\end{verbatim}
-
-\subsubsection{Info}\label{info}
-
-Type 2. Message indicating state changes. Used to indicate whether you
-are uploading and/or downloading.
-
-Initial state for uploading/downloading is true. If both ends are not
-downloading and not live it is safe to consider the stream ended.
-
-\begin{verbatim}
-message Info {
- optional bool uploading = 1;
- optional bool downloading = 2;
-}
-\end{verbatim}
-
-\subsubsection{Have}\label{have}
-
-Type 3. How you tell the other peer what chunks of data you have or
-don't have. You should only send Have messages to peers who have
-expressed interest in this region with Want messages.
-
-\begin{itemize}
-\tightlist
-\item
- \texttt{start} - If you only specify \texttt{start}, it means you are
- telling the other side you only have 1 chunk at the position at the
- value in \texttt{start}.
-\item
- \texttt{length} - If you specify length, you can describe a range of
- values that you have all of, starting from \texttt{start}.
-\item
- \texttt{bitfield} - If you would like to send a range of sparse data
- about haves/don't haves via bitfield, relative to \texttt{start}.
-\end{itemize}
-
-\begin{verbatim}
-message Have {
- required uint64 start = 1;
- optional uint64 length = 2 [default = 1];
- optional bytes bitfield = 3;
-}
-\end{verbatim}
-
-When sending bitfields you must run length encode them. The encoded
-bitfield is a series of compressed and uncompressed bit sequences. All
-sequences start with a header that is a varint.
-
-If the lowest bit of the varint is set (i.e.~the value is odd) then the
-header represents a compressed bit sequence.
-
-\begin{verbatim}
-compressed-sequence = varint(
- byte-length-of-sequence
- << 2 | bit << 1 | 1
-)
-\end{verbatim}
-
-If the lowest bit is \emph{not} set then the header represents a
-non-compressed sequence.
-
-\begin{verbatim}
-uncompressed-sequence = varint(
- byte-length-of-bitfield << 1 | 0
-) + (bitfield)
-\end{verbatim}
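-A minimal sketch of these two header encodings and their decoding
-(helper names are hypothetical):
-
```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf-style varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        out.append(low | (0x80 if n else 0))
        if n == 0:
            return bytes(out)

def compressed_sequence(byte_length, bit):
    """Header for a run of `byte_length` bytes that are all `bit`."""
    return encode_varint(byte_length << 2 | bit << 1 | 1)

def uncompressed_sequence(bitfield: bytes):
    """Header plus literal bytes for a sparse stretch of the bitfield."""
    return encode_varint(len(bitfield) << 1) + bitfield

def parse_header(value):
    """Decode a decoded header varint back into its fields."""
    if value & 1:                      # odd: compressed run
        return "compressed", value >> 2, (value >> 1) & 1
    return "uncompressed", value >> 1, None
```
-
-A run of 100 bytes of all-ones thus compresses to the two-byte varint
-for \texttt{100\ \textless{}\textless{}\ 2\ \textbar{}\ 1\ \textless{}\textless{}\ 1\ \textbar{}\ 1}.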
-
-\subsubsection{Unhave}\label{unhave}
-
-Type 4. How you communicate that you deleted or removed a chunk you used
-to have.
-
-\begin{verbatim}
-message Unhave {
- required uint64 start = 1;
- optional uint64 length = 2 [default = 1];
-}
-\end{verbatim}
-
-\subsubsection{Want}\label{want}
-
-Type 5. How you ask the other peer to subscribe you to Have messages for
-a region of chunks. The \texttt{length} value defaults to Infinity or
-feed.length (if not live).
-
-\begin{verbatim}
-message Want {
- required uint64 start = 1;
- optional uint64 length = 2;
-}
-\end{verbatim}
-
-\subsubsection{Unwant}\label{unwant}
-
-Type 6. How you ask to unsubscribe from Have messages for a region of
-chunks from the other peer. You should only Unwant previously Wanted
-regions, but if you do Unwant something that hasn't been Wanted it won't
-have any effect. The \texttt{length} value defaults to Infinity or
-feed.length (if not live).
-
-\begin{verbatim}
-message Unwant {
- required uint64 start = 1;
- optional uint64 length = 2;
-}
-\end{verbatim}
-
-\subsubsection{Request}\label{request}
-
-Type 7. Request a single chunk of data.
-
-\begin{itemize}
-\tightlist
-\item
- \texttt{index} - The chunk index for the chunk you want. You should
- only ask for indexes that you have received the Have messages for.
-\item
- \texttt{bytes} - You can also optimistically specify a byte offset,
- and in the case the remote is able to resolve the chunk for this byte
- offset depending on their Merkle tree state, they will ignore the
- \texttt{index} and send the chunk that resolves for this byte offset
- instead. But if they cannot resolve the byte request, \texttt{index}
- will be used.
-\item
- \texttt{hash} - If you only want the hash of the chunk and not the
- chunk data itself.
-\item
- \texttt{nodes} - A 64 bit long bitfield representing which parent
- nodes you have.
-\end{itemize}
-
-The \texttt{nodes} bitfield is an optional optimization to reduce the
-amount of duplicate nodes exchanged during the replication lifecycle. It
-indicates which parents you have or don't have. You have a maximum of 64
-parents you can specify. Because \texttt{uint64} in Protocol Buffers is
-implemented as a varint, over the wire this does not take up 64 bits in
-most cases. The first bit is reserved to signify whether or not you need
-a signature in response. The rest of the bits represent whether or not
-you have (\texttt{1}) or don't have (\texttt{0}) the information at this
-node already. The ordering is determined by walking parent, sibling up
-the tree all the way to the root.
-
-\begin{verbatim}
-message Request {
- required uint64 index = 1;
- optional uint64 bytes = 2;
- optional bool hash = 3;
- optional uint64 nodes = 4;
-}
-\end{verbatim}
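-The parent/sibling walk above follows the in-order (``bin'') numbering
-of the tree. A sketch of that arithmetic (these helpers are
-illustrative, mirroring what modules such as \texttt{flat-tree}
-provide):
-
```python
def depth(i):
    """Depth of node i: the number of trailing 1 bits (leaves are 0)."""
    d = 0
    while i & 1:
        d += 1
        i >>= 1
    return d

def index(d, offset):
    """Node number of the `offset`-th node at depth `d`."""
    return (offset << (d + 1)) | ((1 << d) - 1)

def parent(i):
    d = depth(i)
    return index(d + 1, (i >> (d + 1)) >> 1)

def sibling(i):
    d = depth(i)
    return index(d, (i >> (d + 1)) ^ 1)
```
-
-Starting from leaf 0, the walk visits parent 1 (sibling 2), then parent
-3 (sibling 5), and so on up to the root.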
-
-\subsubsection{Cancel}\label{cancel}
-
-Type 8. Cancel a previous Request message that you haven't received yet.
-
-\begin{verbatim}
-message Cancel {
- required uint64 index = 1;
- optional uint64 bytes = 2;
- optional bool hash = 3;
-}
-\end{verbatim}
-
-\subsubsection{Data}\label{data}
-
-Type 9. Sends a single chunk of data to the other peer. You can send it
-in response to a Request or unsolicited on its own as a friendly gift.
-The data includes all of the Merkle tree parent nodes needed to verify
-the hash chain all the way up to the Merkle roots for this chunk.
-Because you can produce the direct parents by hashing the chunk, only
-the roots and `uncle' hashes are included (the siblings to all of the
-parent nodes).
-
-\begin{itemize}
-\tightlist
-\item
- \texttt{index} - The chunk position for this chunk.
-\item
- \texttt{value} - The chunk binary data. Empty if you are sending only
- the hash.
-\item
- \texttt{Node.index} - The index for this chunk in in-order notation
-\item
- \texttt{Node.hash} - The hash of this chunk
-\item
-  \texttt{Node.size} - The aggregate chunk size for all children below
-  this node (the sum of all chunk sizes of all children)
-\item
- \texttt{signature} - If you are sending a root node, all root nodes
- must have the signature included.
-\end{itemize}
-
-\begin{verbatim}
-message Data {
- required uint64 index = 1;
- optional bytes value = 2;
- repeated Node nodes = 3;
- optional bytes signature = 4;
-
- message Node {
- required uint64 index = 1;
- required bytes hash = 2;
- required uint64 size = 3;
- }
-}
-\end{verbatim}
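-To make the verification step concrete, here is a simplified sketch of
-checking a chunk against a root using uncle hashes. This is illustrative
-only: real Hypercore hashes also mix in type tags, node sizes and
-indexes, so the exact hash inputs below are assumptions.
-
```python
import hashlib

def h(data: bytes) -> bytes:
    """Illustrative node hash (real Hypercore adds type tags and sizes)."""
    return hashlib.blake2b(data, digest_size=32).digest()

def verify_chunk(chunk, uncles, root):
    """Fold the chunk hash with its uncle hashes up to the root.

    `uncles` is a list of (hash, is_left) pairs ordered from leaf to
    root, where `is_left` says the uncle sits to the left of our path.
    """
    node = h(chunk)
    for uncle, is_left in uncles:
        node = h(uncle + node) if is_left else h(node + uncle)
    return node == root
```
-
-Because the receiver recomputes every hash on the path, a single
-tampered chunk or uncle hash makes the final comparison fail.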
-
-\section{4. Multi-Writer}\label{multi-writer}
-
-The design of Dat up to this point assumes you have a single keyholder
-writing and signing data and appending it to the metadata and content
-feed. However having the ability for multiple keyholders to be able to
-write to a single repository allows for many interesting use cases such
-as forking and collaborative workflows.
-
-In order to do this, we use one \texttt{metadata.data} feed for each
-writer. Each writer gets their own keypair and is responsible for
-storing their private key. To add a new writer to your feed, you
-include their public key in a metadata feed entry.
-
-For example, if Alice wants to give Bob write access to a Dat
-repository, Alice takes Bob's public key and writes it to the
-`local' metadata feed (the feed that Alice owns, e.g.~the original
-feed). Now anyone else who replicates from Alice will find Bob's key in
-the history. If in the future Bob distributes a version of the Dat that
-he added new data to, everyone who has a copy of the Dat from Alice will
-have a copy of Bob's key that they can use to verify that Bob's writes
-are valid.
-
-On disk, each user's feed is stored in a separate hyperdrive. The
-original hyperdrive (owned by Alice) is called the `local' hyperdrive.
-Bob's hyperdrive would be stored separately in the SLEEP folder
-addressed by Bob's public key.
-
-In case Bob and Alice write different values for the same file (e.g.~Bob
-creates a ``fork''), when they sync up with each other replication will
-still work, but for the forked value the Dat client will return an array
-of values for that key instead of just one value. The values are linked
-to the writer that wrote them, so in the case of receiving multiple
-values, clients can choose the value from Alice, or from Bob, or the
-latest value, or apply whatever other strategy they prefer.
-
-If a writer updates the value of a forked key with a new value, they
-are performing a merge.
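-A toy sketch of this multi-value lookup (the structure and names here
-are hypothetical, not hyperdrive's actual API):
-
```python
def values_for_path(feeds, path):
    """Collect each writer's latest value for `path`.

    `feeds` maps writer key -> append-only list of (path, value)
    entries. A forked path yields more than one value, one per writer.
    """
    latest = {}
    for writer, log in feeds.items():
        for entry_path, value in log:
            if entry_path == path:
                latest[writer] = value   # later entries shadow earlier ones
    return latest

feeds = {
    "alice": [("cat_dna.csv", "v1"), ("cat_dna.csv", "v2")],
    "bob":   [("cat_dna.csv", "v3")],
}
# Fork: two writers disagree, so the client sees both values and
# applies whatever resolution strategy it prefers.
forked = values_for_path(feeds, "cat_dna.csv")
```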
-
-\section{5. Existing Work}\label{existing-work}
-
-Dat is inspired by a number of features from existing systems.
-
-\subsection{Git}\label{git}
-
-Git popularized the idea of a directed acyclic graph (DAG) combined with
-a Merkle tree, a way to represent changes to data where each change is
-addressed by the secure hash of the change plus all ancestor hashes in a
-graph. This provides a way to trust data integrity, as the only way a
-specific hash could be derived by another peer is if they have the same
-data and change history required to reproduce that hash. This is
-important for reproducibility as it lets you trust that a specific git
-commit hash refers to a specific source code state.
-
-Decentralized version control tools for source code like Git provide a
-protocol for efficiently downloading changes to a set of files, but are
-optimized for text files and have issues with large files. Solutions
-like Git-LFS solve this by using HTTP to download large files, rather
-than the Git protocol. GitHub offers Git-LFS hosting but charges
-repository owners for bandwidth on popular files. Building a distributed
-distribution layer for files in a Git repository is difficult due to the
-design of Git packfiles, which are delta-compressed repository states
-that do not easily support random access to byte ranges in previous file
-versions.
-
-\subsection{BitTorrent}\label{bittorrent}
-
-BitTorrent implements a swarm based file sharing protocol for static
-datasets. Data is split into fixed sized chunks, hashed, and then that
-hash is used to discover peers that have the same data. An advantage of
-using BitTorrent for dataset transfers is that download bandwidth can be
-fully saturated. Since the file is split into pieces, and peers can
-efficiently discover which pieces each of the peers they are connected
-to have, it means one peer can download non-overlapping regions of the
-dataset from many peers at the same time in parallel, maximizing network
-throughput.
-
-Fixed sized chunking has drawbacks for data that changes. BitTorrent
-assumes all metadata will be transferred up front which makes it
-impractical for streaming or updating content. Most BitTorrent clients
-divide data into 1024 pieces meaning large datasets could have a very
-large chunk size which impacts random access performance (e.g.~for
-streaming video).
-
-Another drawback of BitTorrent is due to the way clients advertise and
-discover other peers in absence of any protocol level privacy or trust.
-From a user privacy standpoint, BitTorrent leaks what users are
-accessing or attempting to access, and does not provide the same
-browsing privacy functions as systems like SSL.
-
-\subsection{Kademlia Distributed Hash
-Table}\label{kademlia-distributed-hash-table}
-
-Kademlia (Maymounkov and Mazieres 2002) is a distributed hash table, a
-distributed key/value store that can serve a similar purpose to DNS
-servers but has no hard coded server addresses. All clients in Kademlia
-are also servers. As long as you know at least one address of another
-peer in the network, you can ask them for the key you are trying to find
-and they will either have it or give you some other people to talk to
-that are more likely to have it.
-
-If you don't have an initial peer to talk to, most clients use a
-bootstrap server that randomly gives you a peer in the network to start
-with. If the bootstrap server goes down, the network still functions as
-long as other methods can be used to bootstrap new peers (such as
-sending them peer addresses through side channels like how .torrent
-files include tracker addresses to try in case Kademlia finds no peers).
-
-Kademlia is distinct from previous DHT designs due to its simplicity. It
-uses a very simple XOR operation between two keys as its ``distance''
-metric to decide which peers are closer to the data being searched for.
-On paper it seems like it wouldn't work as it doesn't take into account
-things like ping speed or bandwidth. Instead its design is very simple
-on purpose to minimize the amount of control/gossip messages and to
-minimize the amount of complexity required to implement it. In practice
-Kademlia has been extremely successful and is widely deployed as the
-``Mainline DHT'' for BitTorrent, with support in all popular BitTorrent
-clients today.
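-The XOR metric itself fits in a few lines (a sketch, with the 160-bit
-node IDs abbreviated to small integers):
-
```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two node/key IDs."""
    return a ^ b

def closest(peers, target, k=2):
    """Return the k peers whose IDs are XOR-closest to `target`."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]
```
-
-Given peers \texttt{0b0001}, \texttt{0b0010} and \texttt{0b1000} and
-target \texttt{0b0011}, the closest peer is \texttt{0b0010} (distance
-1), then \texttt{0b0001} (distance 2); a lookup repeatedly asks the
-closest known peers for ones even closer.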
-
-Due to the simplicity of the original Kademlia design, a number of
-attacks such as DDoS and Sybil attacks have been demonstrated. There
-are protocol extensions (BEPs) which in certain cases mitigate the
-effects of these attacks, such as BEP 44, which includes a DDoS
-mitigation technique. Nonetheless anyone using Kademlia should be aware
-of these limitations.
-
-\subsection{Peer to Peer Streaming Peer Protocol
-(PPSPP)}\label{peer-to-peer-streaming-peer-protocol-ppspp}
-
-PPSPP
-(\href{https://datatracker.ietf.org/doc/rfc7574/?include_text=1}{IETF
-RFC 7574}, (Bakker, Petrocco, and Grishchenko 2015)) is a protocol for
-live streaming content over a peer to peer network. In it they define a
-specific type of Merkle Tree that allows for subsets of the hashes to be
-requested by a peer in order to reduce the time-till-playback for end
-users. BitTorrent for example transfers all hashes up front, which is
-not suitable for live streaming.
-
-Their Merkle trees are ordered using a scheme they call ``bin
-numbering'', which is a method for deterministically arranging an
-append-only log of leaf nodes into an in-order layout tree where
-non-leaf nodes are derived hashes. If you want to verify a specific
-node, you only need to request its sibling's hash and all its uncle
-hashes. PPSPP is very concerned with reducing round trip time and
-time-till-playback by allowing for many kinds of optimizations, such as
-to pack as many hashes into datagrams as possible when exchanging tree
-information with peers.
-
-Although PPSPP was designed with streaming video in mind, the ability to
-request a subset of metadata from a large and/or streaming dataset is
-very desirable for many other types of datasets.
-
-\subsection{WebTorrent}\label{webtorrent}
-
-With WebRTC, browsers can now make peer to peer connections directly to
-other browsers. BitTorrent uses UDP sockets, which aren't available to
-browser JavaScript, so it can't be used as-is on the Web.
-
-WebTorrent implements the BitTorrent protocol in JavaScript using WebRTC
-as the transport. This includes the BitTorrent block exchange protocol
-as well as the tracker protocol implemented in a way that can enable
-hybrid nodes, talking simultaneously to both BitTorrent and WebTorrent
-swarms (if a client is capable of making both UDP sockets as well as
-WebRTC sockets, such as Node.js). Trackers are exposed to web clients
-over HTTP or WebSockets.
-
-\subsection{InterPlanetary File
-System}\label{interplanetary-file-system}
-
-IPFS is a family of application and network protocols that have peer to
-peer file sharing and data permanence baked in. IPFS abstracts network
-protocols and naming systems to provide an alternative application
-delivery platform to today's Web. For example, instead of using HTTP and
-DNS directly, in IPFS you would use LibP2P streams and IPNS in order to
-gain access to the features of the IPFS platform.
-
-\subsection{Certificate Transparency/Secure
-Registers}\label{certificate-transparencysecure-registers}
-
-The UK Government Digital Service have developed the concept of a
-register which they define as a digital public ledger you can trust. In
-the UK government registers are beginning to be piloted as a way to
-expose essential open data sets in a way where consumers can verify the
-data has not been tampered with, and allows the data publishers to
-update their data sets over time.
-
-The design of registers was inspired by the infrastructure backing the
-Certificate Transparency (Laurie, Langley, and Kasper 2013) project,
-initiated at Google, which provides a service on top of SSL certificates
-that enables service providers to write certificates to a distributed
-public ledger. Any client or service provider can verify if a
-certificate they received is in the ledger, which protects against so
-called ``rogue certificates''.
-
-\section{6. Reference Implementation}\label{reference-implementation}
-
-The connection logic is implemented in a module called
-\href{https://www.npmjs.com/package/discovery-swarm}{discovery-swarm}.
-This builds on discovery-channel and adds connection establishment,
-management and statistics. It provides statistics such as how many
-sources are currently connected, how many good and bad behaving sources
-have been talked to, and it automatically handles connecting and
-reconnecting to sources. UTP support is implemented in the module
-\href{https://www.npmjs.com/package/utp-native}{utp-native}.
-
-Our implementation of source discovery is called
-\href{https://npmjs.org/discovery-channel}{discovery-channel}. We also
-run a \href{https://www.npmjs.com/package/dns-discovery}{custom DNS
-server} that Dat clients use (in addition to specifying their own if
-they need to), as well as a
-\href{https://github.com/bittorrent/bootstrap-dht}{DHT bootstrap}
-server. These discovery servers are the only centralized infrastructure
-we need for Dat to work over the Internet, but they are redundant,
-interchangeable, never see the actual data being shared, anyone can run
-their own and Dat will still work even if they all are unavailable. If
-this happens discovery will just be manual (e.g.~manually sharing
-IP/ports).
-
-\section{Acknowledgements}\label{acknowledgements}
-
-This work was made possible through grants from the John S. and James L.
-Knight and Alfred P. Sloan Foundations.
-
-\section*{References}\label{references}
-\addcontentsline{toc}{section}{References}
-
-\hypertarget{refs}{}
-\hypertarget{ref-aumasson2013blake2}{}
-Aumasson, Jean-Philippe, Samuel Neves, Zooko Wilcox-O'Hearn, and
-Christian Winnerlein. 2013. ``BLAKE2: Simpler, Smaller, Fast as MD5.''
-In \emph{International Conference on Applied Cryptography and Network
-Security}, 119--35. Springer.
-
-\hypertarget{ref-bakker2015peer}{}
-Bakker, A, R Petrocco, and V Grishchenko. 2015. ``Peer-to-Peer Streaming
-Peer Protocol (PPSPP).''
-
-\hypertarget{ref-bernstein2012high}{}
-Bernstein, Daniel J, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin
-Yang. 2012. ``High-Speed High-Security Signatures.'' \emph{Journal of
-Cryptographic Engineering}. Springer, 1--13.
-
-\hypertarget{ref-laurie2013certificate}{}
-Laurie, Ben, Adam Langley, and Emilia Kasper. 2013. ``Certificate
-Transparency.''
-
-\hypertarget{ref-maymounkov2002kademlia}{}
-Maymounkov, Petar, and David Mazieres. 2002. ``Kademlia: A Peer-to-Peer
-Information System Based on the XOR Metric.'' In \emph{International
-Workshop on Peer-to-Peer Systems}, 53--65. Springer.
-
-\hypertarget{ref-mykletun2003providing}{}
-Mykletun, Einar, Maithili Narasimha, and Gene Tsudik. 2003. ``Providing
-Authentication and Integrity in Outsourced Databases Using Merkle Hash
-Trees.'' \emph{UCI-SCONCE Technical Report}.
-
-\hypertarget{ref-sleep}{}
-Ogden, Maxwell, and Mathias Buus. 2017. ``SLEEP - the Dat Protocol on
-Disk Format.'' In.
-
-\hypertarget{ref-rossi2010ledbat}{}
-Rossi, Dario, Claudio Testa, Silvio Valenti, and Luca Muscariello. 2010.
-``LEDBAT: The New BitTorrent Congestion Control Protocol.'' In
-\emph{ICCCN}, 1--6.
-
-\end{document}