From 6c6b4855d7adc8c03dc8fd8c8966da48878b5c4b Mon Sep 17 00:00:00 2001
From: Max Ogden
Date: Fri, 11 Aug 2017 18:44:16 -0700
Subject: sleep paper breakout

---
 papers/dat-paper.txt | 745 ++++++---------------------------------------------
 1 file changed, 75 insertions(+), 670 deletions(-)

diff --git a/papers/dat-paper.txt b/papers/dat-paper.txt
index ee310f8..f0e71c2 100644
--- a/papers/dat-paper.txt
+++ b/papers/dat-paper.txt
@@ -113,8 +113,9 @@ reference implementation is available from npm as
 \texttt{npm\ install\ dat\ -g}.
 
 The protocol is agnostic to the underlying transport e.g.~you could
-implement Dat over carrier pigeon. The key properties of the Dat design
-are explained in this section.
+implement Dat over carrier pigeon. Data is stored in a format called
+SLEEP (Ogden and Buus 2017), described in its own paper. The key
+properties of the Dat design are explained in this section.
 
 \begin{itemize}
 \tightlist
@@ -130,6 +131,9 @@
 \item
   2.4 \textbf{Incremental Versioning} - Datasets can be efficiently
   synced, even in real time, to other peers.
+\item
+  2.5 \textbf{Random Access} - Huge file hierarchies can be efficiently
+  traversed remotely.
 \end{itemize}
 
 \subsection{2.1 Content Integrity}\label{content-integrity}
@@ -573,7 +577,7 @@ Merkle tree (the content register):
     5 - hash(4 + 6)
         6 - hash(cat-1)
         8 - hash(cat-2)
-    9 - hash()
+    9 - hash(8 + 10)
         10 - hash(cat-3)
 \end{verbatim}
@@ -630,693 +634,95 @@ three \texttt{Data} messages with the data for each block, as well as
 the hashes needed to verify the content in a way similar to the process
 described above for the metadata feed.
 
-\subsection{3. SLEEP Specification}\label{sleep-specification}
-
-This section is a technical description of the SLEEP format intended for
-implementers. SLEEP is the on-disk format that Dat produces and
-uses. It is a set of 9 files that hold all of the metadata needed to
-list the contents of a Dat repository and verify the integrity of the
-data you receive. SLEEP is designed to work with REST, allowing servers
-to be plain HTTP file servers serving the static SLEEP files, meaning
-you can implement a Dat protocol client using HTTP with a static HTTP
-file server as the backend.
-
-SLEEP files contain metadata about the data inside a Dat repository,
-including cryptographic hashes, cryptographic signatures, filenames and
-file permissions. The SLEEP format is specifically engineered to allow
-efficient access to subsets of the metadata and/or data in the
-repository, even on very large repositories, which enables Dat's
-peer-to-peer networking to be fast.
-
-The acronym SLEEP is a slumber-related pun on REST and stands for
-Syncable Lightweight Event Emitting Persistence. The Event Emitting part
-refers to how SLEEP files are append-only in nature, meaning they grow
-over time and new updates can be subscribed to as a realtime feed of
-events through the Dat protocol.
-
-The SLEEP version described here, used in Dat as of 2017, is SLEEP V2.
-SLEEP V1 is documented at http://specs.okfnlabs.org/sleep.
-
-\subsubsection{SLEEP Files}\label{sleep-files}
-
-SLEEP is a set of 9 files that should be stored with the following
-names. In Dat, the files are stored in a folder called \texttt{.dat} in
-the top level of the repository.
-
-\begin{verbatim}
-metadata.key
-metadata.signatures
-metadata.bitfield
-metadata.tree
-metadata.data
-content.key
-content.signatures
-content.bitfield
-content.tree
-\end{verbatim}
-
-The files prefixed with \texttt{content} store metadata about the
-primary data in a Dat repository, for example the raw binary contents of
-the files. The files prefixed with \texttt{metadata} store metadata
-about the files in the repository, for example the filenames, file
-sizes, and file permissions. The \texttt{content} and \texttt{metadata}
-files are both Hypercore registers, making SLEEP a set of two Hypercore
-registers.
-
-\subsubsection{SLEEP File Headers}\label{sleep-file-headers}
+\subsection{2.5 Random Access}\label{random-access}
 
-The following structured binary format is used for \texttt{signatures},
-\texttt{bitfield}, and \texttt{tree} files. The header contains metadata
-as well as information needed to decode the rest of the files after the
-header. SLEEP files are designed to make it easy to append new data and
-to read arbitrary byte offsets in the middle; they are relatively flat,
-simple files that rely on the filesystem for the heavy lifting.
-
-SLEEP files are laid out like this:
-
-\begin{verbatim}
-<32 byte header>
-<fixed-size entry 1>
-<fixed-size entry 2>
-<fixed-size entry ...>
-<fixed-size entry n>
-\end{verbatim}
-
-\begin{itemize}
-\tightlist
-\item
  32 byte header
-\item
  4 bytes - magic byte (value varies depending on which file, used to
  quickly identify which file type it is)
-\item
  1 byte - version number of the file header protocol, current version
  is 0
-\item
  2 byte Uint16BE - entry size, describes how long each entry in the
  file is
-\item
  1 byte - length prefix for body
-\item
  rest of 32 byte header - string describing key algorithm (in Dat
  `ed25519'). The length of this string matches the length in the
  previous length prefix field. This string must fit within the 32 byte
  header limitation (24 bytes reserved for string). Unused bytes should
  be filled with zeroes.
-\end{itemize}
-
-Possible values in the Dat implementation for the body field are:
-
-\begin{verbatim}
-Ed25519
-BLAKE2b
-\end{verbatim}
-
-To calculate the offset of some entry position, first read the header
-and get the entry size, then do
-\texttt{32\ +\ entrySize\ *\ entryIndex}. To calculate how many entries
-are in a file, you can use the entry size and the filesize on disk and
-do \texttt{(fileSize\ -\ 32)\ /\ entrySize}.
-
-As mentioned above, \texttt{signatures}, \texttt{bitfield} and
-\texttt{tree} are the three SLEEP files. There are two additional files,
-\texttt{key}, and \texttt{data}, which do not contain SLEEP file headers
-and store plain serialized data for easy access. \texttt{key} stores the
-public key that is described by the \texttt{signatures} file, and
-\texttt{data} stores the raw chunk data that the \texttt{tree} file
-contains the hashes and metadata for.
-
-\subsubsection{File Descriptions}\label{file-descriptions}
-
-\paragraph{key}\label{key}
-
-The public key used to verify the signatures in the \texttt{signatures}
-file, stored in binary as a single buffer written to disk. To find out
-what format of key is stored in this file, read the header of
-\texttt{signatures}. In Dat, it's always an Ed25519 public key, but
-other implementations can specify other key types using a string value
-in that header.
-\paragraph{tree}\label{tree}
-
-A SLEEP formatted 32 byte header with data entries representing a
-serialized Merkle tree based on the data in the data storage layer. All
-the fixed-size nodes are written in in-order tree notation. The header
-algorithm string for \texttt{tree} files is \texttt{BLAKE2b}. The entry
-size is 40 bytes. Entries are formatted like this:
-
-\begin{verbatim}
-<32 byte header>
-  <4 byte magic string: 0x05025702>
-  <1 byte version number: 0>
-  <2 byte entry size: 40>
-  <1 byte algorithm name length prefix: 7>
-  <7 byte algorithm name: BLAKE2b>
-  <17 zeroes>
-<40 byte entries>
-  <32 byte BLAKE2b hash>
-  <8 byte Uint64BE children leaf byte length>
-\end{verbatim}
-
-The children leaf byte length is the byte size containing the sum byte
-length of all leaf nodes in the tree below this node.
-
-This file uses the in-order notation, meaning even entries are leaf
-nodes and odd entries are parent nodes (non-leaf).
-
-To prevent pre-image attacks, all hashes start with a one byte type
-descriptor:
-
-\begin{verbatim}
-0 - LEAF
-1 - PARENT
-2 - ROOT
-\end{verbatim}
-
-To calculate leaf node entries (the hashes of the data entries) we hash
-this data:
-
-\begin{verbatim}
-BLAKE2b(
-  <1 byte type>
-    0
-  <8 bytes Uint64BE>
-    length of entry data
-  <entry data>
-)
-\end{verbatim}
-
-Then we take this 32 byte hash and write it to the tree as 40 bytes like
-this:
-
-\begin{verbatim}
-<32 bytes>
-  BLAKE2b hash
-<8 bytes Uint64BE>
-  length of data
-\end{verbatim}
-
-Note that the Uint64 of length of data is included both in the hashed
-data and written at the end of the entry. This is to expose more
-metadata to Dat for advanced use cases such as verifying data length in
-sparse replication scenarios.
-
-To calculate parent node entries (the hashes of the two child nodes) we
-hash this data:
-
-\begin{verbatim}
-BLAKE2b(
-  <1 byte>
-    1
-  <8 bytes Uint64BE>
-    left child length + right child length
-  <32 bytes>
-    left child hash
-  <32 bytes>
-    right child hash
-)
-\end{verbatim}
-
-Then we take this 32 byte hash and write it to the tree as 40 bytes like
-this:
-
-\begin{verbatim}
-<32 bytes>
-  BLAKE2b hash
-<8 bytes Uint64BE>
-  left child length + right child length
-\end{verbatim}
-
-The reason the tree entries contain data lengths is to allow for sparse
-mode replication. Encoding lengths (and including lengths in all hashes)
-means you can verify the Merkle subtrees independent of the rest of the
-tree, which happens during sparse replication scenarios.
-
-The tree file corresponds directly to the \texttt{data} file.
-
-\paragraph{data}\label{data}
-
-The \texttt{data} file is only included in the SLEEP format for the
-\texttt{metadata.*} prefixed files, which contain filesystem metadata
-and not actual file data. For the \texttt{content.*} files, the data is
-stored externally (in Dat it is stored as normal files on the filesystem
-and not in a SLEEP file). However, you can configure Dat to use a
-\texttt{content.data} file if you want and it will still work. If you
-want to store the full history of all versions of all files, using the
-\texttt{content.data} file would provide that guarantee, but would have
-the disadvantage of storing files as chunks merged into one huge file
-(not as user-friendly).
-
-The \texttt{data} file does not contain a SLEEP file header. It just
-contains a bunch of concatenated data entries. Entries are written in
-the same order as they appear in the \texttt{tree} file. To read a
-\texttt{data} file, first decode the \texttt{tree} file and for every
-leaf in the \texttt{tree} file you can calculate a data offset for the
-data described by that leaf node in the \texttt{data} file.
-
-\subparagraph{Index Lookup}\label{index-lookup}
+\subsection{2.5 Random Access}\label{random-access}
 
-For example, if we wanted to seek to a specific entry offset (say entry
-42):
-
+Dat pursues the following access capabilities:
 
 \begin{itemize}
 \tightlist
 \item
-  First, read the header of the \texttt{tree} file and get the entry
-  size; offsets are then computed as
-  \texttt{32\ +\ entrySize\ *\ treeIndex}, which for tree index 42 is
-  \texttt{32\ +\ (40\ *\ 42)}.
+  Support large file hierarchies (millions of files in a single
+  repository).
 \item
-  Since we want the leaf entry for chunk 42 (even nodes in the in-order
-  layout are leaves), we multiply the chunk index by 2:
-  \texttt{32\ +\ (40\ *\ (42\ *\ 2))}.
+  Support efficient traversal of the hierarchy (listing files in
+  arbitrary folders efficiently).
 \item
-  Read the 40 bytes at that offset in the \texttt{tree} file to get the
-  leaf node entry.
+  Store all changes to all files (metadata and/or content).
 \item
-  Read the last 8 bytes of the entry to get the length of the data
-  entry.
+  List all changes made to any single file.
 \item
-  To calculate the offset of where in the \texttt{data} file your entry
-  begins, you need to sum the lengths of all the earlier entries in
-  the tree. The most efficient way to do this is to sum all the previous
-  parent node (non-leaf) entry lengths. You can also sum all leaf node
-  lengths, but parent nodes contain the sum of their children's lengths
-  so it's more efficient to use parents. During Dat replication, these
-  nodes are fetched as part of the Merkle tree verification so you will
-  already have them locally. This is a log(N) operation where N is the
-  entry index. Entries are also small and therefore easily cacheable.
+  View the state of all files relative to any point in time.
 \item
-  Once you get the offset, you use the length you decoded above and read
-  N bytes (where N is the decoded length) at the offset in the
-  \texttt{data} file. You can verify the data integrity using the 32
-  byte hash from the \texttt{tree} entry.
-\end{itemize}
-
-\subparagraph{Byte Lookup}\label{byte-lookup}
-
-The above method illustrates how to resolve a chunk position index to a
-byte offset. You can also do the reverse operation, resolving a byte
-offset to a chunk position index. This is used to stream arbitrary
-random access regions of files in sparse replication scenarios. A
-sketch of this walk follows the list below.
-
-\begin{itemize}
-\tightlist
+  Subscribe live to all changes (any file).
 \item
-  First, calculate the current Merkle roots.
+  Subscribe live to changes to files under a specific path.
 \item
-  Each node in the tree (including the roots) stores the aggregate
-  byte length of all the nodes below it, so the roots cumulatively
-  describe all possible byte ranges for this repository.
+  Efficiently access any byte range of any version of any file.
 \item
-  Find the root that contains the byte range of the offset you are
-  looking for, fetch the node information for that node's children
-  using the Index Lookup method, and recursively repeat this step until
-  you reach the deepest child node that describes this byte range.
+  Allow all of the above to happen remotely, only syncing the minimum
+  metadata necessary to perform any action.
 \item
-  The chunk described by this child node will contain the byte range you
-  are looking for. You can use the \texttt{byteOffset} property in the
-  \texttt{Stat} metadata object to seek into the right position in the
-  content for the start of this chunk.
+  Allow efficient comparison of remote and local repository state to
+  request missing pieces during synchronization.
 \item
+  Allow the entire remote archive to be synchronized, or just some
+  subset of files and/or versions.
 \end{itemize}
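-
-Here is the promised sketch of the byte lookup walk, again in
-non-normative TypeScript; \texttt{roots} and \texttt{getNode} are
-hypothetical accessors for tree entries (held locally or fetched from a
-peer), not SLEEP APIs:
-
-\begin{verbatim}
-type GetNode = (treeIndex: number) => Promise<{ length: number }>;
-
-// Depth of an in-order tree index: the number of trailing 1 bits.
-function depth(i: number): number {
-  let d = 0;
-  while (i & 1) { i >>= 1; d++; }
-  return d;
-}
-
-// Resolve a byte offset to a chunk position index.
-async function chunkAt(byteOffset: number, roots: number[],
-                       getNode: GetNode): Promise<number> {
-  let node = -1;
-  for (const root of roots) {       // roots cover all byte ranges
-    const len = (await getNode(root)).length;
-    if (byteOffset < len) { node = root; break; }
-    byteOffset -= len;
-  }
-  if (node < 0) throw new Error("offset past end of feed");
-  while (depth(node) > 0) {         // descend toward the chunk
-    const span = 1 << (depth(node) - 1);
-    const leftLen = (await getNode(node - span)).length;
-    if (byteOffset < leftLen) node = node - span;
-    else { byteOffset -= leftLen; node = node + span; }
-  }
-  return node / 2;                  // even tree index -> chunk index
-}
-\end{verbatim}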
-\subparagraph{Metadata Overhead}\label{metadata-overhead}
+The way Dat accomplishes these goals is by storing all changes in
+Hypercore feeds, combined with strategic metadata indexing that allows
+certain queries to be answered efficiently by traversing the feeds. The
+protocol itself is specified in Section 3 (SLEEP), but a scenario-based
+summary follows here.
 
-Using this scheme, if you write 4GB of data using on average 64KB data
-chunks (note: chunks can be variable length and do not need to be the
-same size), your tree file will be around 5MB (roughly 0.125\%
-overhead).
+\subsubsection{Scenario: Reading a file from a specific byte
+offset}\label{scenario-reading-a-file-from-a-specific-byte-offset}
 
-\paragraph{signatures}\label{signatures}
+Alice has a dataset in Dat. Bob wants to access a 100MB CSV called
+\texttt{cat\_dna.csv} stored in the remote repository, but only the
+10MB range of the CSV spanning from 30MB to 40MB.
 
-A SLEEP formatted 32 byte header with data entries being 64 byte
-signatures.
+Bob has never communicated with Alice before, and is starting fresh with
+no knowledge of this Dat repository other than that he wants
+\texttt{cat\_dna.csv} at a specific offset.
 
-\begin{verbatim}
-<32 byte header>
-  <4 byte magic string: 0x05025701>
-  <1 byte version number: 0>
-  <2 byte entry size: 64>
-  <1 byte algorithm name length prefix: 7>
-  <7 byte algorithm name: Ed25519>
-  <17 zeroes>
-<64 byte entries>
-  <64 byte Ed25519 signature>
-\end{verbatim}
+First, Bob asks Alice through the Dat protocol for the metadata he needs
+to resolve \texttt{cat\_dna.csv} to the correct metadata feed entry that
+represents the file he wants. Note: in this scenario we assume Bob wants
+the latest version of \texttt{cat\_dna.csv}. It is also possible to do
+this for a specific older version.
 
-Every time the tree is updated we sign the current roots of the Merkle
-tree, and append them to the signatures file. The signatures file starts
-with no entries. Each time a new leaf is appended to the \texttt{tree}
-file (i.e., whenever data is added to a Dat), we take all root hashes at
-the current state of the Merkle tree, hash and sign them, and append
-them as a new entry to the signatures file.
+Bob first sends a \texttt{Request} message for the latest entry in the
+metadata feed. Alice responds. Bob looks at the \texttt{trie} value, and
+using the lookup algorithm described below sends another
+\texttt{Request} message for the metadata node that is closer to the
+filename he is looking for. This repeats until Alice sends Bob the
+matching metadata entry. This is the un-optimized resolution, which uses
+\texttt{log(n)} round trips, though there are ways to optimize it by
+having Alice send additional sequence numbers to Bob that help him
+traverse in fewer round trips.
 
-\begin{verbatim}
-Ed25519 sign(
-  BLAKE2b(
-    <1 byte> 2 // root type
-    for (every root node left-to-right) {
-      <32 byte root hash>
-      <8 byte Uint64BE root tree index>
-      <8 byte Uint64BE child byte lengths>
-    }
-  )
-)
-\end{verbatim}
+In the metadata record Bob received for \texttt{cat\_dna.csv} there is
+the byte offset to the beginning of the file in the data feed. Bob adds
+his 30MB offset to this value and starts requesting pieces of data
+starting at that byte offset using the SLEEP protocol as described
+below.
 
-The reason we hash all the root nodes is that the BLAKE2b hash above is
-only calculable if you have all of the pieces of data required to
-generate all the intermediate hashes. This is the crux of Dat's data
-integrity guarantees.
+In the metadata record Bob recieved for \texttt{cat\_dna.csv} there is +the byte offset to the beginning of the file in the data feed. Bob adds +his +30MB offset to this value and starts requesting pieces of data +starting at that byte offset using the SLEEP protocol as described +below. -\paragraph{bitfield}\label{bitfield} +This method tries to allow any byte range of any file to be accessed +without the need to synchronize the full metadata for all files up +front. -A SLEEP formatted 32 byte header followed by a series of 3328 byte long -entries. +\subsubsection{Scenario: Syncing live changes to files at a specific +path}\label{scenario-syncing-live-changes-to-files-at-a-specific-path} -\begin{verbatim} -<32 byte header> - <4 byte magic string: 0x05025700> - <1 byte version number: 0> - <2 byte entry size: 3328> - <1 byte algorithm name length: 0> - <1 byte algorithm name: 0> - <24 zeroes> -<3328 byte entries> // (2048 + 1024 + 256) -\end{verbatim} +TODO -The bitfield describes which pieces of data you have, and which nodes in -the \texttt{tree} file have been written. This file exists as an index -of the \texttt{tree} and \texttt{data} to quickly figure out which -pieces of data you have or are missing. This file can be regenerated if -you delete it, so it is considered a materialized index. +\subsubsection{Scenario: Syncing an entire +archive}\label{scenario-syncing-an-entire-archive} -The \texttt{bitfield} file actually contains three bitfields of -different sizes. A bitfield (AKA bitmap) is defined as a set of bits -where each bit (0 or 1) represents if you have or do not have a piece of -data at that bit index. So if there is a dataset of 10 cat pictures, and -you have pictures 1, 3, and 5 but are missing the rest, your bitfield -would look like \texttt{1010100000}. +TODO -Each entry contains three objects: - -\begin{itemize} -\tightlist -\item - Data Bitfield (1024 bytes) - 1 bit for for each data entry that you - have synced (1 for every entry in \texttt{data}). -\item - Tree Bitfield (2048 bytes) - 1 bit for every tree entry (all nodes in - \texttt{tree}) -\item - Bitfield Index (256 bytes) - This is an index of the Data Bitfield - that makes it efficient to figure out which pieces of data are missing - from the Data Bitfield without having to do a linear scan. -\end{itemize} - -The Data Bitfield is 1Kb somewhat arbitrarily, but the idea is that -because most filesystems work in 4Kb chunk sizes, we can fit the Data, -Tree and Index in less then 4Kb of data for efficient writes to the -filesystem. The Tree and Index sizes are based on the Data size (the -Tree has twice the entries as the Data, odd and even nodes vs just even -nodes in \texttt{tree}, and Index is always 1/4th the size). - -To generate the Index, you take pairs of 2 bytes at a time from the Data -Bitfield, check if all bits in the 2 bytes are the same, and generate 4 -bits of Index metadata~for every 2 bytes of Data (hence how 1024 bytes -of Data ends up as 256 bytes of Index). - -First you generate a 2 bit tuple for the 2 bytes of Data: - -\begin{verbatim} -if (data is all 1's) then [1,1] -if (data is all 0's) then [0,0] -if (data is not all the same) then [1, 0] -\end{verbatim} - -The Index itself is an in-order binary tree, not a traditional bitfield. 
-The Index itself is an in-order binary tree, not a traditional bitfield.
-To generate the tree, you take the tuples generated above and write
-them into a tree like the following example, where each odd (non-leaf)
-parent tuple is derived, using the same scheme, from its even child
-tuples:
-
-\begin{verbatim}
-// for e.g. 16 bytes (8 tuples) of
-// sparsely replicated data
-0 - [00 00 00 00]
-1 - [10 10 10 10]
-2 - [11 11 11 11]
-\end{verbatim}
-
-The tuples at entry \texttt{1} above are \texttt{{[}1,0{]}} because the
-relative child tuples are not uniform. In the following example, all
-non-leaf nodes are \texttt{{[}1,1{]}} because their relative children
-are all uniform (\texttt{{[}1,1{]}}).
-
-\begin{verbatim}
-// for e.g. 32 bytes (16 tuples) of
-// fully replicated data (all 1's)
-0 - [11 11 11 11]
-1 - [11 11 11 11]
-2 - [11 11 11 11]
-3 - [11 11 11 11]
-4 - [11 11 11 11]
-5 - [11 11 11 11]
-6 - [11 11 11 11]
-\end{verbatim}
-
-Using this scheme, to represent 32 bytes of data it takes at most 8
-bytes of Index. In this example it compresses nicely, as it is all
-contiguous ones on disk; similarly, an empty bitfield would be all
-zeroes.
-
-If you write 4GB of data using on average 64KB data chunk size, your
-bitfield will be at most 32KB.
-
-\paragraph{metadata.data}\label{metadata.data}
-
-This file is used to store content described by the rest of the
-\texttt{metadata.*} hypercore SLEEP files. Whereas the
-\texttt{content.*} SLEEP files describe the data stored in the actual
-data cloned in the Dat repository filesystem, the \texttt{metadata} data
-feed is stored inside the \texttt{.dat} folder along with the rest of
-the SLEEP files.
-
-The contents of this file are a series of versions of the Dat filesystem
-tree. As this is a hypercore data feed, it is just an append-only log of
-binary data entries. The challenge is representing a tree in a
-one-dimensional way so that it can be represented as a Hypercore
-register. For example, imagine three files:
-
-\begin{verbatim}
-~/dataset $ ls
-figures
-  graph1.png
-  graph2.png
-results.csv
-
-1 directory, 3 files
-\end{verbatim}
-
-We want to take this structure and map it to a serialized representation
-that gets written into an append-only log in a way that still allows for
-efficient random access by file path.
-
-To do this, we convert the filesystem metadata into entries in a feed
-like this:
-
-\begin{verbatim}
-{
-  "path": "/results.csv",
-  "children": [[]],
-  "sequence": 0
-}
-{
-  "path": "/figures/graph1.png",
-  "children": [[0], []],
-  "sequence": 1
-}
-{
-  "path": "/figures/graph2.png",
-  "children": [[0], [1]],
-  "sequence": 2
-}
-\end{verbatim}
-
-\subparagraph{Filename Resolution}\label{filename-resolution}
-
-Each sequence represents adding one of the files to the register, so at
-sequence 0 the filesystem state only has a single file,
-\texttt{results.csv}, in it. At sequence 1, only two files have been
-added to the register, and at sequence 2 all three files are present.
-The \texttt{children} field is a shorthand way of declaring which other
-files at every level of the directory hierarchy exist alongside the
-file being added at that revision. For example, at the time of sequence
-1, children is \texttt{{[}{[}0{]},\ {[}{]}{]}}. The first sub-array,
-\texttt{{[}0{]}}, represents the first folder in the \texttt{path},
-which is the root folder \texttt{/}. In this case \texttt{{[}0{]}}
-means the root folder at this point in time only has a single file, the
-file that is the subject of sequence \texttt{0}.
-The second subarray is empty (\texttt{{[}{]}}) because there are no
-other existing files in the second folder in the \texttt{path},
-\texttt{figures}.
-
-To look up a file by filename, you fetch the latest entry in the log,
-then use the \texttt{children} metadata in that entry to look up the
-longest common ancestor based on the parent folders of the filename you
-are querying. You can then recursively repeat this operation until you
-find the \texttt{path} entry you are looking for (or you exhaust all
-options, which means the file does not exist). This is an
-\texttt{O(number\ of\ slashes\ in\ your\ path)} operation.
-
-For example, if you wanted to look up \texttt{/results.csv} given the
-above register, you would start by grabbing the metadata at sequence 2.
-The longest common ancestor between \texttt{/results.csv} and
-\texttt{/figures/graph2.png} is \texttt{/}. You then grab the
-corresponding entry in the children array for \texttt{/}, which in this
-case is the first entry, \texttt{{[}0{]}}. You then repeat this with
-all of the children entries until you find a child that is closer to
-the entry you are looking for. In this example, the first entry happens
-to be the match we are looking for.
-
-You can also perform lookups relative to a point in time by starting
-from a specific sequence number in the register, for example to get the
-state of some file as of an old sequence number, similar to checking
-out an old version of a repository in Git.
-\subparagraph{Data Serialization}\label{data-serialization}
-
-The format of the \texttt{metadata.data} file is as follows:
-
-\begin{verbatim}
-<Header>
-<Node 1>
-<Node 2>
-<Node ...>
-<Node N>
-\end{verbatim}
-
-Each entry in the file is encoded using Protocol Buffers (Varda 2008).
-
-The first message we write to the file is of a type called Header which
-uses this schema:
-
-\begin{verbatim}
-message Header {
-  required string type = 1;
-  optional bytes content = 2;
-}
-\end{verbatim}
-
-This is used to declare two pieces of metadata used by Dat. It includes
-a \texttt{type} string with the value \texttt{hyperdrive} and a
-\texttt{content} binary value that holds the public key of the content
-register that this metadata register represents. When you share a Dat,
-the metadata key is the main key that gets used, and the content
-register key is linked from here in the metadata.
-
-After the header the file will contain many filesystem \texttt{Node}
-entries:
-
-\begin{verbatim}
-message Node {
-  required string path = 1;
-  optional Stat value = 2;
-  optional bytes children = 3;
-  repeated Writer writers = 4;
-  optional uint64 writersSequence = 5;
-}
-
-message Writer {
-  required bytes publicKey = 1;
-  optional string permission = 2;
-}
-\end{verbatim}
-
-The \texttt{Node} object has five fields:
-
-\begin{itemize}
-\tightlist
-\item
-  \texttt{path} - the string of the absolute file path of this file.
-\item
-  \texttt{value} - a \texttt{Stat} encoded object representing the file
-  metadata.
-\item
-  \texttt{children} - a compressed list of the sequence numbers as
-  described earlier.
-\item
-  \texttt{writers} - a list of the writers who are allowed to write to
-  this Dat.
-\item
-  \texttt{writersSequence} - a reference to the last sequence where the
-  writers array was modified. You can use this to quickly find the
-  value of the writers keys.
-\end{itemize}
-
-The \texttt{children} value is encoded by starting with the nested array
-of sequence numbers, e.g.
-\texttt{{[}{[}{[}0,\ 3{]}{]},\ {[}{[}0,\ 2{]},\ {[}0,\ 1{]}{]}{]}}. Each
-entry is a tuple where the first item is the index of the feed in the
-\texttt{writers} array and the second value is the sequence number.
-Finally you prepend the children value with a version number varint.
-
-To write these subarrays we use variable width integers (varints), using
-a repeating pattern like this, one for each array:
-
-\begin{verbatim}
-<varint of number of tuples in this array>
-<varint of feed index>
-<varint of sequence number>
-<varint of feed index>
-<varint of sequence number>
-...
-\end{verbatim}
-
-This encoding is designed for efficiency as it reduces the filesystem
-path + feed index metadata down to a series of small integers.
-
-The \texttt{Stat} objects use this encoding:
-
-\begin{verbatim}
-message Stat {
-  required uint32 mode = 1;
-  optional uint32 uid = 2;
-  optional uint32 gid = 3;
-  optional uint64 size = 4;
-  optional uint64 blocks = 5;
-  optional uint64 offset = 6;
-  optional uint64 byteOffset = 7;
-  optional uint64 mtime = 8;
-  optional uint64 ctime = 9;
-}
-\end{verbatim}
-
-These are the field definitions:
-
-\begin{itemize}
-\tightlist
-\item
-  \texttt{mode} - POSIX file mode bitmask
-\item
-  \texttt{uid} - POSIX user id
-\item
-  \texttt{gid} - POSIX group id
-\item
-  \texttt{size} - file size in bytes
-\item
-  \texttt{blocks} - number of data chunks that make up this file
-\item
-  \texttt{offset} - the data feed entry index for the first chunk in
-  this file
-\item
-  \texttt{byteOffset} - the data feed file byte offset for the first
-  chunk in this file
-\item
-  \texttt{mtime} - POSIX modified\_at time
-\item
-  \texttt{ctime} - POSIX created\_at time
-\end{itemize}
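-
-Tying the \texttt{Stat} fields back to the byte offset scenario above,
-a minimal non-normative sketch:
-
-\begin{verbatim}
-type Stat = { size: number; offset: number; byteOffset: number };
-
-// Absolute position in the content data feed for a byte within a file,
-// e.g. Bob adding 30MB to the start of cat_dna.csv.
-function contentPosition(stat: Stat, positionInFile: number): number {
-  if (positionInFile >= stat.size) throw new Error("past end of file");
-  return stat.byteOffset + positionInFile;
-}
-\end{verbatim}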
-
-\subsection{4. Dat Network Protocol}\label{dat-network-protocol}
+\subsection{3. Dat Network Protocol}\label{dat-network-protocol}
 
 The SLEEP format is designed to allow for sparse replication, meaning
 you can efficiently download only the metadata and data required to
@@ -1565,7 +971,7 @@ message Cancel {
 }
 \end{verbatim}
 
-\subsubsection{Data}\label{data-1}
+\subsubsection{Data}\label{data}
 
 Type 9. Sends a single chunk of data to the other peer. You can send it
 in response to a Request or unsolicited on its own as a friendly gift.
@@ -1609,7 +1015,7 @@ message Data {
 }
 \end{verbatim}
 
-\section{5. Multi-Writer}\label{multi-writer}
+\section{4. Multi-Writer}\label{multi-writer}
 
 The design of Dat up to this point assumes you have a single keyholder
 writing and signing data and appending it to the metadata and content
@@ -1647,7 +1053,7 @@ the latest value, or whatever other strategy they prefer. If a writer
 updates the value of a forked key with a new value they are performing
 a merge.
 
-\section{6. Existing Work}\label{existing-work}
+\section{5. Existing Work}\label{existing-work}
 
 Dat is inspired by a number of features from existing systems.
@@ -1802,7 +1208,7 @@ public ledger. Any client or service provider can verify if a
 certificate they received is in the ledger, which protects against so
 called ``rogue certificates''.
 
-\section{7. Reference Implementation}\label{reference-implementation}
+\section{6. Reference Implementation}\label{reference-implementation}
 
 The connection logic is implemented in a module called
 \href{https://www.npmjs.com/package/discovery-swarm}{discovery-swarm}.
@@ -1864,14 +1270,13 @@ Mykletun, Einar, Maithili Narasimha, and Gene Tsudik. 2003. ``Providing
 Authentication and Integrity in Outsourced Databases Using Merkle Hash
 Trees.'' \emph{UCI-SCONCE Technical Report}.
 
+\hypertarget{ref-sleep}{}
+Ogden, Maxwell, and Mathias Buus. 2017. ``SLEEP - the Dat Protocol on
+Disk Format.''
+
 \hypertarget{ref-rossi2010ledbat}{}
 Rossi, Dario, Claudio Testa, Silvio Valenti, and Luca Muscariello. 2010.
 ``LEDBAT: The New Bittorrent Congestion Control Protocol.'' In
 \emph{ICCCN}, 1--6.
 
-\hypertarget{ref-varda2008protocol}{}
-Varda, Kenton. 2008. ``Protocol Buffers: Google's Data Interchange
-Format.'' \emph{Google Open Source Blog, Available at Least as Early as
-Jul}.
-
 \end{document}