\end{verbatim}

Each entry in the file is encoded using Protocol Buffers (Varda 2008). The first message written to the file is of a type called \texttt{Header}, which uses this schema:

\begin{verbatim}
message Header {
  required string type = 1;
  optional bytes content = 2;
}
\end{verbatim}

The header declares two pieces of metadata used by Dat: a \texttt{type} string with the value \texttt{hyperdrive}, and a \texttt{content} binary value that holds the public key of the content register that this metadata register represents. When you share a Dat, the metadata key is the main key that gets used, and the content register key is linked from here in the metadata.

After the header the file will contain many filesystem \texttt{Node} entries:

\begin{verbatim}
message Node {
  required string path = 1;
  optional Stat value = 2;
  optional bytes trie = 3;
  repeated Writer writers = 4;
  optional uint64 writersSequence = 5;
}

message Writer {
  required bytes publicKey = 1;
  optional string permission = 2;
}
\end{verbatim}

The \texttt{Node} object has five fields:

\begin{itemize}
\tightlist
\item \texttt{path} - the absolute file path of this file, as a string.
\item \texttt{value} - a \texttt{Stat} encoded object representing the file metadata.
\item \texttt{trie} - a compressed list of the sequence numbers as described earlier.
\item \texttt{writers} - a list of the writers who are allowed to write to this dat.
\item \texttt{writersSequence} - a reference to the last sequence where the writers array was modified. You can use this to quickly find the value of the writers keys.
\end{itemize}

The \texttt{trie} value is encoded by starting with the nested array of sequence numbers, e.g. \texttt{{[}{[}{[}0,\ 3{]}{]},\ {[}{[}0,\ 2{]},\ {[}0,\ 1{]}{]}{]}}. Each entry is a tuple where the first item is the index of the feed in the \texttt{writers} array and the second item is the sequence number. Finally, the trie value is prepended with a version number varint.
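
As a rough illustration of the varints mentioned here, the following Python sketch encodes and decodes unsigned integers in the base-128 varint format used by Protocol Buffers. The function names are hypothetical, and it is an assumption that the trie uses this same varint layout. The sketch does not reproduce the per-array pattern of the trie encoding itself.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (protobuf style).

    Assumption: the trie encoding uses the same layout as Protocol Buffers,
    i.e. 7 payload bits per byte, least significant group first, with the
    high bit set on every byte except the last.
    """
    if n < 0:
        raise ValueError("varints in this sketch are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint from buf starting at pos.

    Returns the decoded value and the position of the next unread byte.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Small values such as feed indexes and low sequence numbers fit in a single byte under this scheme, which is where the space saving comes from.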
To write these subarrays we use variable width integers (varints), using a repeating pattern like this, one for each array:

\begin{verbatim}
\end{verbatim}

This encoding is designed for efficiency, as it reduces the filesystem path and feed index metadata down to a series of small integers.

The \texttt{Stat} objects use this encoding:

\begin{verbatim}
message Stat {
  required uint32 mode = 1;
  optional uint32 uid = 2;
  optional uint32 gid = 3;
  optional uint64 size = 4;
  optional uint64 blocks = 5;
  optional uint64 offset = 6;
  optional uint64 byteOffset = 7;
  optional uint64 mtime = 8;
  optional uint64 ctime = 9;
}
\end{verbatim}

These are the field definitions:

\begin{itemize}
\tightlist
\item \texttt{mode} - POSIX file mode bitmask
\item \texttt{uid} - POSIX user id
\item \texttt{gid} - POSIX group id
\item \texttt{size} - file size in bytes
\item \texttt{blocks} - number of data chunks that make up this file
\item \texttt{offset} - the data feed entry index for the first chunk in this file
\item \texttt{byteOffset} - the data feed file byte offset for the first chunk in this file
\item \texttt{mtime} - POSIX modified\_at time
\item \texttt{ctime} - POSIX created\_at time
\end{itemize}

\hypertarget{refs}{}
\hypertarget{ref-varda2008protocol}{}
Varda, Kenton. 2008. ``Protocol Buffers: Google's Data Interchange Format.'' \emph{Google Open Source Blog, Available at Least as Early as Jul}.

\end{document}