\end{verbatim}

Each entry in the file is encoded using Protocol Buffers (Varda 2008).

The first message we write to the file is of a type called
\texttt{Header}, which uses this schema:

\begin{verbatim}
message Header {
  required string type = 1;
  optional bytes content = 2;
}
\end{verbatim}

This is used to declare two pieces of metadata used by Dat. It includes
a \texttt{type} string with the value \texttt{hyperdrive} and a
\texttt{content} binary value that holds the public key of the content
register that this metadata register represents. When you share a Dat,
the metadata key is the main key that gets used, and the content
register key is linked from here in the metadata.

After the header, the file will contain many filesystem \texttt{Node}
entries:

\begin{verbatim}
message Node {
  required string path = 1;
  optional Stat value = 2;
  optional bytes trie = 3;
  repeated Writer writers = 4;
  optional uint64 writersSequence = 5;
}

message Writer {
  required bytes publicKey = 1;
  optional string permission = 2;
}
\end{verbatim}

The \texttt{Node} object has five fields:

\begin{itemize}
\tightlist
\item
  \texttt{path} - the absolute file path of this file, as a string
\item
  \texttt{value} - a \texttt{Stat} encoded object representing the file
  metadata
\item
  \texttt{trie} - a compressed list of the sequence numbers as described
  earlier
\item
  \texttt{writers} - a list of the writers who are allowed to write to
  this dat
\item
  \texttt{writersSequence} - a reference to the last sequence where the
  \texttt{writers} array was modified. You can use this to quickly find
  the value of the writers keys.
\end{itemize}

The \texttt{trie} value is encoded by starting with the nested array of
sequence numbers, e.g.
\texttt{{[}{[}{[}0,\ 3{]}{]},\ {[}{[}0,\ 2{]},\ {[}0,\ 1{]}{]}{]}}.
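
To make the shape of that example array concrete, here is a minimal Python sketch that walks it. It assumes each innermost pair is read as (index into the \texttt{writers} array, sequence number). The names \texttt{NestedTrie}, \texttt{flatten\_trie} and \texttt{group\_index} are hypothetical and are not part of Dat. The sketch only models the in-memory nested array, not the binary encoding.

```python
from typing import Iterator, List, Tuple

# Hypothetical alias: an outer list of groups, each holding [writer_index, seq] pairs.
NestedTrie = List[List[List[int]]]


def flatten_trie(trie: NestedTrie) -> Iterator[Tuple[int, int, int]]:
    """Yield (group_index, writer_index, seq) for every pair in the nested array."""
    for group_index, group in enumerate(trie):
        for writer_index, seq in group:
            yield group_index, writer_index, seq


# The example array from the text above.
example = [[[0, 3]], [[0, 2], [0, 1]]]
rows = list(flatten_trie(example))
for row in rows:
    print(row)
```

In this reading, every pair in the example points at writer 0, with sequence number 3 in the first group and sequence numbers 2 and 1 in the second.
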
Each entry is a tuple where the first item is the index of the feed in the
\texttt{writers} array and the second value is the sequence number.
Finally, you prepend the trie value with a version number varint.

To write these subarrays we use variable width integers (varints), using
a repeating pattern like this, one for each array:

\begin{verbatim}
\end{verbatim}

This encoding is designed for efficiency as it reduces the filesystem
path + feed index metadata down to a series of small integers.

The \texttt{Stat} objects use this encoding:

\begin{verbatim}
message Stat {
  required uint32 mode = 1;
  optional uint32 uid = 2;
  optional uint32 gid = 3;
  optional uint64 size = 4;
  optional uint64 blocks = 5;
  optional uint64 offset = 6;
  optional uint64 byteOffset = 7;
  optional uint64 mtime = 8;
  optional uint64 ctime = 9;
}
\end{verbatim}

These are the field definitions:

\begin{itemize}
\tightlist
\item
  \texttt{mode} - POSIX file mode bitmask
\item
  \texttt{uid} - POSIX user id
\item
  \texttt{gid} - POSIX group id
\item
  \texttt{size} - file size in bytes
\item
  \texttt{blocks} - number of data chunks that make up this file
\item
  \texttt{offset} - the data feed entry index for the first chunk in
  this file
\item
  \texttt{byteOffset} - the data feed file byte offset for the first
  chunk in this file
\item
  \texttt{mtime} - POSIX modified\_at time
\item
  \texttt{ctime} - POSIX created\_at time
\end{itemize}

\subsection*{References}\label{references}
\addcontentsline{toc}{subsection}{References}

\hypertarget{refs}{}
\hypertarget{ref-varda2008protocol}{}
Varda, Kenton. 2008. ``Protocol Buffers: Google's Data Interchange
Format.'' \emph{Google Open Source Blog, Available at Least as Early as
Jul}.

\end{document}