Diffstat (limited to 'papers')
-rw-r--r-- | papers/PublicBitsKnightFoundationRound2.pdf | 523
-rw-r--r-- | papers/dat-paper.md | 135
-rw-r--r-- | papers/dat-paper.pdf | bin | 0 -> 197850 bytes
3 files changed, 658 insertions, 0 deletions
diff --git a/papers/PublicBitsKnightFoundationRound2.pdf b/papers/PublicBitsKnightFoundationRound2.pdf
new file mode 100644
index 0000000..655c5c8
--- /dev/null
+++ b/papers/PublicBitsKnightFoundationRound2.pdf
@@ -0,0 +1,523 @@
[523 added lines omitted: the committed file is a saved GitHub HTML page for the blob PublicBitsKnightFoundationRound2.pdf in datland/splash at master (last commit 4412c15, "add grant proposal", by karissa, Mar 16, 2016; file size shown as 240 KB), not PDF data.]
diff --git a/papers/dat-paper.md b/papers/dat-paper.md
new file mode 100644
index 0000000..b4598dd
--- /dev/null
+++ b/papers/dat-paper.md
@@ -0,0 +1,135 @@
+# Abstract
+
+Dat is a swarm based version control system designed for sharing large datasets over networks such that their contents can be accessed randomly, be updated incrementally, and have the integrity of their contents be trusted. Every Dat user is simultaneously a server and a client exchanging pieces of data with other peers in a swarm on demand. As data is added to a Dat repository, updated files are split into pieces based on Rabin fingerprinting and deduplicated against known pieces to avoid retransmission of data. File contents are automatically verified using secure hashes, meaning you do not need to trust other nodes.
+
+# 1. Introduction
+
+There are countless ways to share datasets over the Internet today. The simplest and most widely used approach, sharing files over HTTP, is subject to dead links when files are moved or deleted, as HTTP has no concept of history or versioning built in. Emailing datasets as attachments is also widely used, and has the concept of history built in, but many email providers limit the maximum attachment size, which makes it impractical for many datasets.
+ +Cloud storage services like S3 ensure availability of data, but because they use a centralized hub-and-spoke networking model they tend to be limited by their bandwidth, meaning popular files can become very expensive to share. Services like Dropbox and Google Drive provide version control and synchronization on top of cloud storage services, which fixes many issues with broken links, but they rely on proprietary code and infrastructure, requiring users to store their data on cloud infrastructure, which has implications for cost, transfer speeds, and user privacy. + +Distributed file sharing tools like BitTorrent become faster as files become more popular, removing the bandwidth bottleneck and making file distribution effectively free. They also implement discovery systems which prevent broken links, meaning that if the original source goes offline other backup sources can be automatically discovered. However, P2P file sharing tools today are not supported by Web browsers and do not provide a mechanism for updating files without redistributing a new dataset, which could mean redownloading data you already have. + +Decentralized version control tools for source code like Git provide a protocol for efficiently downloading changes to a set of files, but are optimized for text files and have issues with large files. Solutions like Git-LFS solve this by using HTTP to download large files, rather than the Git protocol. GitHub offers Git-LFS hosting but charges repository owners for bandwidth on popular files. Building a peer to peer distribution layer for files in a Git repository is difficult due to the design of Git packfiles, which are delta compressed repository states that do not support random access to byte ranges in previous file versions. + +Science is an example of an important community that would benefit from better approaches in this area. Increasingly scientific datasets are being provided online using one of the above approaches and cited in published literature. 
Broken links and systems that do not provide version checking or content addressability of data directly limit the reproducibility of scientific analyses based on shared datasets. Services that charge a premium for bandwidth cause monetary and data transfer strain on the users sharing the data, who are often on fast public university networks with effectively unlimited bandwidth. Version control tools designed for text files do not keep up with the demands of large data analysis in science today. + +# 2. Inspiration + +Dat is inspired by a number of features from existing systems. + +## 2.1 Git + +Git popularized the idea of a Merkle DAG, a way to represent changes to data where each change is addressed by the secure hash of the change plus all previous hashes. This provides a way to trust data integrity, as the only way a specific hash could be derived by another peer is if they have the same data and change history required to reproduce that hash. This is important for reproducibility as it lets you trust that a specific git commit hash refers to a specific source code state. + +## 2.2 LBFS + +LBFS is a networked file system that avoids transferring redundant data by deduplicating common regions of files and only transferring unique regions once. The deduplication algorithm they use is called Rabin fingerprinting and works by hashing the contents of the file using a sliding window and looking for content defined chunk boundaries that probabilistically appear at the desired byte offsets (e.g. every 1kb). + +Content defined chunking has the benefit of being shift resistant, meaning if you insert a byte into the middle of a file only the first chunk boundary to the right of the insert will change, but all other boundaries will remain the same. With a fixed size chunking strategy, such as the one used by rsync, all chunk boundaries to the right of the insert will be shifted by one byte, meaning half of the chunks of the file would need to be retransmitted. 
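
As a rough illustration of the shift resistance described above, the sketch below compares content defined chunking with fixed size chunking after a single byte is inserted. It is a minimal sketch under stated assumptions: it uses a simple polynomial rolling hash as a stand-in for true Rabin fingerprinting, and the names `cdc_chunks`, `fixed_chunks` and `changed`, as well as the `WINDOW`, `MASK`, `BASE` and `MOD` values, are hypothetical choices for this example rather than anything defined by LBFS or Dat.

```python
import random

WINDOW = 16            # sliding window size in bytes (assumed value)
MASK = (1 << 6) - 1    # boundary when the low 6 hash bits are zero (~64 byte chunks)
BASE = 257             # polynomial base for the rolling hash (assumed value)
MOD = (1 << 31) - 1    # modulus for the rolling hash (assumed value)


def cdc_chunks(data: bytes) -> list:
    """Split data at content defined boundaries using a rolling hash.

    The hash at each offset depends only on the previous WINDOW bytes, so a
    boundary is a property of local content, not of the absolute offset.
    """
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split data into fixed size chunks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed(old: list, new: list) -> int:
    """Count chunks in `new` that do not appear anywhere in `old`."""
    known = set(old)
    return sum(1 for chunk in new if chunk not in known)


if __name__ == "__main__":
    rng = random.Random(7)
    data = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = data[:1000] + b"!" + data[1000:]  # insert one byte
    print("content defined, chunks to resend:",
          changed(cdc_chunks(data), cdc_chunks(edited)))
    print("fixed size, chunks to resend:",
          changed(fixed_chunks(data), fixed_chunks(edited)))
```

Because each boundary depends only on the preceding `WINDOW` bytes, only chunks that overlap the inserted byte or the window right after it can differ. With fixed size chunks, every chunk after the insertion point is shifted and would need to be sent again.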
+ +## 2.3 BitTorrent + +BitTorrent implements a swarm based file sharing protocol for static datasets. Data is split into fixed sized chunks, hashed, and then that hash is used to discover peers that have the same data. An advantage of using BitTorrent for dataset transfers is that download bandwidth can be fully used. Since the file is split into pieces, and peers can efficiently discover which pieces each of the peers they are connected to have, it means one peer can download non-overlapping regions of the dataset from many peers at the same time in parallel, maximizing network throughput. + +Fixed sized chunking has drawbacks for data that changes (see LBFS above). BitTorrent assumes all metadata will be transferred up front which makes it impractical for streaming or updating content. Most BitTorrent clients divide data into 1024 pieces meaning large datasets could have a very large chunk size which impacts random access performance (e.g. for streaming video). + +## 2.4 Kademlia Distributed Hash Table + +Kademlia is a distributed hash table, in other words a distributed key/value store that can serve a similar purpose to DNS servers but has no hard coded server addresses. All clients in Kademlia are also servers. As long as you know at least one address of another peer in the network, you can ask them for the key you are trying to find and they will either have it or give you some other people to talk to that are more likely to have it. + +If you don't have an initial peer to talk to you have to use something like a bootstrap server that just randomly gives you a peer in the network to start with. If the bootstrap server goes down, the network still functions, and other methods can be used to bootstrap new peers (such as sending them peer addresses through side channels like how .torrent files include tracker addresses to try in case Kademlia finds no peers). + +Kademlia is distinct from previous DHT designs such as Chord due to its simplicity. 
It uses a very simple XOR operation between two keys as its distance metric to decide which peers are closer to the data being searched for. On paper it seems like it wouldn't work as it doesn't take into account things like ping speed or bandwidth. Instead its design is very simple on purpose to minimize the amount of control/gossip messages and to minimize the amount of complexity required to implement it. In practice Kademlia has been extremely successful and is widely deployed as the "Mainline DHT" for BitTorrent, with support in all popular BitTorrent clients today. + +## 2.5 Peer to Peer Streaming Peer Protocol (PPSPP) + +PPSPP ([IETF RFC 7574](https://datatracker.ietf.org/doc/rfc7574/?include_text=1)) is a protocol for live streaming content over a peer to peer network. In it they define a specific type of Merkle Tree that allows for subsets of the hashes to be requested by a peer in order to reduce the time-till-playback for end users. BitTorrent for example transfers all hashes up front, which is not suitable for live streaming. + +Their Merkle trees are ordered using a scheme they call "bin numbering", which is a method for deterministically arranging an append-only log of leaf nodes into an in-order layout tree where non-leaf nodes are derived hashes. If you want to verify a specific node, you only need to request its sibling's hash and all its uncle hashes. PPSPP is very concerned with reducing round trip time and time-till-playback by allowing for many kinds of optimizations, such as to pack as many hashes into datagrams as possible when exchanging tree information with peers. + +Although PPSPP was designed with streaming video in mind, the ability to request a subset of metadata from a large and/or streaming dataset is very desirable for many other types of datasets. + +## 2.6 WebTorrent + +With WebRTC browsers can now make peer to peer connections directly to other browsers. 
BitTorrent uses UDP sockets which aren't available to browser JavaScript, so it can't be used as-is on the Web. + +WebTorrent implements the BitTorrent protocol in JavaScript using WebRTC as the transport. This includes the BitTorrent block exchange protocol as well as the tracker protocol implemented in a way that can enable hybrid nodes, talking simultaneously to both BitTorrent and WebTorrent swarms (if a client is capable of making both UDP sockets and WebRTC sockets, such as Node.js). Trackers are exposed to web clients over HTTP or WebSockets. + +## 2.7 InterPlanetary File System + +IPFS also builds on many of the concepts from this section and presents a new platform similar in scope to the Web that has content integrity, peer to peer file sharing, version history and data permanence baked in as a sort of upgrade to the current Web. Whereas Dat is one application of these ideas that is specifically focused on sharing datasets but is agnostic to what platform it is built on, IPFS goes lower level and abstracts network protocols and naming systems so that any application built on the Web can alternatively be built on IPFS to inherit its properties, as long as their hyperlinks can be expressed as content addressed addresses to the IPFS global Merkle DAG. + +The research behind IPFS has coalesced many of these ideas into a more accessible format. We are still exploring how to best implement the Dat protocol on top of the IPFS platform. + +# 3. Design + +Dat is a file sharing protocol that does not assume a dataset is static or that the entire dataset will be downloaded. The protocol is agnostic to the underlying transport e.g. you could implement Dat over carrier pigeon. The key properties of the Dat design are explained in this section. + +- 1. **Mirroring** - All participants in the network simultaneously share and consume data. +- 2. **Content Integrity** - Data and publisher integrity are verified through use of signed hashes of the content. +- 3. 
**Parallel Transfer** - Subsets of the data can be accessed from multiple peers simultaneously, improving transfer speeds. +- 4. **Streaming Updates** - Datasets can be updated and distributed in real time to downstream peers. +- 5. **Secure Metadata** - Dat employs a capability system whereby anyone with a Dat link can connect to the swarm, but the link itself is a secure hash that is nearly impossible to guess and is never leaked by Dat itself. + +## 3.1 Mirroring + +Dat is a peer to peer protocol designed to exchange pieces of a dataset amongst a swarm of peers. As soon as a peer acquires their first piece of data in the dataset they become a partial mirror for the dataset. If someone else contacts them and needs the piece they have, they can share it. This can happen while the peer is still downloading the pieces they want. + +### 3.1.1 Source Discovery + +An important aspect of mirroring is source discovery, the techniques that peers use to find each other. Source discovery means finding the IP and port of data sources online that have a copy of the data you are looking for. You can then connect to them and begin exchanging data using the Dat file exchange protocol, Hypercore. By using source discovery techniques we are able to create a network where data can be discovered even if the original data source disappears. + +Source discovery can happen over many kinds of networks, as long as you can model the following actions: + +- `join(key, [port])` - Begin performing regular lookups on an interval for `key`. Specify `port` if you want to announce that you share `key` as well. +- `leave(key, [port])` - Stop looking for `key`. Specify `port` to stop announcing that you share `key` as well. 
+- `foundpeer(key, ip, port)` - Called when a peer is found by a lookup + +In the Dat implementation we implement the above actions on top of three types of discovery networks: + +- DNS name servers - An Internet standard mechanism for resolving keys to addresses +- Multicast DNS - Useful for discovering peers on local networks +- Kademlia Mainline Distributed Hash Table - No single point of failure, increases probability of Dat working even if DNS servers are unreachable + +Additional discovery networks can be implemented as needed. We chose the above three as a starting point to have a complementary mix of strategies to increase the probability of source discovery. + +Our implementation of peer discovery is called discovery-channel. We also run a [custom DNS server](https://www.npmjs.com/package/dns-discovery) that Dat clients use (in addition to specifying their own if they need to), as well as a [DHT bootstrap](https://github.com/bittorrent/bootstrap-dht) server. These discovery servers are the only centralized infrastructure we need for Dat to work over the Internet, but they are redundant and interchangeable, and they never see the actual data being shared. Anyone can run their own, and Dat will still work even if they are all unavailable. If this happens discovery will just be manual (e.g. manually sharing IP/ports). Every data source that has a copy of the data also advertises itself across these discovery networks. + +### 3.1.2 Peer Connections + +Up until this point we have just done searches to find who has the data we need. Now that we know who we should talk to, we have to connect to them. Once we have a duplex binary connection to a peer we layer our own file sharing protocol on top, called [Hypercore](https://github.com/mafintosh/hypercore). 
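
To recap the search step before moving on to connections, the `join`, `leave` and `foundpeer` actions from section 3.1.1 can be modelled with a toy in-memory network. This is only a sketch under stated assumptions: the `DiscoveryNetwork` and `Peer` classes, the explicit `lookup` method (standing in for lookups that a real client would repeat on an interval), the IP addresses and the port number are all hypothetical, and none of this reflects the actual discovery-channel API.

```python
class DiscoveryNetwork:
    """Toy stand-in for a discovery network such as DNS, mDNS or a DHT."""

    def __init__(self):
        self.announcements = {}  # key -> set of (ip, port) announcing that key


class Peer:
    """A participant that can look up and optionally announce keys."""

    def __init__(self, network, ip):
        self.network = network
        self.ip = ip
        self.watching = set()  # keys this peer is currently looking up
        self.found = []        # (key, ip, port) tuples reported by lookups

    def join(self, key, port=None):
        """Start looking for `key`; announce ourselves if `port` is given."""
        self.watching.add(key)
        if port is not None:
            self.network.announcements.setdefault(key, set()).add((self.ip, port))

    def leave(self, key, port=None):
        """Stop looking for `key`; withdraw our announcement if `port` is given."""
        self.watching.discard(key)
        if port is not None:
            self.network.announcements.get(key, set()).discard((self.ip, port))

    def lookup(self):
        """Run one lookup round over every key we have joined."""
        for key in sorted(self.watching):
            for ip, port in sorted(self.network.announcements.get(key, set())):
                if ip != self.ip:  # do not report ourselves
                    self.foundpeer(key, ip, port)

    def foundpeer(self, key, ip, port):
        """Called when a lookup finds a peer sharing `key`."""
        self.found.append((key, ip, port))


if __name__ == "__main__":
    net = DiscoveryNetwork()
    source = Peer(net, "10.0.0.1")
    reader = Peer(net, "10.0.0.2")
    source.join("dataset-key", port=4000)  # announce and look up
    reader.join("dataset-key")             # look up only
    reader.lookup()
    print(reader.found)
```

Keeping the interface this small is what lets the same three actions sit on top of very different networks. Only the way announcements are stored and queried would change between them.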
+ +In our implementation, we use either [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol), [UTP](https://en.wikipedia.org/wiki/Micro_Transport_Protocol) or WebRTC sockets for the actual peer to peer connections. UTP is nice because it is designed to *not* take up all available bandwidth on a network (e.g. so that other people sharing your wifi can still use the Internet). WebRTC support makes Dat work in modern web browsers using peer to peer connections. + +When we get the IP and port for a potential source we try to connect using all available protocols and hope one works. If one connects first, we abort the other ones. If none connect, we try again until we decide that source is offline or unavailable to use and we stop trying to connect to them. Sources we are able to connect to go into a list of known good sources, so that if our Internet connection goes down we can use that list to reconnect to our good sources again quickly. + +If we get a lot of potential sources we pick a handful at random to try and connect to and keep the rest around as additional sources to use later in case we decide we need more sources. A lot of these are parameters that we can tune for different scenarios later, but have started with some best guesses as defaults. + +The connection logic is implemented in a module called [discovery-swarm](https://www.npmjs.com/package/discovery-swarm). This builds on discovery-channel and adds connection establishment, management and statistics. You can see stats like how many sources are currently connected, how many good and bad behaving sources you've talked to, and it automatically handles connecting and reconnecting to sources for you. Our UTP support is implemented in the module [utp-native](https://www.npmjs.com/package/utp-native). + +So now we have found data sources, connected to them, but we haven't yet figured out if they *actually* have the data we need. 
This is where our file transfer protocol [Hyperdrive](https://www.npmjs.com/package/hyperdrive) comes in. This is explained in a later section. + +Peer connection types are outside the scope of the Dat protocol, but in the Dat implementation we make a best effort to make as many successful connections using our default types as possible. This means employing peer to peer connection techniques like UDP hole punching [?]. Our approach for UDP hole punching is to use a central known hole punching server which is accessible on the public Internet. In our implementation we re-use our custom DNS server by adding to it special functionality to facilitate peer message exchange for the purpose of hole punching. + +In a scenario where two peers A and B want to connect, and both know the central server, this is how we perform UDP hole punching: + +1. Peer A creates a local UDP socket and messages the central server that it is interested in connecting to peers. +2. Central server messages Peer A back with a token that is a `hash(Peer A's remote IP + a local secret)`. The UDP packet contains the remote IP. +3. Peer A messages the central server with the token (this way you cannot spoof your IP and DDoS a remote peer) +4. Peer B does the same. +5. When the central server receives Peer B's message that it wants to connect to peers it forwards Peer B's message to Peer A and Peer A's message to Peer B. +6. Both peers now send a message to each other on their public IP and port. If UDP hole punching is supported by the routers of both peers at least one of the messages should get through. +7. At this point we reuse the UDP socket to run UTP on top to get a streaming reliable interface. + +## 3.2 Content Integrity + +Content integrity means being able to verify the data you received is the exact same version of the data that you expected. This is important in a distributed system as this mechanism will catch incorrect data sent by bad peers. 
It also has implications for reproducibility as it lets you refer to a specific version of the dataset you want. + +A common issue in data analysis is when data changes but the link to the data remains the same. For example, one day a file called data.zip might change, but a simple HTTP link to the file does not include a hash of the content, so clients that only have the HTTP link have no way to check if the file changed. Looking up a file by the hash of its content is called content addressability, and lets users not only verify that the data they receive is the version of the data they want, but also lets people cite specific versions of the data by referring to a specific hash. + +## 3.3 Parallel Transfer + +## 3.4 Streaming Updates + +## 3.5 Secure Metadata diff --git a/papers/dat-paper.pdf b/papers/dat-paper.pdf Binary files differnew file mode 100644 index 0000000..e6ef10c --- /dev/null +++ b/papers/dat-paper.pdf |