From d1399d3ea9d3473d9986d1e90aef59de1025c687 Mon Sep 17 00:00:00 2001 From: Joe Hand Date: Wed, 31 May 2017 10:55:05 -0700 Subject: SLEEP + general docs update + more introduction content (#57) * big docs update * fix sleep link * separate sustainability header * better into and reword * clean up terms and other places * add intro, overview, troubleshooting. update more content * minor change to force new deploy * more words * more updates and some pictures * wordsmith intro to be more about dat uniqueness * syncing & publishing * add link files description * add new gifs * change overview to remove tutorial * add wip tutorial content * update build with dat * add http file --- docs/contents.json | 28 ++++++---- docs/cookbook/diy-dat.md | 120 +++++++++++++++++++-------------------- docs/cookbook/http.md | 61 ++++++++++++++++++++ docs/cookbook/tutorial.md | 50 +++++++++++++++++ docs/cookbook/using-fs.md | 58 +++++++++++++++++++ docs/dat-powered.md | 13 +++++ docs/ecosystem.md | 47 ++++++++++------ docs/faq.md | 140 ++++++++++++++++++++++++++++++++++++---------- docs/how-dat-works.md | 4 +- docs/install.md | 18 ------ docs/intro.md | 47 ++++++++++++++++ docs/overview.md | 91 ++++++++++++++++++++++++++++++ docs/terms.md | 90 ++++++++++++++--------------- docs/troubleshooting.md | 92 ++++++++++++++++++++++++++++++ 14 files changed, 673 insertions(+), 186 deletions(-) create mode 100644 docs/cookbook/http.md create mode 100644 docs/cookbook/tutorial.md create mode 100644 docs/cookbook/using-fs.md create mode 100644 docs/dat-powered.md delete mode 100644 docs/install.md create mode 100644 docs/intro.md create mode 100644 docs/overview.md create mode 100644 docs/troubleshooting.md (limited to 'docs') diff --git a/docs/contents.json b/docs/contents.json index 29fdaa6..f87b1f1 100644 --- a/docs/contents.json +++ b/docs/contents.json @@ -1,19 +1,25 @@ { - "Dat": { - "Introduction": "modules/dat.md", - "How Dat Works": "how-dat-works.md", - "Terminology": "terms.md", - 
"FAQ": "faq.md" + "Using Dat": { + "Introduction": "intro.md", + "Dat Concepts": "overview.md", + "Command Line": "modules/dat.md", + "FAQ": "faq.md", + "Troubleshooting": "troubleshooting.md", + "Terminology": "terms.md" }, "Cookbook": { - "On a Server": "cookbook/server.md", + "Getting Started": "cookbook/tutorial.md", + "Sharing files over HTTP": "cookbook/http.md", + "Running a Dat Server": "cookbook/server.md", "In the Browser": "cookbook/browser.md", - "Under the Hood": "cookbook/diy-dat.md" + "Using Dat in JS Apps": "cookbook/diy-dat.md", + "Using Hyperdrive FS": "cookbook/using-fs.md" }, - "Ecosystem": { + "Dat Technology": { "Overview": "ecosystem.md", - "SLEEP": "sleep.md", - "Hyperdrive": "modules/hyperdrive.md", - "Hypercore": "modules/hypercore.md" + "dat-node": "modules/dat-node.md", + "hyperdrive": "modules/hyperdrive.md", + "hypercore": "modules/hypercore.md", + "hyperdiscovery": "modules/hyperdiscovery.md" } } diff --git a/docs/cookbook/diy-dat.md b/docs/cookbook/diy-dat.md index 4af610d..c7544a2 100644 --- a/docs/cookbook/diy-dat.md +++ b/docs/cookbook/diy-dat.md @@ -2,15 +2,25 @@ In this guide, we will show how to develop applications with the Dat ecosystem. The Dat ecosystem is very modular making it easy to develop custom applications using Dat. +Dat comes with a built in javascript API we use in Dat Desktop and dat command line. For custom applications, or more control, you can also use the core Dat modules separately. + +Use Dat in your JS Application: + +1. `require('dat')`: use the [high-level Dat JS API](https://github.com/datproject/dat-node). +2. Build your own! + +This tutorial will cover the second option and get you familiar with the core Dat modules. + +### The Dat Core Modules + This tutorial will follow the steps for sharing and downloading files using Dat. In practice, we implement these in [dat-node](https://github.com/datproject/dat-node), a high-level module for using Dat that provides easy access to the core Dat modules. 
-For any Dat application, there are three essential modules you will start with: +For any Dat application, there are two essential modules you will start with: 1. [hyperdrive](https://npmjs.org/hyperdrive) for file synchronization and versioning -2. [hyperdrive-archive-swarm](https://npmjs.org/hyperdrive-archive-swarm) helps discover and connect to peers over local networks and the internet -3. A [LevelDB](https://npmjs.org/level) compatible database for storing metadata. +2. [hyperdiscovery](https://npmjs.org/hyperdiscovery) helps discover and connect to peers over local networks and the internet -The [Dat CLI](https://npmjs.org/dat) module itself combines these modules and wraps them in a command-line API. These modules can be swapped out for a similarly compatible module, such as switching LevelDb for [MemDB](https://github.com/juliangruber/memdb) (which we do in the first example). More details on how these module work together are available in [How Dat Works](how-dat-works.md). +The [Dat CLI](https://npmjs.org/dat) module itself combines these modules and wraps them in a command-line API. We also use the [dat-storage](https://github.com/datproject/dat-storage) module to handle file and key storage. These modules can be swapped out for a similarly compatible module, such as switching storage for [random-access-memory](https://github.com/mafintosh/random-access-memory). ## Getting Started @@ -18,116 +28,98 @@ You will need node and npm installed to build with Dat. [Read more](https://gith ## Download a File -Our first module will download files from a Dat link entered by the user. View the code for this module on [Github](https://github.com/joehand/diy-dat-examples/tree/master/module-1). +Our first module will download files from a Dat link entered by the user. 
```bash mkdir module-1 && cd module-1 npm init -npm install --save hyperdrive memdb hyperdrive-archive-swarm +npm install --save hyperdrive random-access-memory hyperdiscovery touch index.js ``` -For this example, we will use [memdb](https://github.com/juliangruber/memdb) for our database (keeping the metadata in memory rather than on the file system). In your `index.js` file, require the main modules and set them up: +For this example, we will use random-access-memory for storage (keeping the metadata in memory rather than on the file system). In your `index.js` file, require the main modules and set them up: ```js -var memdb = require('memdb') -var Hyperdrive = require('hyperdrive') -var Swarm = require('hyperdrive-archive-swarm') +var ram = require('random-access-memory') +var hyperdrive = require('hyperdrive') +var discovery = require('hyperdiscovery') var link = process.argv[2] // user inputs the dat link -var db = memdb() -var drive = Hyperdrive(db) -var archive = drive.createArchive(link) -var swarm = Swarm(archive) +var archive = hyperdrive(ram, link) +archive.ready(function () { + discovery(archive) +}) ``` -Notice, the user will input the link for the second argument The easiest way to get a file from a hyperdrive archive is to make a read stream. `archive.createFileReadStream` accepts the index number of filename for the first argument. To display the file, we can create a file stream and pipe it to `process.stdout`. +Notice that the user passes the link as the second argument. The easiest way to get a file from a hyperdrive archive is to read it with `archive.readFile`, which accepts the filename as its first argument. To display the file, we can read it and log its contents to the console. ```js -var stream = archive.createFileReadStream(0) // get the first file -stream.pipe(process.stdout) +// Make sure your archive has a dat.json file! 
+var stream = archive.readFile('dat.json', 'utf-8', function (err, data) { + if (err) throw err + console.log(data) +}) ``` -Now, you can run the module! To download the first file from our docs Dat, run: +Now, you can run the module! To download the `dat.json` file from an archive: ``` -node index.js 395e3467bb5b2fa083ee8a4a17a706c5574b740b5e1be6efd65754d4ab7328c2 +node index.js dat:// ``` -You should see the first file in our docs repo. +You should see the `dat.json` file. #### Bonus: Display any file in the Dat With a few more lines of code, the user can enter a file to display from the Dat link. -Challenge: create a module that will allow the user to input a Dat link and a filename: `node bonus.js `. The module will print out that file from the link, as we did above. To get a specific file you can change the file stream to use the filename instead of the index number: +Challenge: create a module that will allow the user to input a Dat link and a filename: `node bonus.js `. The module will print out that file from the link, as we did above: ```js -var stream = archive.createFileReadStream(fileName) +var stream = archive.readFile(fileName) ``` Once you are finished, see if you can view this file by running: ```bash -node bonus.js 395e3467bb5b2fa083ee8a4a17a706c5574b740b5e1be6efd65754d4ab7328c2 cookbook/diy-dat.md +node bonus.js 395e3467bb5b2fa083ee8a4a17a706c5574b740b5e1be6efd65754d4ab7328c2 readme.md ``` -[See how we coded it](https://github.com/joehand/diy-dat-examples/blob/master/module-1/bonus.js). - ## Download all files to computer -This module will build on the last module. Instead of displaying a single file, we will download all of the files from a Dat into a local directory. View the code for this module on [Github](https://github.com/joehand/diy-dat-examples/tree/master/module-2). +This module will build on the last module. Instead of displaying a single file, we will download all of the files from a Dat into a local directory. 
-To download the files to the file system, instead of to a database, we will use the `file` option in `hyperdrive` and the [random-access-file](http://npmjs.org/random-access-file) module. We will also learn two new archive functions that make handling all the files a bit easier than the file stream in module #1. +To download the files to the file system, we are going to use [mirror-folder](https://github.com/mafintosh/mirror-folder). [Read more](/using-fs) about how mirror-folder works with hyperdrive. -Setup will be the same as before (make sure you install random-access-file and stream-each this time): +In practice, you should use [dat-storage](https://github.com/datproject/dat-storage) to do this as it'll be more efficient and keep the metadata on disk. -```bash -mkdir module-2 && cd module-2 -npm init -npm install --save hyperdrive memdb hyperdrive-archive-swarm random-access-file stream-each -touch index.js -``` - -The first part of the module will look the same. We will add random-access-file (and [stream-each](http://npmjs.org/stream-each) to make things easier). The only difference is that we have to specify the `file` option when creating our archive: +Setup will be the same as before (make sure you install `mirror-folder`). The first part of the module will look the same. ```js -var memdb = require('memdb') -var Hyperdrive = require('hyperdrive') -var Swarm = require('hyperdrive-archive-swarm') -var raf = require('random-access-file') // this is new! 
-var each = require('stream-each') - -var link = process.argv[2] - -var db = memdb() -var drive = Hyperdrive(db) -var archive = drive.createArchive(link, { - file: function (name) { - return raf(path.join('download', name)) // download into a "download" dir - } -}) -var swarm = Swarm(archive) -``` +var ram = require('random-access-memory') +var hyperdrive = require('hyperdrive') +var discovery = require('hyperdiscovery') +var mirror = require('mirror-folder') -Now that we are setup, we can work with the archive. The `archive.download` function downloads the file content (to wherever you specified in the file option). To download all the files, we will need a list of files and then we will call download on each of them. `archive.list` will give us the list of the files. We use the stream-each module to make it easy to iterate over each item in the archive, then exit when the stream is finished. +var link = process.argv[2] // user inputs the dat link +var dir = process.cwd() // download to cwd -```js -var stream = archive.list({live: false}) // Use {live: false} for now to make the stream easier to handle. 
-each(stream, function (entry, next) { - archive.download(entry, function (err) { - if (err) return console.error(err) - console.log('downloaded', entry.name) - next() +var archive = hyperdrive(ram, link) +archive.ready(function () { + discovery(archive) + + var progress = mirror({name: '/', fs: archive}, dir, function (err) { + console.log('done downloading!') + }) + progress.on('put', function (src) { + console.log(src.name, 'downloaded') }) -}, function () { - process.exit(0) }) ``` You should be able to run the module and see all our docs files in the `download` folder: ```bash -node index.js 395e3467bb5b2fa083ee8a4a17a706c5574b740b5e1be6efd65754d4ab7328c2 +node index.js dat:// ``` diff --git a/docs/cookbook/http.md b/docs/cookbook/http.md new file mode 100644 index 0000000..7fa451d --- /dev/null +++ b/docs/cookbook/http.md @@ -0,0 +1,61 @@ +# Sharing files over HTTP + +The Dat command line comes with a built in HTTP server. This is a cool demo because we can also see how version history works! The `--http` option works for files you are sharing *or* downloading. + +(If you don't have dat command line installed, run `npm install -g dat`, or [see more info](intro#installation).) + +## Serve over HTTP + +Serve dat files on http by adding the `--http` option. For example, you can sync an existing dat: + +``` +❯ dat sync --http +dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 +Sharing dat: 2 files (1.4 MB) +Serving files over http at http://localhost:8080 + +2 connections | Download 0 B/s Upload 0 B/s +``` + +Now visit [http://localhost:8080]() to see the files in your browser! The default http port is 8080. You should see a directory listing: + +Dat HTTP viewer + +If your dat has an `index.html` page, that will be shown instead. + +You can combine Dat's http support with our server tools to create a live updating website or place to share files publicly. + +## Built-in Versioning + +As you may know, Dat automatically versions all files. 
The HTTP display is an easy way to view version history: + +**Use [localhost:8080/?version=2](http://localhost:8080/?version=2) to view a specific version.** + +## Live reloading + +The Dat http viewer also comes with live reloading. If it detects a new version it will automatically reload with the new directory listing or page (as long as you aren't viewing a specific version in the url). + +## Sparse Downloading + +Dat supports *sparse*, or partial, downloads of datasets. This is really useful if you only want a single file from a large dat. Unfortunately, we haven't quite built a user interface for this into our applications. So you can hack around it! + +This will allow you to download a single file from a larger dat, without downloading metadata or any other files. + +First, start downloading our demo dat, making sure to include both flags (`--http`, `--sparse`). + +``` +❯ dat dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 ./demo --http --sparse +Cloning: 2 files (1.4 MB) +Serving files over http at http://localhost:8080 + +3 connections | Download 0 B/s Upload 0 B/s +``` + +The `--sparse` option tells Dat to only download files you specifically request. See how it works: + +1. Check out your `./demo` folder; it should be empty. +2. [Open the Dat](http://localhost:8080) in your browser. +3. Click on a file to download. +4. It should be in your folder now! + +Pretty cool! You can use this hack to download only specific files or even older versions of files (if they've been saved somewhere). diff --git a/docs/cookbook/tutorial.md b/docs/cookbook/tutorial.md new file mode 100644 index 0000000..58f5288 --- /dev/null +++ b/docs/cookbook/tutorial.md @@ -0,0 +1,50 @@ +# Getting Started with Dat + +In this tutorial we will go through the two main ways to use Dat: sharing data and downloading data. If possible, go through it with a partner to see how Dat works across computers. Get Dat [installed](intro#installation) and get started! 
+ +Dat Desktop makes it easy for anyone to get started using Dat with a user-friendly interface. If you are comfortable with the command line then you can install dat via npm. You can always switch apps later and keep your dats the same. Dat can share your files with anyone; it does not matter how they are using Dat. + +## Command Line Tutorial + +### Downloading Data + +We made a demo folder just for this exercise. Inside the demo folder is a `dat.json` file and a gif. We shared these files via Dat and now you can download them with our dat key! + +Similar to git, you download somebody's dat by running `dat clone `. You can also specify the directory: + +``` +❯ dat clone dat://778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639 ~/Downloads/dat-demo +dat v13.5.0 +Created new dat in /Users/joe/Downloads/dat-demo/.dat +Cloning: 2 files (1.4 MB) + +2 connections | Download 614 KB/s Upload 0 B/s + +dat sync complete. +Version 4 +``` + +This will download our demo files to the `~/Downloads/dat-demo` folder. These files are being shared by a server over Dat (to ensure high availability) but you may connect to any number of users also hosting the content. + +You can also view the files online: [datproject.org/778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639](https://datproject.org/778f8d955175c92e4ced5e4f5563f69bfec0c86cc6f670352c457943666fe639/). datproject.org can download files over Dat and display them on http as long as someone is hosting it. The website temporarily caches data for any visited links (do not view your dat on datproject.org if you do not want us caching your data). + +### Sharing Data + +We'll be creating a dat from a folder on your computer. If you are with a friend you can sync these files to their computer. Otherwise you can view them online via datproject.org to see how viewing a dat online works. + +Find a folder on your computer to share. 
Any kind of file works with Dat, but for now make sure it's something you want to share with your friends. Dat can handle all sorts of files (Dat works with really big folders too!). We like cat pictures. + +First, you can create a new dat inside that folder. Using the `dat create` command also walks us through making a `dat.json` file: + +``` +❯ dat create +Welcome to dat program! +You can turn any folder on your computer into a Dat. +A Dat is a folder with some magic. +``` + +This will create a new (empty) dat. Dat will print a link; share this link to give others access to view your files. + +Once we have our dat, run `dat share` to scan your files and sync them to the network. Share the link with your friend to instantly start downloading files. + +You can also try viewing your files online. Go to [datproject.org](https://datproject.org/explore) and enter your link to preview on the top right. *(Some users, including me when writing this, may have trouble connecting to datproject.org initially. Don't be alarmed! It is something we are working on. Thanks.)* diff --git a/docs/cookbook/using-fs.md b/docs/cookbook/using-fs.md new file mode 100644 index 0000000..39e2816 --- /dev/null +++ b/docs/cookbook/using-fs.md @@ -0,0 +1,58 @@ +# Using the Hyperdrive FS + +[Hyperdrive](https://github.com/mafintosh/hyperdrive), the core file system module in dat, exposes an API that mimics the Node fs API. This allows you to create modules that act on hyperdrive or a regular fs with the same API. We have several modules that make use of this custom fs, such as [mirror-folder](https://github.com/mafintosh/mirror-folder). 
+ +Mirror folder can copy a regular directory to another regular directory: + +```js +var mirror = require('mirror-folder') + +mirror('/source', '/dest', function (err) { + console.log('mirror complete') +}) +``` + +You can also copy a folder to an archive (this is how we do importing in Dat): + +```js +var archive = hyperdrive('/dir') +mirror('/source', {name: '/', fs: archive}, function (err) { + console.log('mirror complete') +}) +``` + +### Creating Custom FS Modules + +To create a module that uses a custom fs, you can default to the regular `fs` but also accept `fs` as an argument. For example, to print a file you could write this function: + +```js +function printFile(file, fs) { + if (!fs) fs = require('fs') + + fs.readFile(file, 'utf-8', function (err, data) { + console.log(data) + }) +} +``` + +Then you could use this to print a file from a regular fs: + +```js +printFile('/data/hello-world.txt') +``` + +Or from a hyperdrive archive: + +```js +var archive = hyperdrive('/data') +printFile('/hello-world.txt', archive) // pass archive as the fs! +``` + +## Modules! + +See more examples of custom-fs modules: + +* [mirror-folder](https://github.com/mafintosh/mirror-folder) - copy files from one fs to another fs (regular or custom) +* [count-files](https://github.com/joehand/count-files) - count files in regular or custom fs. +* [ftpfs](https://github.com/maxogden/ftpfs) - custom fs for FTPs +* [bagit-fs](https://github.com/joehand/bagit-fs) - custom fs module for the BagIt spec. 
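Reviewer sketch (not part of the commit): the custom-fs pattern described in `docs/cookbook/using-fs.md` can be tried without installing hyperdrive. The example below assumes only that the injected object exposes a Node-style `readFile`. The names `createMemoryFs` and `readText` are hypothetical, introduced here for illustration, and the in-memory object merely stands in for a hyperdrive archive.

```js
// Hypothetical in-memory "fs" that exposes only readFile (illustration only).
function createMemoryFs (files) {
  return {
    readFile: function (name, encoding, cb) {
      // Node's fs.readFile also allows (name, cb), so support both call shapes.
      if (typeof encoding === 'function') {
        cb = encoding
        encoding = null
      }
      // Call back asynchronously, as a real fs would.
      process.nextTick(function () {
        if (!Object.prototype.hasOwnProperty.call(files, name)) {
          var err = new Error('ENOENT: no such file, open \'' + name + '\'')
          err.code = 'ENOENT'
          return cb(err)
        }
        var buf = Buffer.from(files[name])
        cb(null, encoding ? buf.toString(encoding) : buf)
      })
    }
  }
}

// Same shape as printFile in the cookbook, but the result goes to a
// callback so that read errors are not silently dropped.
function readText (file, fs, cb) {
  if (!fs) fs = require('fs') // default to the regular Node fs
  fs.readFile(file, 'utf-8', cb)
}

var memfs = createMemoryFs({ '/hello-world.txt': 'hello from memory' })
readText('/hello-world.txt', memfs, function (err, data) {
  if (err) throw err
  console.log(data)
})
```

Because `readText` only relies on `readFile`, the same call should work with the regular `fs` (omit the second argument) or with any other object that implements that one method.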
diff --git a/docs/dat-powered.md b/docs/dat-powered.md new file mode 100644 index 0000000..b30b3ea --- /dev/null +++ b/docs/dat-powered.md @@ -0,0 +1,13 @@ +# Dat Powered Apps + +## Beaker Browser + +## Science Fair + +## Data Archiving + +* Project Svalbard + +## Distributed Tools + +* Hyperpipe diff --git a/docs/ecosystem.md b/docs/ecosystem.md index 97b1d74..4dde55b 100644 --- a/docs/ecosystem.md +++ b/docs/ecosystem.md @@ -1,27 +1,38 @@ -# Dat Module Ecosystem +# Open Source Tools We have built and contributed to a variety of modules that support our work on Dat as well as the larger data and code ecosystem. Feel free to go deeper and see the implementations we are using in the [Dat command-line tool](https://github.com/datproject/dat) and the [Dat Desktop](https://github.com/datproject/dat-desktop). -Dat embraces the Unix philosophy: a modular design with composable parts. All of the pieces can be replaced with alternative implementations as long as they implement the abstract API. +Dat embraces the Unix philosophy: a modular design with composable parts. All of the pieces can be replaced with alternative implementations as long as they implement the abstract API. We believe this creates better end-user software, but more importantly, will create more sustainable and impactful open source tools. -## Public Interface Modules +## User Software -* [dat](https://github.com/datproject/dat) - the command line interface for sharing and downloading files -* [Dat Desktop](https://github.com/datproject/dat-desktop) - dat desktop application for sharing and downloading files -* [datproject.org](https://github.com/datproject/datproject.org/) - repository for the Dat project website, a public data registry and sharing -* [Dat Protocol](https://www.datprotocol.com/) - The Dat protocol specification +* [dat](https://github.com/datproject/dat) - The command line interface for Dat. +* [Dat Desktop](https://github.com/datproject/dat-desktop) - Desktop application for Dat. 
+* [datproject.org](https://github.com/datproject/datproject.org/) - Repository for the Dat project website, a public data registry and sharing. -## File and Block Component Modules +## Specifications -* [hyperdrive](hyperdrive) - The file sharing network dat uses to distribute files and data. Read the technical [hyperdrive-specification](hyperdrive-specification) about how hyperdrive works. -* [hypercore](hypercore) - exchange low-level binary blocks with many sources -* [rabin](https://www.npmjs.com/package/rabin) - Rabin fingerprinter stream -* [merkle-tree-stream](https://www.npmjs.com/package/merkle-tree-stream) - Used to construct Merkle trees from chunks +* [Dat Whitepaper](https://github.com/datproject/docs/tree/master/papers) - The Dat whitepaper +* [Dat Protocol](https://www.datprotocol.com/) - Site for the Dat protocol -## Networking & Peer Discovery Modules +## Core Modules -* [discovery-channel](https://www.npmjs.com/package/discovery-channel) - discover data sources -* [discovery-swarm](https://www.npmjs.com/package/discovery-swarm) - discover and connect to sources -* [bittorrent-dht](https://www.npmjs.com/package/bittorrent-dht) - use the Kademlia Mainline DHT to discover sources -* [dns-discovery](https://www.npmjs.com/package/dns-discovery) - use DNS name servers and Multicast DNS to discover sources -* [utp-native](https://www.npmjs.com/package/utp-native) - UTP protocol implementation +These modules form the backbone of Dat software: + +* [hypercore](https://github.com/mafintosh/hypercore) - A secure, distributed append-only log. +* [hyperdrive](https://github.com/mafintosh/hyperdrive) - A secure, real time distributed file system (built on hypercore). +* [dat-node](https://github.com/datproject/dat-node) - High-level module for building Dat applications on the file system. +* [hyperdiscovery](https://github.com/karissa/hyperdiscovery) - Defaults for networking discovery and connection management. 
+* [dat-storage](https://github.com/datproject/dat-storage) - Default storage module for Dat. + +View [more on Github](https://github.com/search?utf8=%E2%9C%93&q=topic%3Adat&type=Repositories). + +## Modules We Like & Use + +We use these modules throughout our applications: + +* [Choo](https://github.com/yoshuawuyts/choo) - A 4kb framework for creating sturdy frontend applications. +* [neat-log](https://github.com/joehand/neat-log) - A neat cli logger for stateful command line applications. +* [mirror-folder](https://github.com/mafintosh/mirror-folder) - Mirror a folder to another folder, supports hyperdrive and live file watching. +* [toiletdb](https://github.com/maxogden/toiletdb) - CRUD database using a JSON file for storage. diff --git a/docs/faq.md b/docs/faq.md index 4f3c703..d311d87 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -1,10 +1,37 @@ # FAQ -## General +## Organization ### Who is behind the project? -Code for Science and Society (CSS), a US based 501(c)(3) not for profit organization set up to support the Dat project. CSS employs a Dat core development team. Dat is currently funded exclusively by philanthropic non-profit grants. The mission of CSS is to work with public institutions to produce open source infrastructure to improve the ability for researchers, civic hackers and journalists to find and use datasets. However, we actively welcome outside contributors and use cases beyond our own. +Code for Science and Society (CSS) is a US-based 501(c)(3) not-for-profit organization set up to support the Dat project. CSS employs a Dat core development team. Dat is currently funded exclusively by philanthropic non-profit grants. The mission of CSS is to work with public institutions to produce open source infrastructure for researchers, civic hackers, and journalists to improve data publishing and long-term access. We actively welcome outside contributors and use cases beyond our mission. 
+ +### Sustainability + +#### What happens if Dat/CSS closes? Will my data be inaccessible? + +Dat software is built with long-term sustainability as a focus. For us, this goes beyond financial sustainability. Establishing lasting data archives depends on a transparent and open process, a wider open source community, and ensuring no single entity or technology is responsible for data storage or access. + +**Public Design Process** All discussion related to the design and development of Dat project software is public (either on IRC or Github). Dat software is released with open source licenses and will always be freely available. + +**Open Source Community** The Dat team develops with [pragmatic modularity](http://mafintosh.com/pragmatic-modularity.html) in mind. We have high-level user facing software, but many of our underlying modules are small, highly focused, and used outside of the Dat project. This helps create a broader community to continue supporting and using the software regardless of the success of Dat itself. + +**No Lock In** We only want you to use Dat because you love it, not because it is too hard to get your data out. Dat does not import or copy your data into specialized databases. This means that you can easily move your data around and keep it intact in its original form. You can even simultaneously host your data on HTTP along with Dat to ensure backwards compatibility with existing web tools. You'll never be locked into using the Dat software. + +**Distributed Storage** Dat is built to distribute storage and bandwidth costs throughout the network. Data hosting and bandwidth are some of the main costs for long-term data storage. By using Dat we can help ensure there are not single points of failure for data storage. Dat does not currently host any data, except for caching on datproject.org. We plan to build hosting options but will prioritize financial sustainability into those services. 
+ +**No Centralized Servers** Dat transfers all data directly between peers and has little reliance on Dat maintaining servers. We have public servers for peers to help discover each other, but those use very little bandwidth and anyone can run them. + +## Dat Usage + +### I am having trouble connecting to another Dat, how do I debug? + +We have some networking debugging tools available in the CLI: + +1. Try running `dat doctor` and following the instructions +2. Try running your command with `DEBUG=discovery* ` in front, e.g. `DEBUG=discovery* dat sync` + +When reading debug output, look for `inbound connection` (means someone else successfully connected to you) or `onconnect` (you successfully connected to someone else). `discovery peer=` messages mean you found a candidate and will try to connect to them. ### How do Dat peers discover one another on the Internet? @@ -22,27 +49,55 @@ It's not technically impossible that they'd collide, but it's extremely unlikely ### What are the limits on file sizes? -Data is transferred directly between peers, we do not store any data right now. The main limits are importing and transfer speeds, which will improve soon. We plan to help institutions and others set up cloud storage for academic uses and commercial options for general users. - -### Does Dat have version history? - -Dat tracks all of the changes to files, but doesn't currently save a backup of those files. To save backups your current data in your dat, you can use [dat-backup](http://npmjs.org/dat-backup) and [archiver-server](http://npmjs.org/archiver-server). We plan to bake this into the CLI tool and desktop app soon. - -### What happens if Dat (the organization/group) disappears? Will all my files get lost or be inaccessible? - -No. Dat doesn't import or copy your data anywhere, just simply scans and stores filesystem metadata while tracking your changes to the data. 
This means that you can easily move your data around and keep it intact in its original form on the filesystem. You can even simultaneously host your data on HTTP along with Dat to ensure backwards compatibility with existing web tools. - -### How is Dat different than IPFS? - -IPFS and Dat share a number of underlying similarities but address different problems. Both deduplicate content-addressed pieces of data and have a mechanism for searching for peers who have a specific piece of data. Both have implementations which work in modern Web browsers, as well as command line tools. +The Dat software does not have any inherent size limits. The Dat project does not store any data itself except for caching (on datproject.org registry). All data is transferred directly between peers. Depending on where the data is hosted, there may be storage or bandwidth limits. + +To improve the ecosystem and allow for better availability and archiving of data, we plan to help institutions and others set up cloud storage for academic uses. + +### Does Dat store version history? + +Version history is built into our core modules but only some clients support it (more soon!). We have dat tools, intended for servers, such as [hypercore-archiver](https://github.com/mafintosh/hypercore-archiver) and [hypercloud](https://github.com/datprotocol/hypercloud) that store the full content history. + +Once historic content is saved, you can access it in the dat command line. 
First, you can log the history of the archive to see what version you want: + +```sh +❯ dat log /my-dat +01 [put] / 0 B (0 blocks) +02 [put] /index.html 50 B (1 block) +03 [put] /dat.json 79 B (1 block) +04 [put] /dat.json 82 B (1 block) +05 [put] /dat.json 87 B (1 block) +06 [put] /index.html 51 B (1 block) +07 [put] / 0 B (0 blocks) +08 [put] /delete-test.txt 22 B (1 block) +09 [del] /delete-test.txt +10 [put] / 0 B (0 blocks) +11 [put] /readme.md 5 B (1 block) +12 [del] /readme.md +13 [put] /index.html 55 B (1 block) +14 [put] /index.html 84 B (1 block) +15 [put] /.datrc 42 B (1 block) + +Log synced with network + +Archive has 15 changes (puts: +13, dels: -2) +Current Size: 213 B +Total Size: +- Metadata 715 B +- Content 431 B +Blocks: +- Metadata 16 +- Content 8 +``` -The two systems also have a number of differences. Dat keeps a secure version log of changes to a dataset over time which allows Dat to act as a version control tool. The type of Merkle tree used by Dat lets peers compare which pieces of a specific version of a dataset they each have and efficiently exchange the deltas to complete a full sync. It is not possible to synchronize or version a dataset in this way in IPFS without implementing such functionality yourself, as IPFS provides a CDN and/or filesystem interface but not a synchronization mechanism. In short, IPFS provides distribution of objects, Dat provides synchronization of datasets. +Then you can 'checkout' a specific version using the `--http` interface: -In order for IPFS to provide guarantees about interoperability, IPFS applications must use only the IPFS network stack. In contrast, Dat is only an application protocol and is agnostic to which network protocols (transports and naming systems) are used. As a result, Dat cannot make the same types of interoperability guarantees as IPFS. +```sh +dat sync /my-dat --http +``` -### How is dat different than Academic Torrents or BitTorrent? 
+Visit [localhost:8080](http://localhost:8080) to view the latest content. Set the version flag, e.g. `localhost:8080/?version=3`, to see a specific version. Clicking on a file will download that version of the file (assuming it is available locally or on the network). -Academic Torrents [13] uses BitTorrent to share scientific datasets, and BitTorrent has many drawbacks that hinder direct use by scientists. BitTorrent is for sharing static files, that is, files that do not change over time. Dat, on the other hand, has the ability to update and sync files over the peer-to-peer network. BitTorrent is also inefficient at providing random access to data in larger datasets, which is crucial for those who want to get only a piece of a large dataset. BitTorrent comes close to the solution, but we have been able to build something that is more efficient and better designed for the data sharing use case. +We are working on adding a local version history backup in the command line and desktop application. The interfaces for using and checking out older versions will also be further developed. ### Is there a JavaScript or Node.js implementation? @@ -58,7 +113,7 @@ Yes, you'll be able to [install soon](https://datproject.org/install)! See [datp ### Do you plan to have Python or R or other third-party language integrations? -Yes. We are currently developing the serialization format (like .zip archives) called [SLEEP](/sleep) so that third-party libraries can read data without reimplementing all of hyperdrive (which is node-only). +Yes. We are currently developing the serialization format (like .zip archives) called SLEEP so that third-party libraries can read data without reimplementing all of hyperdrive (which is node-only). ### Can multiple people write to one archive? @@ -68,13 +123,19 @@ We are interested in implementations of multi-party writers to dat. Come talk to ## Security & Privacy +### Can other users tell what I am downloading?
+ +Users only connect to other users with the same dat link. Anyone with a dat link can see other users that are sharing that link and their IP addresses. + +We are thinking more about how to ensure reader privacy. See [this blog post](https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/) for more discussion. + ### Is data shared over Dat encrypted? Yes, data shared over Dat is encrypted in transit using the public key (Dat link). When you share a Dat, you must share the public key with another user so they can download it. We use that key on both ends to encrypt the data so both users can read the data but we can ensure the data is not transferred over the internet without encryption. ### Is it possible to discover public keys via man-in-the-middle? -The public key is hashed, creating the discovery key, before we share it over the network. Whenever peers attempt to connect to each other, they use the discovery key. This ensures that the public key is never sent by Dat over the network. +One of the key elements of Dat privacy is that the public key is never used in any discovery network. The public key is hashed, creating the discovery key. Whenever peers attempt to connect to each other, they use the discovery key. Data is encrypted using the public key, so it is important that this key stays secure. @@ -82,6 +143,17 @@ Data is encrypted using the public key, so it is important that this key stays s Only someone with the key can download data for Dat. It is the responsibility of the user that the Dat link is only shared with people who should access the data. The key is never sent over the network via Dat. We do not track keys centrally. It is almost impossible for [keys to overlap](http://docs.datproject.org/faq#are-the-dat-links-guaranteed-to-be-unique-) (and thus to guess keys). +### How can I create stronger privacy protections for my data? 
+ +As long as the public key isn't shared outside of your team, the content will be secure (though the IP addresses and discovery key may become known). You can take a few steps further to improve privacy (generally at the cost of ease of use): + +1. Disable bittorrent DHT discovery (using only DNS discovery), use `--no-dht` flag in CLI. +2. Whitelist IP addresses +3. Run your own discovery servers +4. Encrypt contents before adding to dat (content is automatically encrypted in transit but this would also require decrypting after arrival). + +Only some of these options can be done in the current command line tool. Feel free to PR options to make these easier to configure! + ### How does Dat make sure I download the correct content? Dat uses the concept of a [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) to make sure content is not tampered with. When content is added to a Dat we cryptographically fingerprint it and add it to the tree. On download, we can use the tree to make sure the content has not changed and the parent hashes match. @@ -94,6 +166,22 @@ Dat uses an append-only to track changes over time. An append-only log shows all As a peer to peer network, Dat faces similar privacy risks as Bittorrent. When you download a dataset, your IP address is exposed to the users sharing that dataset. This may lead to honeypot servers collecting IP addresses, as we've seen in Bittorrent. However, with dataset sharing we can create a web of trust model where specific institutions are trusted as primary sources for datasets, diminishing the sharing of IP addresses. [Read more](https://datproject.org/blog/2016-12-18-p2p-reader-privacy) about reader privacy in the p2p web. +## Dat vs ? + +Dat has a lot of overlap with other distributed web tools, data management tools, and distributed version control. Below are some of the most common questions. 
See more in depth technical comparisons in the [Dat whitepaper](https://github.com/datproject/docs/blob/master/papers/dat-paper.md#5-existing-work). + +### How is Dat different than IPFS? + +IPFS and Dat share a number of underlying similarities but address different problems. Both deduplicate content-addressed pieces of data and have a mechanism for searching for peers who have a specific piece of data. Both have implementations which work in modern Web browsers, as well as command line tools. + +The two systems also have a number of differences. Dat keeps a secure version log of changes to a dataset over time which allows Dat to act as a version control tool. The type of Merkle tree used by Dat lets peers compare which pieces of a specific version of a dataset they each have and efficiently exchange the deltas to complete a full sync. It is not possible to synchronize or version a dataset in this way in IPFS without implementing such functionality yourself, as IPFS provides a CDN and/or filesystem interface but not a synchronization mechanism. In short, IPFS provides distribution of objects, Dat provides synchronization of datasets. + +In order for IPFS to provide guarantees about interoperability, IPFS applications must use only the IPFS network stack. In contrast, Dat is only an application protocol and is agnostic to which network protocols (transports and naming systems) are used. As a result, Dat cannot make the same types of interoperability guarantees as IPFS. + +### How is dat different than Academic Torrents or BitTorrent? + +Academic Torrents [13] uses BitTorrent to share scientific datasets, and BitTorrent has many drawbacks that hinder direct use by scientists. BitTorrent is for sharing static files, that is, files that do not change over time. Dat, on the other hand, has the ability to update and sync files over the peer-to-peer network. 
BitTorrent is also inefficient at providing random access to data in larger datasets, which is crucial for those who want to get only a piece of a large dataset. BitTorrent comes close to the solution, but we have been able to build something that is more efficient and better designed for the data sharing use case. + ## Under the Hood ### Is Dat different from hyperdrive? @@ -104,7 +192,7 @@ Dat uses hyperdrive and a variety of other modules. Hyperdrive and Dat are compa ### What if I don't want to download all the data? Does dat have an index? -Yes, you can tell Dat to only download the data you want using our Node.js API. You can do this by using `sparse` mode in `dat-node`, which make it only download content that the peer asks for. To do this, simply pass `{sparse: true}` when you create the dat (or hyperdrive): +Yes, you can tell Dat to only download the data you want using our Node.js API. You can do this by using `sparse` mode in `hyperdrive` or `dat-node`, which makes it download only the content that the peer asks for. To do this, simply pass `{sparse: true}` when you create the dat (or hyperdrive): ```js var Dat = require('dat-node') @@ -113,16 +201,8 @@ Dat(dir, {sparse: true}, function (dat) { }) ``` - ### Does Dat use WebRTC? Dat can use WebRTC but it's very experimental. You can check out our tutorial on using [Dat in the browser here](/browser) We implemented a prototype web version using WebRTC. Moving forward, we are not planning on immediately supporting WebRTC in any Dat application because of reliability issues and lack of support in non-browser environments. Our future browser implementations of Dat will use websockets to transfer data to non-browser Dat interfaces. - -### Dat on the CLI isn't connecting, how do I debug? - -1. Try running `dat doctor` and following the instructions -2. Try running your command with `DEBUG=discovery* ` in front, e.g.
`DEBUG=discovery* dat sync` - -When reading debug output, look for `inbound connection` (means someone else successfully connected to you) or `onconnect` (you successfully connected to someone else). `discovery peer=` messages mean you found a candidate and will try to connect to them. diff --git a/docs/how-dat-works.md b/docs/how-dat-works.md index 959013d..fe40a4d 100644 --- a/docs/how-dat-works.md +++ b/docs/how-dat-works.md @@ -1,6 +1,8 @@ # How Dat Works -Note this is about Dat 1.0 and later. For historical info about earlier incarnations of Dat (Alpha, Beta) check out [this post](http://dat-data.com/blog/2016-01-19-brief-history-of-dat). +Note this is about Dat 2.0 and later. For historical info about earlier incarnations of Dat (Alpha, Beta) check out [this post](http://dat-data.com/blog/2016-01-19-brief-history-of-dat). + +Read the [dat whitepaper](https://github.com/datproject/docs/tree/master/papers) for technical details. When someone starts downloading data with the [Dat command-line tool](https://github.com/datproject/dat), here's what happens: diff --git a/docs/install.md b/docs/install.md deleted file mode 100644 index 66304a5..0000000 --- a/docs/install.md +++ /dev/null @@ -1,18 +0,0 @@ -## Dat Desktop - -| MacOS | [Download](https://github.com/datproject/dat-desktop/releases/download/1.0.3/dat-desktop-1.0.3.dmg) | -|---------|-------------------| -| Linux | [Build from Source](http://github.com/datproject/dat-desktop) | -| Windows | Coming Soon | - -## In the terminal - -``` -npm install -g dat -``` - -Requires [node 4](http://node.js) or later. If you have a problem with permissions, [see this guide](https://docs.npmjs.com/getting-started/fixing-npm-permissions). Any other problems, see [troubleshooting](/dat#troubleshooting) or the [FAQ](/faq). 
- -## Node.js - -See our [node.js library](http://github.com/datproject/dat-node), our underlying protocl layer [hyperdrive](http://github.com/mafintosh/hyperdrive) and the [ecosystem guide](https://docs.datproject.org/ecosystem) for a variety of ways to build dat-compatible applications using Node. diff --git a/docs/intro.md b/docs/intro.md new file mode 100644 index 0000000..bcf78ff --- /dev/null +++ b/docs/intro.md @@ -0,0 +1,47 @@ +# Welcome to Dat Docs! + +Dat is the distributed data tool. + +Dat's open source applications offer a new experience in advanced file syncing and publishing. Wherever your data goes, Dat uses innovative *in place archiving* to link files from many locations together. Share data with anyone over a distributed network using encrypted connections. Dat brings a new ease to public data management with automatic version history, persistent links, and dynamic storage. + +Use Dat to distribute scientific data, browse remote files on demand, or run continuous file archiving. Integrate into your existing work flow with flexible storage options and http publishing. Dat connects existing web infrastructure with a modern technological foundation. Built on a decentralized network, Dat creates new opportunities for existing data publishing tools. Put data preservation at your finger tips, like never before, with user-first applications. **Secure**, **distributed**, **fast**. + +Install Dat Desktop +Install dat command line + +**Built for the Public Good** + +Dat's distributed team builds Dat openly and compassionately. All software is open source and freely available to use. The [Dat project](http://datproject.org) is led by [Code for Science & Society](http://codeforscience.org) (CSS), a U.S. nonprofit. The mission of CSS is to work with public institutions to produce open source infrastructure for researchers, civic hackers, and journalists. We want to improve data access and long-term preservation. 
We actively welcome outside contributors and use cases beyond our mission. + +Code for Science & Society hosts other open science initiatives including [Science Fair](https://github.com/codeforscience/sciencefair/), a desktop science library like nothing before, and [Stencila](https://github.com/stencila), the office suite for reproducible research. Science Fair uses Dat to distribute scientific literature. In the future, Stencila will use Dat for reproducible data analysis. + +**Get in touch:** + +* [github.com/datproject](http://github.com/datproject) +* [@dat_project](http://twitter.com/dat_project) +* Chat in [#dat on IRC](http://webchat.freenode.net/?channels=dat) or via [gitter](https://gitter.im/datproject/discussions) + +## Getting Started + +If you are new to Dat, welcome! You can learn more about Dat concepts in [the overview](overview). Becoming familiar with core Dat concepts will help you when using Dat and reading our documentation. + +If you are ready to get started, pick a Dat client and install! + +## Features + +* **Secure** - Dat encrypts data transfers and verifies content on arrival. Dat prevents third-party access to metadata and content. [Learn more](faq#security-and-privacy) about security & privacy. +* **Distributed** - Connect directly to other users sharing or downloading common datasets. Any device can share files without need for centralized servers. [Read more](terms#distributed-web) about the distributed web. +* **Fast** - Share files instantly with in-place archiving. Download only the files you want. Quickly sync updates by only downloading new data, saving time and bandwidth. +* **Transparent** - A complete version history improves transparency and auditability. Changes are written in append-only logs and uniformly shared throughout the network. +* **Future-proof** - Persistent links identify and verify content. These unique ids allow users to host copies, boosting long-term availability without sacrificing provenance. 
+ +## Installation + + View the [installation guide](http://datproject.org/install) or pick your favorite client application: + +Install Dat Desktop +Install dat command line + +* [Beaker Browser](http://beakerbrowser.com) - An experimental p2p browser with built-in support for the Dat protocol. +* [Dat Protocol](https://www.datprotocol.com) - Build your own application on the Decentralized Archive Transport (Dat) protocol. +* [require('dat')](http://github.com/datproject/dat-node) - Node.js library for downloading and sharing Dat archives. diff --git a/docs/overview.md b/docs/overview.md new file mode 100644 index 0000000..fd70789 --- /dev/null +++ b/docs/overview.md @@ -0,0 +1,91 @@ +# Dat Concept Overview + +This overview will introduce you to Dat, a new type of distributed data tool, and help you make the most of it. Dat uses modern cryptography, decentralized networks, and content versioning so you can share and publish data with ease. + +With Dat, we want to make data sharing, publishing, and archiving fit into your workflow. Built with the needs of researchers, librarians, and developers in mind, Dat's unique design works wherever you store your data. You can keep files synced whether they're on your laptop, a public data repository, or in a drawer of hard drives. Dat securely ties all these places together, creating a dynamic data archive. Spend less time managing files, more time digging into data (unfortunately we cannot sort your hard drive drawer, yet). + +**Install Dat now.** Then you can play with Dat while learning how it works! + +Install Dat Desktop +Install dat command line + +## In Place Archiving + +You can turn any folder on your computer into *a dat*. We call this *in place archiving*. A dat is a regular folder with some magic attached. The magic is a set of metadata files, in a `.dat` folder. Dat uses the metadata to track file history and securely share your files. Your files and the `.dat` folder can be instantly synced to anywhere.
+ +Create a dat with any folder + +Once you have installed Dat, you can use a single command to live sync your files to friends, backup to an external drive, and publish to a website (so people can download over http too!). The cool part is this all happens at the same time. If you go offline for a bit, no worries. Dat shares the latest files and any saved history once you are back online. These data transfers happen between the computers, forgoing any centralized source. + +In place archiving in Dat really means **any place**. Dat seamlessly syncs your files where you want and when you want. Dat's decentralized technology and automatic versioning will improve data availability and data quality without sacrificing ease of use. + +## Distributed Network + +Dat goes beyond regular archiving through its *distributed network*. When you share data, Dat sends data to many download locations at once, and they can sync the same data with each other! By connecting users directly Dat transfers files faster, especially sharing on a local network. Distributed syncing allows robust global archiving for public data. + +Share unique dat link + +To maintain privacy, the dat link controls access to your data. Any data shared in the network is encrypted using your link as the password. Learn more about Dat's security and privacy below or in [the faqs](faq#security-and-privacy). We are also investigating ways to improve [reader privacy](https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/) for public data. + +## Version History + +Dat automatically maintains a built in version history whenever files are added. Dat uses this history to allow partial downloads of files, for example only getting the latest files. There are two types of versioning performed automatically by Dat. Metadata is stored in a folder called `.dat` in the main folder of a repository, and data is stored as normal files in the main folder. + +Dat uses append-only registers to store version history.
This means all changes are written to the end of the file, growing over time. + +### Metadata Versioning + +Dat acts as a one-to-one mirror of the state of a folder and all its contents. When importing files, Dat grabs the filesystem metadata for each file and checks if there is already an entry for this filename. If the file with this metadata matches exactly the newest version of the file metadata stored in Dat, then this file will be skipped (no change). + +If the metadata differs or does not exist, then this new metadata entry will be appended as the new 'latest' version for this file in the append-only SLEEP metadata content register. + +### Content Versioning + +The metadata only tells you if or when a file is changed, not how it changed. In addition to the metadata, Dat tracks changes in the content in a similar manner. + +The default storage system used in Dat stores the files as files. This has the advantage of being very straightforward for users to understand, but the downside of not storing old versions of content by default. + +In contrast to other version control systems, like Git, Dat only stores the current set of files, not older versions. Git, for example, stores all previous content versions and all previous metadata versions in the `.git` folder. But Dat is designed for larger datasets. + +Storing all history on content could easily fill up the user's hard drive. Dat has multiple storage modes based on usage. With Dat's dynamic storage, you can store the content history on a local external hard drive or on a remote server (or both!). + +## Dat Privacy + +Files shared with Dat are encrypted (using the link) so *only* users with your unique link can access your files. The link acts as a kind of password meaning, generally, you should assume *anyone* with the link will have access to your files.
+ +The link allows users to download, and re-share, your files, whether you intended them to have the link or not (with some hand-waving assumptions about them being able to connect to you, which can be limited, see more in [security & privacy faq](faq#security-and-privacy)). + +Make sure you are thoughtful about who you share links with and how. Dat ensures links cannot be intercepted through the Dat network. If you share your links over other channels, ensure the privacy & security matches or exceeds your data security needs. We try to limit times when Dat displays full links to avoid accidental sharing. + +## dat:// links + +Dat links have some special properties that are helpful to understand. + +Traditionally, http links point to a specific server, e.g. datproject.org's server, and/or a specific resource on that server. Unfortunately, links often break or the content changes without notification (this makes it impossible to cite `nytimes.com`, for example, because the link is meaningless without a reference to what content was there at citation time). + +You may have seen Dat links around: + +``` +dat://ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93 +``` + +What is with the weird long string of characters? Let's break it down! + +**`dat://` - the protocol** + +The first part of the link is the link protocol, Dat (read about the Dat protocol at [datprotocol.com](http://www.datprotocol.com)). The protocol describes what "language" the link is in and what type of applications can open it. You do not always need this part with Dat but it is helpful context. + +**`ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93` - the unique identifier** + +The second part of the link is a 64-character hex string ([ed25519 public-keys](https://ed25519.cr.yp.to/) to be precise). Each Dat archive gets a public key link to identify it. With the hex string as a link we can do a few things: + +1. Encrypt the data transfer +2.
Create a persistent identifier, an ID that never changes, even as files are updated (as opposed to a checksum which is based on the file contents). + +**`dat://ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93`** + +All together, the links can be thought of similarly to a web URL, as a place to get content, but with some extra special properties. When you download a dat link: + +1. You do not have to worry about where the files are stored. +2. You can always get the latest files available. +3. You can view the version history or add version numbers to links to get a permanent link to a specific version. diff --git a/docs/terms.md b/docs/terms.md index 7ce0ff4..a8aa52e 100644 --- a/docs/terms.md +++ b/docs/terms.md @@ -1,54 +1,56 @@ -## General Terminology +# Terminology -### Dat archive +## Dat Terms -A folder containing files of any type, which can be synced to other people on the distributed web. A Dat archive has content (files) and metadata, both shared to peers. +Terms specific to the Dat software. -A Dat archive has a Dat link used to share with other people. +### dat, Dat archive, archive -### Distributed Web +A dat, or Dat archive, is a set of files and dat metadata (see [SLEEP](#sleep)). A dat folder can contain files of any type, which can be synced to other users. A dat has a Dat link used to share with other people. -In a Distributed Web (P2P) model, those who are downloading the data are also providing some of the bandwidth and storage to run it. Instead of one server, we have many. The more people or organizations that are involved in the Distributed Web, the more redundant, safe, and fast it will become. +When you create a dat, you're creating a `.dat` folder to hold the metadata and the dat keys (a public and secret key). -Currently, the Web is centralized: if someone controls the hardware or the communication line, then they control all the uses of that website.
[Read more here](http://brewster.kahle.org/2015/08/11/locking-the-web-open-a-call-for-a-distributed-web-2/). +### Dat Link or Dat Key -### Peer to Peer (P2P) +Identifier for a dat, e.g. `dat://ab3ed4f...`. These are 64 character hashes with the `dat://` protocol prefix. Anyone with the Dat link can download and re-share files in a dat. -A P2P software program searches for other connected computers on a P2P network to locate the desired content. The peers of such networks are end-user computer systems that are interconnected via the Internet. +### Secret Key -In Dat, peers only connect if they both have the same Dat link. +Dat links are the public part of a key pair. Users that have the secret key are able to write updates to a dat. -### Peer +With the Dat CLI and Desktop application, secret keys are stored in a dat folder in your home directory, `~/.dat/secret_keys`. It is important to back these up if you get a new computer. -Another user or server who has downloaded the data (or parts of it) and is uploading it to others in the Dat Swarm. +### Writer -### Swarm or Network +User who can write to a Dat archive. This user has the secret key, allowing them to write data. Currently, dats are single-writer. -A group of peers that want or have downloaded data for a Dat archive and are connected to each other over the Distributed Web. +### Collaborator -### Owner +A user who is granted read access to a Dat archive by the owner. A collaborator can access a Dat archive if the owner or another collaborator sends them the Dat link. -User who owns a Dat archive. This user has the secret key on local machine allowing them to write data. +In the future, users will be able to grant collaborators write access to the Dat archive, allowing them to modify and create files in the archive. -### Collaborator +### Swarm or Network -User who are granted read access to a Dat archive by the owner.
A collaborator can access a Dat archive if the owner or another collaborator sends the the Dat link. +A group of peers that want or have downloaded data for a Dat archive and are connected to each other over the Distributed Web. -In the future, owners will be able to grant collaborators write access to the Dat archive, allowing them to modify and create files in the archive. +## General Terms -### Secure Register +### Distributed Web -A [register]( https://gds.blog.gov.uk/2015/09/01/registers-authoritative-lists-you-can-trust/) is an authoritative list of information you can trust. We maintain an open register called [Dat Folder](https://datproject.org) which contains public data, and is open to everyone. +In a Distributed Web (P2P) model, those who are downloading the data also provide bandwidth and storage to share it. Instead of one server, we have many. The more people or organizations that are involved in the Distributed Web, the more redundant, safe, and fast it will become. -### Dat Link +Currently, the Web is centralized: if someone controls the hardware or the communication line, then they control all the uses of that website. [Read more here](http://brewster.kahle.org/2015/08/11/locking-the-web-open-a-call-for-a-distributed-web-2/). -Identifier for a Dat archive, e.g. `dat://ab3ed4f...`. These are 64 character hashes with the `dat://` protocol prefix. Anyone with the Dat link can download and re-share files in a Dat archive. +### Peer to Peer (P2P) -### Snapshot Archive +A P2P software program searches for other connected computers on a P2P network to locate the desired content. The peers of such networks are end-user computer systems that are interconnected via the Internet. -A snapshot archive uses a content-based hash as the Dat link. This means that the link is unique for that set of files and content. Once the content changes, the link will change. +In Dat, peers only connect if they both have the same Dat link. 
-Snapshot archives can be used as checkpoints or for publishing specific versions of datasets with guarantees that the content will not change. +### Secure Register + +A [register]( https://gds.blog.gov.uk/2015/09/01/registers-authoritative-lists-you-can-trust/) is an authoritative list of information you can trust. We maintain an open register called [Dat Folder](https://datproject.org) which contains public data, and is open to everyone. ### Beaker @@ -56,15 +58,13 @@ The [Beaker Browser](https://beakerbrowser.com/) is an experimental p2p browser ## Technical Terms -### Metadata - -Like an HTTP header, the metadata contains a pointer to the contents of Dat and the file list. +### SLEEP -The metadata is a hypercore feed. +SLEEP is the on-disk format that Dat produces and uses. It is a set of 9 files that hold all of the metadata needed to list the contents of a Dat repository and verify the integrity of the data you receive. -### Content Feed +The acronym SLEEP is a slumber related pun on REST and stands for Syncable Lightweight Event Emitting Persistence. The Event Emitting part refers to how SLEEP files are append-only in nature, meaning they grow over time and new updates can be subscribed to as a realtime feed of events through the Dat protocol. -The content feed is a hypercore feed containing the file contents for a Dat archive. The content feed together with a metadata feed make a Dat archive. +Read the full [SLEEP specification](https://github.com/datproject/docs/blob/master/papers/dat-paper.md#3-sleep-specification) in the dat whitepaper. ### Key @@ -78,26 +78,28 @@ The public key is the key that is shared in the Dat Link. Messages are signed us The discovery key is a hashed public key. The discovery key is used to find peers on the public key without exposing the original public key to network. -### Feed / Core Feed +### Feed -A feed is a term we use interchangeably with the term "append-only log". It’s the lowest level component of Dat.
+A feed is a term we use interchangeably with the term "append-only log". It’s the lowest level component of Dat. For each Dat, there are two feeds - the metadata and the content.
-Feeds are created with hypercore.
+Feeds are created with hypercore.
-### Hyperdrive
+### Metadata Feed
-[Hyperdrive](https://github.com/mafintosh/hyperdrive) is peer to peer directories. We built hyperdrive to efficiently share scientific data in real time between research institutions. Hyperdrive handles the distribution of files while encrypting data transfer and ensuring content integrity. Hyperdrive creates append-only logs for file changes allow users to download partial datasets and to create versioned data. Hyperdrive is built on hypercore.
+Like an HTTP header, the metadata contains a pointer to the contents of Dat and the file list.
-Archives created with hyperdrive are made with two feeds, one for the metadata and one for the content. A hyperdrive instance can store any number of archives.
+The metadata is a hypercore feed. The first entry in the metadata feed is the key for the content feed.
-### Hypercore
+### Content Feed
-[Hypercore](https://github.com/mafintosh/hypercore) is a protocol and network for distributing and replicating feeds of binary data. This creates an efficient gossip network where latency is reduced to a minimum. Hypercore is an eventually consistent, highly available, partition tolerant system.
+The content feed is a hypercore feed containing the file contents for a Dat archive. The content feed together with a metadata feed make a Dat archive.
-Hypercore instances can contain any number of feeds.
+### Hyperdrive
-### Hyper- (modules)
+[Hyperdrive](https://github.com/mafintosh/hyperdrive) is peer to peer directories. We built hyperdrive to efficiently share scientific data in real time between research institutions. Hyperdrive handles the distribution of files while encrypting data transfer and ensuring content integrity.
Hyperdrive creates append-only logs for file changes, allowing users to download partial datasets and to create versioned data. Hyperdrive is built on hypercore.
+
+Archives created with hyperdrive are made with two feeds, one for the metadata and one for the content.
-Modules that are use hyperdrive archives or hypercore feeds in a cross-compatible way, for example [hyperdiscovery](https://github.com/karissa/hyperdiscovery) or [hyperhealth](https://github.com/karissa/hyperhealth).
+### Hypercore
-If a module is only compatible with one one of hyperdrive or hypercore, they should be prefixed with that name, e.g. [hyperdrive-import-files](https://github.com/juliangruber/hyperdrive-import-files).
+[Hypercore](https://github.com/mafintosh/hypercore) is a protocol and network for distributing and replicating feeds of binary data. This creates an efficient gossip network where latency is reduced to a minimum. Hypercore is an eventually consistent, highly available, partition tolerant system.
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 0000000..f298a81
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,92 @@
+# Troubleshooting
+
+We've provided some troubleshooting tips based on issues users have seen. Please [open an issue](https://github.com/datproject/dat/issues/new) or ask us in our [chat room](https://gitter.im/datproject/discussions) if you need help troubleshooting and it is not covered here.
+
+### Check Your Version
+
+Knowing the version is really helpful if you run into any bugs, and will help us troubleshoot your issue.
+
+**In Dat Desktop:**
+
+Click **Dat** in the menu bar (top left) > Click **About Dat**.
+
+You should see the version number, e.g. `Version 1.1.2 (1.1.2.1076)`.
+
+**In the Command Line:**
+
+```
+dat -v
+```
+
+You should see the Dat semantic version printed, e.g. `13.1.2`.
+
+## Networking Issues
+
+All Dat transfers happen directly between computers.
Dat has various methods for connecting computers, but because networking capabilities vary widely, we may have issues connecting. Whenever you run a Dat there are several steps to share or download files with peers:
+
+1. Discovering other sources
+2. Connecting to sources
+3. Sending & Receiving Data
+
+With successful use, Dat will show network counts after connection. If you never see a connection, your network may be restricting discovery or connection. Please try using the dat doctor (see below) between the two computers not connecting. This will help troubleshoot the networks.
+
+### Dat Doctor
+
+We've included a tool to identify network issues with Dat, the Dat doctor. The Dat doctor will run two tests:
+
+1. Attempt to connect to a public server running Dat.
+2. Attempt a direct connection between two computers. You will need to run the command on both of the computers you are trying to share data between.
+
+**In Dat Desktop:**
+
+Our desktop Dat doctor is still in progress; currently you can only test connections to our public server (#1).
+
+1. View > Toggle Developer Tools
+2. Help > Doctor
+
+You should see the doctor information printed in the console.
+
+**In the Command Line:**
+
+Start the doctor by running:
+
+```
+dat doctor
+```
+
+For direct connection tests, the doctor will print out a command to run on the other computer, `dat doctor <64-character-string>`. The doctor will run through the key steps in the process of sharing data between computers to help identify the issue.
+
+### Known Networking Issues
+
+* Dat may [have issues](https://github.com/datproject/dat/issues/503) connecting if you are using iptables.
+
+## Installation Troubleshooting
+
+### Dat Desktop
+
+TODO
+
+### Command Line
+
+To use the Dat command line tool you will need to have [node and npm installed](https://docs.npmjs.com/getting-started/installing-node). Make sure those are installed correctly before installing Dat. Dat only supports Node versions 4 and above.
You can check the version of each:
+
+```
+node -v
+npm -v
+```
+
+#### Global Install
+
+The `-g` option installs Dat globally, allowing you to run it as a command. Make sure you installed with that option.
+
+* If you receive an `EACCES` error, read [this guide](https://docs.npmjs.com/getting-started/fixing-npm-permissions) on fixing npm permissions.
+* If you receive an `EACCES` error, you may also install dat with sudo: `sudo npm install -g dat`.
+* Have other installation issues? Let us know: you can [open an issue](https://github.com/datproject/dat/issues/new) or ask us in our [chat room](https://gitter.im/datproject/discussions).
+
+## Command Line Debugging
+
+If you are having trouble with a specific command, run with the debug environment variable set to `dat` (and optionally also `dat-node`). This will help us debug any issues:
+
+```
+DEBUG=dat,dat-node dat clone dat:// dir
+```
--
cgit v1.2.3