aboutsummaryrefslogtreecommitdiffstats
path: root/docs/concepts.md
blob: 7a1aaf0f9ce4dee99d5b2e3575434bb3eeb270b3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# Key Concepts

## In Place Archiving

You can turn any folder on your computer into *a dat*. We call this *in place archiving*. A dat is a regular folder with some magic attached. The magic is a set of metadata files, in a `.dat` folder. Dat uses the metadata to track file history and securely share your files. Your files and the `.dat` folder can be instantly synced to anywhere.

<img src="/assets/dat_folder.png" alt="Create a dat with any folder" style="width:500px;"/>

Once you have installed Dat, you can use a single command to live sync your files to friends, backup to an external drive, and publish to a website (so people can download over http too!). The cool part is this all happens at the same time. If you go offline for a bit, no worries. Dat shares the latest files and any saved history once you are back online. These data transfers happen between the computers, forgoing any centralized source.

In place archiving in Dat really means **any place**. Dat seamlessly syncs your files where you want and when you want. Dat's decentralized technology and automatic versioning will improve data availability and data quality without sacrificing ease of use.

## Distributed Network

Dat goes beyond regular archiving through its *distributed network*. When you share data, Dat sends data to many download locations at once, and they can sync the same data with each other! By connecting users directly Dat transfers files faster, especially sharing on a local network. Distributed syncing allows robust global archiving for public data.

<img src="/assets/share_link.png" alt="Share unique dat link" style="width:500px;"/>

To maintain privacy, the dat link controls access to your data. Any data shared in the network is encrypted using your link as the password. Learn more about Dat's securtiy and privacy below or in [the faqs](faq#security-and-privacy). We are also investigating ways to improve [reader privacy](https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/) for public data.

## Version History

Dat automatically maintains a built in version history whenever files are added. Dat uses this history to allow partial downloads of files, for example only getting the latest files. There are two types of versioning performed automatically by Dat. Metadata is stored in a folder called `.dat` in the main folder of a repository, and data is stored as normal files in the main folder.

Dat uses append-only registers to store version history. This means all changes are written to the end of the file, growing over time.

### Metadata Versioning

Dat acts as a one-to-one mirror of the state of a folder and all it's contents. When importing files, Dat grabs the filesystem metadata for each file and checks if there is already an entry for this filename. If the file with this metadata matches exactly the newest version of the file metadata stored in Dat, then this file will be skipped (no change).

If the metadata differs or does not exist, then this new metadata entry will be appended as the new 'latest' version for this file in the append-only SLEEP metadata content register.

### Content Versioning

The metadata only tells you if or when a file is changed, not how it changed. In addition to the metadata, Dat tracks changes in the content in a similar manner.

The default storage system used in Dat stores the files as files. This has the advantage of being very straightforward for users to understand, but the downside of not storing old versions of content by default.

In contrast to other version control systems, like Git, Dat only stores the current set of files, not older versions. Git, for example, stores all previous content versions and all previous metadata versions in the `.git` folder. But Dat is designed for larger datasets.

Storing all history on content could easily fill up the users hard drive. Dat has multiple storage modes based on usage. With Dat's dynamic storage, you can store the content history on a local external hard drive or on a remote server (or both!).

## Dat Privacy

Files shared with Dat are encrypted (using the link) so *only* users with your unique link can access your files. The link acts as a kind of password meaning, generally, you should assume *anyone* with the link will have access to your files.

The link allows users to download, and re-share, your files, whether you intended them to have the link or not (with some hand waiving assumptions about them being able to connect to you, which can be limited, see more in [security & privacy faq](faq#security-and-privacy)).

Make sure you are thoughtful about who you share links with and how. Dat ensures links cannot be intercepted through the Dat network. If you share your links over other channels, ensure the privacy & security matches or exceeds your data security needs. We try to limit times when Dat displays full links to avoid accidental sharing.

## dat:// links

Dat links have some special properties that are helpful to understand.

Traditionally, http links point to a specific server, e.g. datproject.org's server, and/or a specific resource on that server. Unfortunately, links often break or the content changes without notification (this makes it impossible to cite `nytimes.com`, for example, because the link is meaningless without a reference to what content was there at citation time). Dat links, on the other hand, never change. You can update data in a dat and use the same link to download the changes.

Here is an example dat link:

```
dat://ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93
```

What is with the weird long string of characters? Let's break it down!

**`dat://` - the protocol**

The first part of the link is the link protocol, Dat (read about the Dat protocol at [datprotocol.com](http://www.datprotocol.com)). The protocol describes what "language" the link is in and what type of applications can open it. You do not always need this part with Dat but it is helpful context.

**`ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93` - the unique identifier**

The second part of the link is a 64-character hex strings ([ed25519 public-keys](https://ed25519.cr.yp.to/) to be precise). Each Dat archive gets a public key link to identify it. With the hex string as a link we can a few things:

1. Encrypt the data transfer
2. Create a persistent identifier, an ID that never changes, even as file are updated (as opposed to a checksum which is based on the file contents).

**`dat://ff34725120b2f3c5bd5028e4f61d14a45a22af48a7b12126d5d588becde88a93`**

All together, the links can be thought of similarly to a web URL, as a place to get content, but with some extra special properties. When you download a dat link:

1. You do not have to worry about where the files are stored.
2. You can always get the latest files available.
3. You can view the version history or add version numbers to links to get an permanent link to a specific version.

[If you'd like you read more about how dat works, see our whitepaper.](https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf)