
- basic types:
    DID
    DID document
    NSID
    TID
    atp URI
- libraries:
    chrono
    data store: sqlite?
    cbor (dag-cbor)
    url or uri (for atp URIs?)
    ucan
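the basic types above could start as thin newtype wrappers over strings. a minimal sketch of an at:// URI split into authority / collection / record key — the field names and the absence of real validation are assumptions, not taken from a spec reading:

```rust
// Minimal sketch of an AT URI wrapper: at://<authority>/<collection>/<rkey>.
// Field names and (lack of) validation here are assumptions, not spec-checked.
#[derive(Debug, PartialEq)]
struct AtUri {
    authority: String,          // usually a DID, e.g. did:plc:...
    collection: Option<String>, // an NSID, e.g. com.example.post
    rkey: Option<String>,       // record key within the collection
}

fn parse_at_uri(s: &str) -> Option<AtUri> {
    let rest = s.strip_prefix("at://")?;
    let mut parts = rest.splitn(3, '/');
    let authority = parts.next().filter(|a| !a.is_empty())?.to_string();
    Some(AtUri {
        authority,
        collection: parts.next().map(str::to_string),
        rkey: parts.next().map(str::to_string),
    })
}

fn main() {
    let uri = parse_at_uri("at://did:plc:abc123/com.example.post/3jk").unwrap();
    println!("{:?}", uri);
}
```

real parsing would also validate the DID and NSID syntax; this only shows where those types would plug in.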

- background reading
    MST (merkle search tree)

- describe datastore needs (schemas)

- missing docs (look in code?)
    "cbor normalization"


## High-Level Architecture of PDS

- has accounts/users
- manages persistent repos
- implements all relevant XRPC endpoints

a simple implementation would just:
- statically implement "lexicons" (not flexible/general)
- static account registrations (or single account!), eg in TOML file


## High-Level Architecture of CLI

- persist auth info locally
- cache DID / petnames locally (?)
- virtually everything is a query to PDS
- implements XRPC as a client, including methods and types
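since XRPC methods map onto plain HTTP endpoints under `/xrpc/` on the PDS, the CLI's client side is mostly URL construction plus an HTTP library. a hedged sketch of building a query URL — the host, NSID, and parameters are placeholders, and real code would percent-encode values:

```rust
// Sketch of XRPC query-URL construction: GET https://<host>/xrpc/<nsid>?<params>.
// The NSID and parameter names below are placeholders, not real lexicon methods.
fn xrpc_query_url(host: &str, nsid: &str, params: &[(&str, &str)]) -> String {
    let mut url = format!("https://{}/xrpc/{}", host, nsid);
    for (i, (k, v)) in params.iter().enumerate() {
        url.push(if i == 0 { '?' } else { '&' });
        url.push_str(k);
        url.push('=');
        url.push_str(v); // real code would percent-encode here
    }
    url
}

fn main() {
    println!("{}", xrpc_query_url("pds.example.com", "com.example.getRecord",
                                  &[("user", "did:plc:abc"), ("limit", "10")]));
}
```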


## PDS Datastore Needs

https://github.com/bluesky-social/atproto/blob/main/packages/server/src/db/database-schema.ts

loosely, probably want to have two distinct datastores, even if they end up in
the same underlying database. one is the raw repository merkle search tree, the
other is a more semantic set of content.

raw repo can be just key/value with CID (string or bytes) as key and raw CBOR
bytes as value. there are probably better and worse implementations, but this
is fine on its own.

the semantic info is probably a set of relational tables; let's call these
"content tables". may want various flexible indices on top of them.

there are also probably (?) additional caches and system tables for things like
invite codes. let's call these "service tables".

most reads go to the content and service tables. writes would go to the raw
repo first, then update the content tables as needed. all in a single database
transaction?
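the raw-repo side of that split is small enough to sketch. an in-memory stand-in for the CID-keyed block store — the type and method names are assumptions, and the real thing would sit on sqlite with the content/service tables updated in the same transaction:

```rust
use std::collections::HashMap;

// Sketch of the raw-repo store: CID (as a string here) mapping to raw
// DAG-CBOR bytes. A HashMap stands in for the real database backend.
#[derive(Default)]
struct BlockStore {
    blocks: HashMap<String, Vec<u8>>,
}

impl BlockStore {
    fn put(&mut self, cid: &str, raw_cbor: Vec<u8>) {
        self.blocks.insert(cid.to_string(), raw_cbor);
    }
    fn get(&self, cid: &str) -> Option<&[u8]> {
        self.blocks.get(cid).map(Vec::as_slice)
    }
}

fn main() {
    let mut store = BlockStore::default();
    store.put("bafyexample", vec![0xa1, 0x61, 0x61, 0x01]); // some CBOR bytes
    println!("have block: {}", store.get("bafyexample").is_some());
}
```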



## Crawling

spider:
- start with a list of DIDs in the frontier
- for each DID in the frontier
    => fetch entire repo
    => extract list of other DIDs via 'like', 'follow'
    => add to crawl frontier (which is de-duped)
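the steps above amount to a breadth-first crawl with a de-duplicated frontier. a minimal sketch, with the "fetch entire repo and extract DIDs" step stubbed out as a closure:

```rust
use std::collections::{HashSet, VecDeque};

// Breadth-first spider over DIDs. `fetch_follows` stands in for "fetch the
// whole repo and extract DIDs from like/follow records" -- here it is a stub.
fn crawl(seeds: &[&str], fetch_follows: impl Fn(&str) -> Vec<String>) -> Vec<String> {
    let mut seen: HashSet<String> = seeds.iter().map(|d| d.to_string()).collect();
    let mut frontier: VecDeque<String> = seen.iter().cloned().collect();
    let mut crawled = Vec::new();
    while let Some(did) = frontier.pop_front() {
        crawled.push(did.clone());
        for next in fetch_follows(&did) {
            if seen.insert(next.clone()) {
                frontier.push_back(next); // only unseen DIDs enter the frontier
            }
        }
    }
    crawled
}

fn main() {
    // Toy graph: did:a follows did:b and did:c; did:b follows did:a (seen).
    let order = crawl(&["did:a"], |did| match did {
        "did:a" => vec!["did:b".into(), "did:c".into()],
        "did:b" => vec!["did:a".into()],
        _ => vec![],
    });
    println!("{:?}", order);
}
```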


## Summary

User content is stored in per-user "repos", which are analogous to git
repositories. They are merkle-tree like, and have a series of signed "commits"
to verify authenticity.

Identities are "DIDs", which are permanent URI-like strings which can somehow
be securely dereferenced to a user profile which contains a public key. The
user profile (and key) can change over time. Each DID has a scheme type that
describes how registration and dereferencing should work. There are several
proposed schemes but none seem to meet all the requirements for a
decentralized, low-cost, low-friction system as desired... maybe a better one
will emerge?

Repos usually live in a hosted service, and user clients communicate over an
HTTP protocol (instead of mutating the merkle tree directly). The signing key
is deposited with the hosting service. It is possible to suck out the entire
merkle tree from hosted services.


Short pitch:
- social media content stored in something like signed git repos
- currently all content is public and signed (non-refutable)
- users control a pointer to where repo is currently hosted, and can migrate by
  copying and pointing somewhere new
- thin clients don't store full repos for self or others, they just do HTTP RPC
  calls to host service. this includes things like search, aggregation, counts.
- application protocols can define new content schemas and client RPC methods

How does it compare to ActivityPub?
- ATP specifies how user content is *persisted*, and allows migration of content between hosts
- ActivityPub is about communicating events between hosts
- likely possible to implement ActivityPub as part of an ATP host

## Thoughts

This isn't really offline-first. Writes cannot merge; whichever device gets
to the PDS first "wins". There is no merge process.

Wait, I guess this really is just IPLD. Will server-server communication just
be IPFS stuff (bitswap, graphsync)?

## Issues

Service power:

- could rate-limit crawling/harvesting of repos. eg, Google has an advantage in
  web crawling today
- what would the impact be of "faster" cache services from which clients
  could fetch repos?

Usability:

- time delay in conversations/threads. eg, replies from strangers could take a
  long time to show up

Tech/Features:

- pub/sub style "subscribe" for clients (eg, to receive push notifications from
  server, instead of polling)

Questions:

- seems like all content is public, signed, and non-refutable. are deletions just publishing a retraction request? is retracted content not served up in the repository?
- how will media blobs work? eg, video. migrations, bandwidth, take-downs.
- can applications (protocols/lexicons) specify new server-to-server RPC methods? or is sync effectively the only server-to-server method needed?
- reader privacy: guess that PDS can see a lot about reader, but depending on size can partially shield who is reading what externally?
- should there be a mechanism for sharing client state? eg, what content/notifications have been "seen" across devices
- how would something like a 140-character limit be expressed in a Lexicon? seems like an extra computational predicate or validation on top of the schema
- are Lexicons versioned, or immutable over time?
- shouldn't it be atp://?


## Another Plan

- schema generation from JSON schemas; just models, or RPC as well?

PDS (adenosine-pds):
- use existing `ipfs-sqlite-block-store` to store repo as IPLD DAG
- basic MST implementation layer on top of IPLD store

CLI (adenosine):
- base atproto only (?)
- account management
    => secret store for revocation key?
- some kind of DID address book / contacts?
- generic CRUD and list commands working with at:// URIs


## Lexicon Ideas

- journal publishing
- tumblr-style microblogging

## TODO

- rust library for jq-like JSON queries ("JSON Pointers", https://datatracker.ietf.org/doc/html/rfc6901)
- actually read/understand did-web spec
- try to understand how small-world reference and notification would actually work
    => PDS pulls in everything via follow/following graph, and spiders +1?
    => query aggregators for references and pull any in?
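for the JSON Pointer item above, the token syntax from RFC 6901 is simple enough to sketch std-only: a pointer splits on '/', unescaping "~1" to '/' first and then "~0" to '~'. actual evaluation would need a JSON value type (e.g. serde_json's), which is left out here:

```rust
// RFC 6901 reference-token handling: split a JSON Pointer into unescaped
// tokens. Evaluating the tokens against a JSON value is not shown.
fn pointer_tokens(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        return Some(vec![]); // "" points at the whole document
    }
    let rest = pointer.strip_prefix('/')?; // non-empty pointers must start with '/'
    Some(
        rest.split('/')
            // RFC 6901: replace "~1" with "/" first, then "~0" with "~"
            .map(|tok| tok.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}

fn main() {
    println!("{:?}", pointer_tokens("/foo/0/a~1b"));
}
```

the replacement order matters: doing "~0" first would wrongly turn "~01" into "/1" via the intermediate "~1".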