path: root/plan.txt
author    Bryan Newbold <bnewbold@robocracy.org>  2022-10-26 14:31:39 -0700
committer Bryan Newbold <bnewbold@robocracy.org>  2022-10-26 14:31:39 -0700
commit    c4485e4b134d946227d6cbd52f57ce175387749d (patch)
tree      68980a135974539210c913f8d3a6d1e65e167cbb /plan.txt
parent    b66f4349a91da353e6c9d5abb0477cc4ba2d571b (diff)
download  adenosine-c4485e4b134d946227d6cbd52f57ce175387749d.tar.gz
download  adenosine-c4485e4b134d946227d6cbd52f57ce175387749d.zip
start sketching out a plan, rust deps
Diffstat (limited to 'plan.txt')
-rw-r--r--  plan.txt  177
1 file changed, 177 insertions, 0 deletions
diff --git a/plan.txt b/plan.txt
new file mode 100644
index 0000000..70a3d53
--- /dev/null
+++ b/plan.txt
@@ -0,0 +1,177 @@
+
+- basic types:
+ DID
+ DID document
+ NSID
+ TID
+ atp URI
+- libraries:
+ chrono
+ data store: sqlite?
+ cbor (dag-cbor)
+ url or uri (for atp URIs?)
+ ucan
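The basic types listed above could start life as thin newtype wrappers with shallow validation. A sketch only, assuming placeholder names (`Did`, `Nsid`, `parse`) rather than a settled API; the real validation rules live in the atproto specs:

```rust
// Sketch: hypothetical newtypes for two of the basic atproto identifiers.
// Validation here is deliberately shallow.

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Did(String);

impl Did {
    pub fn parse(s: &str) -> Option<Did> {
        // DIDs look like "did:<method>:<identifier>"
        let mut parts = s.splitn(3, ':');
        match (parts.next(), parts.next(), parts.next()) {
            (Some("did"), Some(method), Some(id))
                if !method.is_empty() && !id.is_empty() => Some(Did(s.to_string())),
            _ => None,
        }
    }
    pub fn as_str(&self) -> &str { &self.0 }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Nsid(String);

impl Nsid {
    pub fn parse(s: &str) -> Option<Nsid> {
        // NSIDs are reverse-DNS style with a trailing name,
        // e.g. "com.example.getThing"
        if s.split('.').count() >= 3 && s.split('.').all(|p| !p.is_empty()) {
            Some(Nsid(s.to_string()))
        } else {
            None
        }
    }
}

fn main() {
    assert!(Did::parse("did:web:example.com").is_some());
    assert!(Did::parse("not-a-did").is_none());
    assert!(Nsid::parse("com.example.getThing").is_some());
}
```

TID and atp-URI types would follow the same newtype-plus-parse pattern once their grammars are pinned down.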
+
+- background reading
+ MST (merkle search tree)
+
+- describe datastore needs (schemas)
+
+- missing docs (look in code?)
+ "cbor normalization"
+
+
+## High-Level Architecture of PDS
+
+- has accounts/users
+- manages persistent repos
+- implements all relevant XRPC endpoints
+
+simple implementation would just:
+- statically implement "lexicons" (not flexible/general)
+- static account registrations (or single account!), eg in TOML file
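A static single-account registration could be as small as a TOML file along these lines (every key name here is invented for illustration, not a real adenosine config format):

```toml
# Hypothetical static-account config for a minimal PDS; all keys invented.
[[account]]
handle = "alice.example.com"
did = "did:web:alice.example.com"
# signing key deposited with the hosting service
signing_key_path = "/etc/adenosine/alice.key"
```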
+
+
+## High-level Architecture of CLI
+
+- persist auth info locally
+- cache DID / petnames locally (?)
+- virtually everything is a query to PDS
+- implements XRPC as a client, including methods and types
+
+
+## PDS Datastore Needs
+
+https://github.com/bluesky-social/atproto/blob/main/packages/server/src/db/database-schema.ts
+
+loosely, probably want to have two distinct datastores, even if they end up in
+the same underlying database. one is the raw repository merkle search tree, the
+other is a more semantic set of content.
+
+raw repo can be just key/value with CID (string or bytes) as key and raw CBOR
+bytes as value. there are probably better and worse implementations but this is
+fine on its own.
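That raw-repo layer can be sketched as a trivial map from CID string to raw CBOR bytes. This is a stand-in with invented names, not the actual storage code; a real version would sit on sqlite or a block store:

```rust
use std::collections::HashMap;

// Sketch: the raw repo as CID -> raw DAG-CBOR bytes.
#[derive(Default)]
pub struct BlockStore {
    blocks: HashMap<String, Vec<u8>>,
}

impl BlockStore {
    pub fn put(&mut self, cid: &str, cbor: Vec<u8>) {
        self.blocks.insert(cid.to_string(), cbor);
    }
    pub fn get(&self, cid: &str) -> Option<&[u8]> {
        self.blocks.get(cid).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = BlockStore::default();
    store.put("bafyexamplecid", vec![0xa1, 0x61, 0x61, 0x01]); // {"a": 1} as CBOR
    assert!(store.get("bafyexamplecid").is_some());
    assert!(store.get("missing").is_none());
}
```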
+
+the semantic info is probably a set of relational tables; let's call these
+"content tables". may want various flexible indices on top of them.
+
+there are also probably (?) additional caches and system tables for things like
+invite codes. let's call these "service tables".
+
+mostly reads go to the content and system tables. writes would go to the raw
+repo, then update the content tables as needed. all in a single database
+transaction?
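The write path above, sketched with in-memory stand-ins for the two stores (hypothetical names throughout; with a real database both inserts would share one transaction so a crash can't leave the content tables ahead of the raw repo):

```rust
use std::collections::HashMap;

// Stand-ins: raw repo (CID -> CBOR bytes) and one "content table" (uri -> text).
struct Pds {
    raw_repo: HashMap<String, Vec<u8>>,
    content_table: HashMap<String, String>,
}

impl Pds {
    fn new() -> Pds {
        Pds { raw_repo: HashMap::new(), content_table: HashMap::new() }
    }

    // One logical write: first the raw block, then the semantic row.
    // Both updates belong in a single database transaction.
    fn apply_write(&mut self, cid: &str, cbor: Vec<u8>, uri: &str, text: &str) {
        self.raw_repo.insert(cid.to_string(), cbor);
        self.content_table.insert(uri.to_string(), text.to_string());
    }
}

fn main() {
    let mut pds = Pds::new();
    pds.apply_write("bafycid1", vec![0x80], "at://did:web:x/post/1", "hello");
    assert_eq!(pds.raw_repo.len(), 1);
    assert_eq!(pds.content_table.len(), 1);
}
```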
+
+
+
+## Crawling
+
+spider:
+- start with a list of DID in frontier
+- foreach in frontier
+ => fetch entire repo
+ => extract list of other DIDs via 'like', 'follow'
+ => add to crawl frontier (which is de-duped)
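The spider above is a plain breadth-first traversal over the follow graph. A sketch where `fetch_repo_dids` is a placeholder for the real "fetch entire repo, extract DIDs" step, here stubbed with a toy static graph:

```rust
use std::collections::{HashSet, VecDeque};

// Placeholder for "fetch entire repo, extract DIDs via likes/follows".
fn fetch_repo_dids(did: &str) -> Vec<String> {
    match did {
        "did:ex:a" => vec!["did:ex:b".into(), "did:ex:c".into()],
        "did:ex:b" => vec!["did:ex:a".into()],
        _ => vec![],
    }
}

// Crawl from a seed list; the `seen` set de-dupes the frontier.
fn spider(seeds: &[&str]) -> HashSet<String> {
    let mut seen: HashSet<String> = seeds.iter().map(|s| s.to_string()).collect();
    let mut frontier: VecDeque<String> = seen.iter().cloned().collect();
    while let Some(did) = frontier.pop_front() {
        for next in fetch_repo_dids(&did) {
            if seen.insert(next.clone()) {
                frontier.push_back(next);
            }
        }
    }
    seen
}

fn main() {
    let crawled = spider(&["did:ex:a"]);
    assert_eq!(crawled.len(), 3); // a, b, c
}
```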
+
+
+## Summary
+
+User content is stored in per-user "repos", which are analogous to git
+repositories. They are merkle-tree like, and have a series of signed "commits"
+to verify authenticity.
+
+Identities are "DIDs": permanent URI-like strings that can somehow be
+securely dereferenced to a user profile containing a public key. The
+user profile (and key) can change over time. Each DID has a scheme type that
+describes how registration and dereferencing should work. There are several
+proposed schemes but none seem to meet all the requirements for a
+decentralized, low-cost, low-friction system as desired... maybe a better one
+will emerge?
+
+Repos usually live in a hosted service, and user clients communicate over an
+HTTP protocol (instead of mutating the merkle tree directly). The signing key
+is deposited with the hosting service. It is possible to suck out the entire
+merkle tree from hosted services.
+
+
+Short pitch:
+- social media content stored in something like signed git repos
+- currently all content is public and signed (non-refutable)
+- users control a pointer to where repo is currently hosted, and can migrate by
+ copying and pointing somewhere new
+- thin clients don't store full repos for self or others, they just do HTTP RPC
+ calls to host service. this includes things like search, aggregation, counts.
+- application protocols can define new content schemas and client RPC methods
+
+How does it compare to ActivityPub?
+- ATP specifies how user content is *persisted*, and allows migration of content between hosts
+- ActivityPub is about communicating events between hosts
+- likely possible to implement ActivityPub as part of an ATP host
+
+## Thoughts
+
+This isn't really offline-first. Writes cannot merge; whichever device gets
+to the PDS first "wins". There is no merge process.
+
+Wait, I guess this really is just IPLD. Will server-server communication just
+be IPFS stuff (bitswap, graphsync)?
+
+## Issues
+
+Service power:
+
+- could rate-limit crawling/harvesting of repos. eg, Google has an advantage in
+ web crawling today
+- what would the impact be of "faster" cache services that clients can fetch
+  repos from?
+
+Usability:
+
+- time delay in conversations/threads. eg, replies from strangers could take a
+ long time to show up
+
+Tech/Features:
+
+- pub/sub style "subscribe" for clients (eg, to receive push notifications from
+ server, instead of polling)
+
+Questions:
+
+- seems like all content is public, signed, and non-refutable. are deletions just publishing a retraction request? is retracted content not served up in the repository?
+- how will media blobs work? eg, video. migrations, bandwidth, take-downs.
+- can applications (protocols/lexicons) specify new server-to-server RPC methods? or is sync effectively the only server-to-server method needed?
+- reader privacy: guess that the PDS can see a lot about the reader, but depending on its size it could partially shield who is reading what from external observers?
+- should there be a mechanism for sharing client state? eg, what content/notifications have been "seen" across devices
+- how would something like a 140-character limit be expressed in a Lexicon? seems like an extra computational predicate or validation on top of the schema
+- are Lexicons versioned, or immutable over time?
+- shouldn't it be atp://
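On the 140-character question above, one plausible shape is a computational predicate that runs after schema validation. A hypothetical sketch, not how Lexicon actually specifies limits:

```rust
// Sketch: extra predicates layered on top of schema validation.
type Predicate = fn(&str) -> bool;

fn max_140_chars(text: &str) -> bool {
    text.chars().count() <= 140
}

fn validate(text: &str, predicates: &[Predicate]) -> bool {
    predicates.iter().all(|p| p(text))
}

fn main() {
    assert!(validate("hello", &[max_140_chars]));
    assert!(!validate(&"x".repeat(141), &[max_140_chars]));
}
```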
+
+
+## Another Plan
+
+- schema generation from JSON schemas; just models, or RPC as well?
+
+PDS (adenosine-pds):
+- use existing `ipfs-sqlite-block-store` to store repo as IPLD DAG
+- basic MST implementation layer on top of IPLD store
+
+CLI (adenosine):
+- base atproto only (?)
+- account management
+ => secret store for revocation key?
+- some kind of DID address book / contacts?
+- generic CRUD and list commands working with at:// URIs
+
+
+## Lexicon Ideas
+
+- journal publishing
+- tumblr-style microblogging
+
+## TODO
+
+- rust library for jq-like JSON queries ("JSON Pointers", https://datatracker.ietf.org/doc/html/rfc6901)
+- actually read/understand did-web spec
+- try to understand how small-world reference and notification would actually work
+ => PDS pulls in everything via follow/following graph, and spiders +1?
+ => query aggregators for references and pull any in?
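The jq-like query TODO above maps to RFC 6901 JSON Pointers. A dependency-free sketch over a toy JSON enum (a real version would resolve against serde_json values instead):

```rust
use std::collections::HashMap;

// Toy JSON value, enough to demo RFC 6901 pointer resolution without serde.
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(HashMap<String, Json>),
}

// Resolve an RFC 6901 pointer like "/a/0/b" against a Json value.
fn resolve<'a>(root: &'a Json, pointer: &str) -> Option<&'a Json> {
    if pointer.is_empty() {
        return Some(root);
    }
    let mut cur = root;
    for token in pointer.strip_prefix('/')?.split('/') {
        // Unescape per RFC 6901: "~1" -> "/", then "~0" -> "~" (in that order).
        let token = token.replace("~1", "/").replace("~0", "~");
        cur = match cur {
            Json::Obj(map) => map.get(&token)?,
            Json::Arr(items) => items.get(token.parse::<usize>().ok()?)?,
            Json::Str(_) => return None,
        };
    }
    Some(cur)
}

fn main() {
    let mut obj = HashMap::new();
    obj.insert("posts".to_string(),
        Json::Arr(vec![Json::Str("first post".to_string())]));
    let root = Json::Obj(obj);
    match resolve(&root, "/posts/0") {
        Some(Json::Str(s)) => assert_eq!(s, "first post"),
        _ => panic!("pointer should resolve"),
    }
}
```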