summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--README.md5
-rw-r--r--TODO37
-rw-r--r--notes/thoughts.txt32
3 files changed, 40 insertions, 34 deletions
diff --git a/README.md b/README.md
index fd1e2a2f..b6f31383 100644
--- a/README.md
+++ b/README.md
@@ -15,9 +15,10 @@ This is just a concept for now; see [rfc](./rfc).
Use `pipenv` (which you can install with `pip`).
- pipenv shell
+ pipenv run run.py
python fatcat/api.py
Run tests:
- pipenv run nosetests3 fatcat
+ pipenv run run.py --init-db
+ pipenv run pytest
diff --git a/TODO b/TODO
index 8c7d12fc..5208216f 100644
--- a/TODO
+++ b/TODO
@@ -1,34 +1,7 @@
-Should probably just UUID all the (public) ids.
+update public pointer tables:
+- 'live' boolean (default false)
+- redirect column (to self)
-Instead of having a separate id pointer table, could have an extra "mutable"
-public ID column (unique, indexed) on entity rows. Backend would ensure the
-right thing happens. Changelog tables (or special redirect/deletion tables)
-would record changes and be "fallen through" to.
-
-Instead of having merge redirects, could just point all identifiers to the same
-revision (and update them all in the future). Don't need to recurse! Need to
-keep this forever though, could scale badly if "aggregations" get merged.
-
-Redirections of redirections should probably simply be disallowed.
-
-"Deletion" is really just pointing to a special or null entity.
-
-Trade-off: easy querying for common case (wanting "active" rows) vs. robust
-handling of redirects (likely to be pretty common). Also, having UUID handling
-across more than one table.
-
-## Scaling database
-
-Two scaling issues: size of database due to edits (likely billions of rows) and
-desire to do complex queries/reports ("analytics"). The later is probably not a
-concern, and could be handled by dumping and working on a cluster (or secondary
-views, etc). So just a distraction? Simpler to have all rolled up.
-
-Cockroach is postgres-like; might be able to use that for HA and scaling?
-Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
-import performance (one-time?).
-
-Using elastic for most (eg, non-logged-in) views could keep things fast.
-
-Cockroach seems more resourced/polished than TiDB?
+later:
+- public IDs are UUID (sqlite hack?)
diff --git a/notes/thoughts.txt b/notes/thoughts.txt
new file mode 100644
index 00000000..c01c0d37
--- /dev/null
+++ b/notes/thoughts.txt
@@ -0,0 +1,32 @@
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
+## Scaling database
+
+Two scaling issues: size of database due to edits (likely billions of rows) and
+desire to do complex queries/reports ("analytics"). The later is probably not a
+concern, and could be handled by dumping and working on a cluster (or secondary
+views, etc). So just a distraction? Simpler to have all rolled up.
+
+Cockroach is postgres-like; might be able to use that for HA and scaling?
+Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
+import performance (one-time?).
+
+Using elastic for most (eg, non-logged-in) views could keep things fast.
+
+Cockroach seems more resourced/polished than TiDB?