| field | value | date |
|---|---|---|
| author | Bryan Newbold <bnewbold@robocracy.org> | 2018-04-11 15:31:06 -0700 |
| committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-04-11 15:31:06 -0700 |
| commit | 4b521b5eb0eb0843e365ac063cb04706d1cb674a | |
| tree | 51a898579f1e1e11ce1261bcae205a41dab8840a | |
| parent | 229b22cedf786d55af210c806864459b29c1b27d | |
docs update
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 5 |
| -rw-r--r-- | TODO | 37 |
| -rw-r--r-- | notes/thoughts.txt | 32 |
3 files changed, 40 insertions, 34 deletions
--- a/README.md
+++ b/README.md
@@ -15,9 +15,10 @@ This is just a concept for now; see [rfc](./rfc).
 
 Use `pipenv` (which you can install with `pip`).
 
-    pipenv shell
+    pipenv run run.py
     python fatcat/api.py
 
 Run tests:
 
-    pipenv run nosetests3 fatcat
+    pipenv run run.py --init-db
+    pipenv run pytest
--- a/TODO
+++ b/TODO
@@ -1,34 +1,7 @@
 
-Should probably just UUID all the (public) ids.
+update public pointer tables:
+- 'live' boolean (default false)
+- redirect column (to self)
 
-Instead of having a separate id pointer table, could have an extra "mutable"
-public ID column (unique, indexed) on entity rows. Backend would ensure the
-right thing happens. Changelog tables (or special redirect/deletion tables)
-would record changes and be "fallen through" to.
-
-Instead of having merge redirects, could just point all identifiers to the same
-revision (and update them all in the future). Don't need to recurse! Need to
-keep this forever though, could scale badly if "aggregations" get merged.
-
-Redirections of redirections should probably simply be disallowed.
-
-"Deletion" is really just pointing to a special or null entity.
-
-Trade-off: easy querying for common case (wanting "active" rows) vs. robust
-handling of redirects (likely to be pretty common). Also, having UUID handling
-across more than one table.
-
-## Scaling database
-
-Two scaling issues: size of database due to edits (likely billions of rows) and
-desire to do complex queries/reports ("analytics"). The later is probably not a
-concern, and could be handled by dumping and working on a cluster (or secondary
-views, etc). So just a distraction? Simpler to have all rolled up.
-
-Cockroach is postgres-like; might be able to use that for HA and scaling?
-Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
-import performance (one-time?).
-
-Using elastic for most (eg, non-logged-in) views could keep things fast.
-
-Cockroach seems more resourced/polished than TiDB?
+later:
+- public IDs are UUID (sqlite hack?)
diff --git a/notes/thoughts.txt b/notes/thoughts.txt
new file mode 100644
index 00000000..c01c0d37
--- /dev/null
+++ b/notes/thoughts.txt
@@ -0,0 +1,32 @@
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
+## Scaling database
+
+Two scaling issues: size of database due to edits (likely billions of rows) and
+desire to do complex queries/reports ("analytics"). The later is probably not a
+concern, and could be handled by dumping and working on a cluster (or secondary
+views, etc). So just a distraction? Simpler to have all rolled up.
+
+Cockroach is postgres-like; might be able to use that for HA and scaling?
+Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
+import performance (one-time?).
+
+Using elastic for most (eg, non-logged-in) views could keep things fast.
+
+Cockroach seems more resourced/polished than TiDB?
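The TODO items in this commit (a 'live' boolean defaulting to false, a redirect column, UUID public IDs) together with the notes on single-hop redirects and deletion-as-null could be sketched roughly as below. This is only an illustration, not fatcat's schema: the table names `entity_rev` and `entity_ident`, the column names, and the `resolve` helper are all hypothetical, and "redirect column (to self)" is read here as a self-referential foreign key, which is an assumption.

```python
import sqlite3
import uuid

# Hypothetical schema: table and column names are illustrative only,
# they are not taken from the fatcat code base.
SCHEMA = """
CREATE TABLE entity_rev (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE entity_ident (
    id TEXT PRIMARY KEY,                          -- public ID, stored as a UUID string
    live BOOLEAN NOT NULL DEFAULT 0,              -- 'live' boolean (default false)
    rev_id INTEGER REFERENCES entity_rev(id),     -- NULL is treated as "deleted"
    redirect_id TEXT REFERENCES entity_ident(id)  -- self-referential redirect
);
"""


def resolve(conn, ident):
    """Return the revision id a public ID points at, or None.

    Follows at most one redirect hop, on the assumption that redirects
    of redirects are disallowed when rows are written.
    """
    row = conn.execute(
        "SELECT live, rev_id, redirect_id FROM entity_ident WHERE id = ?",
        (ident,),
    ).fetchone()
    if row is None or not row[0]:
        return None  # unknown or not-yet-live identifier
    _, rev_id, redirect_id = row
    if redirect_id is not None and redirect_id != ident:
        target = conn.execute(
            "SELECT live, rev_id FROM entity_ident WHERE id = ?",
            (redirect_id,),
        ).fetchone()
        if target is None or not target[0]:
            return None
        return target[1]
    return rev_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO entity_rev (id, title) VALUES (1, 'example')")
a, b = str(uuid.uuid4()), str(uuid.uuid4())
# `a` points directly at revision 1; `b` was merged into `a` and redirects to it.
conn.execute("INSERT INTO entity_ident (id, live, rev_id) VALUES (?, 1, 1)", (a,))
conn.execute(
    "INSERT INTO entity_ident (id, live, redirect_id) VALUES (?, 1, ?)", (b, a)
)
print(resolve(conn, a), resolve(conn, b))  # both resolve to revision 1
```

Storing UUIDs as text is one possible reading of the "sqlite hack?" note, since SQLite has no native UUID type. Keeping the redirect on the pointer row means the common "active rows" query only has to filter on `live`, at the cost of a second lookup for redirected IDs.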
