diff options
author | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-08 16:28:27 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-08 16:28:27 -0800 |
commit | 16f2e78298dbd2231f5f337ea17c89a6a131a052 (patch) | |
tree | 6e72581e625e73c97cbab72d0f9c35665c99e5d7 /notes/ideas/thoughts.txt | |
parent | eb40a5f274f3608db34309cfd16739a7642ef5e7 (diff) | |
parent | ffb721f90c5d97ee80885209bf45feb85ca9625c (diff) | |
download | fatcat-16f2e78298dbd2231f5f337ea17c89a6a131a052.tar.gz fatcat-16f2e78298dbd2231f5f337ea17c89a6a131a052.zip |
Merge branch 'bnewbold-crude-auth'
Fixed a conflict in:
python/fatcat_export.py
Diffstat (limited to 'notes/ideas/thoughts.txt')
-rw-r--r-- | notes/ideas/thoughts.txt | 32 |
1 files changed, 32 insertions, 0 deletions
diff --git a/notes/ideas/thoughts.txt b/notes/ideas/thoughts.txt new file mode 100644 index 00000000..c01c0d37 --- /dev/null +++ b/notes/ideas/thoughts.txt @@ -0,0 +1,32 @@ + +Instead of having a separate id pointer table, could have an extra "mutable" +public ID column (unique, indexed) on entity rows. Backend would ensure the +right thing happens. Changelog tables (or special redirect/deletion tables) +would record changes and be "fallen through" to. + +Instead of having merge redirects, could just point all identifiers to the same +revision (and update them all in the future). Don't need to recurse! Need to +keep this forever though, could scale badly if "aggregations" get merged. + +Redirections of redirections should probably simply be disallowed. + +"Deletion" is really just pointing to a special or null entity. + +Trade-off: easy querying for common case (wanting "active" rows) vs. robust +handling of redirects (likely to be pretty common). Also, having UUID handling +across more than one table. + +## Scaling database + +Two scaling issues: size of database due to edits (likely billions of rows) and +desire to do complex queries/reports ("analytics"). The later is probably not a +concern, and could be handled by dumping and working on a cluster (or secondary +views, etc). So just a distraction? Simpler to have all rolled up. + +Cockroach is postgres-like; might be able to use that for HA and scaling? +Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk +import performance (one-time?). + +Using elastic for most (eg, non-logged-in) views could keep things fast. + +Cockroach seems more resourced/polished than TiDB? |