From 084e476957ce80b456dcf0575de4efc7331d34f9 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 4 Jan 2019 17:41:27 -0800
Subject: clean up notes a tiny bit

---
 notes/ideas/bot_tools.txt         | 17 +++++++++++++++++
 notes/ideas/domains.txt           |  5 +++++
 notes/ideas/more_api_patterns.txt | 15 +++++++++++++++
 notes/ideas/thoughts.txt          | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)
 create mode 100644 notes/ideas/bot_tools.txt
 create mode 100644 notes/ideas/domains.txt
 create mode 100644 notes/ideas/more_api_patterns.txt
 create mode 100644 notes/ideas/thoughts.txt

(limited to 'notes/ideas')

diff --git a/notes/ideas/bot_tools.txt b/notes/ideas/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/ideas/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+    https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+    https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/ideas/domains.txt b/notes/ideas/domains.txt
new file mode 100644
index 00000000..8556494e
--- /dev/null
+++ b/notes/ideas/domains.txt
@@ -0,0 +1,5 @@
+
+Many obvious domains and hacks are taken. Would love to get fatcat.org; for now
+registered fatcat.wiki.
+
+fatca.tt is available.
diff --git a/notes/ideas/more_api_patterns.txt b/notes/ideas/more_api_patterns.txt
new file mode 100644
index 00000000..ca61ac81
--- /dev/null
+++ b/notes/ideas/more_api_patterns.txt
@@ -0,0 +1,15 @@
+
+If returning a long list (eg, all releases for a container):
+
+    "releases": {
+        "data": [
+            ,
+            ,
+            ...
+        ],
+        "has_more": true,
+        "total_count": 100,
+        "url": "/v0/container/asdf/releases"
+    }
+
+This pattern is from the Stripe API.
diff --git a/notes/ideas/thoughts.txt b/notes/ideas/thoughts.txt
new file mode 100644
index 00000000..c01c0d37
--- /dev/null
+++ b/notes/ideas/thoughts.txt
@@ -0,0 +1,32 @@
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
+## Scaling database
+
+Two scaling issues: size of database due to edits (likely billions of rows) and
+desire to do complex queries/reports ("analytics"). The latter is probably not a
+concern, and could be handled by dumping and working on a cluster (or secondary
+views, etc). So just a distraction? Simpler to have all rolled up.
+
+Cockroach is postgres-like; might be able to use that for HA and scaling?
+Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
+import performance (one-time?).
+
+Using elastic for most (eg, non-logged-in) views could keep things fast.
+
+Cockroach seems more resourced/polished than TiDB?
-- 
cgit v1.2.3
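
Reviewer sketch for the "point all identifiers to the same revision" idea in notes/ideas/thoughts.txt. This is only an illustration under assumptions: the class `IdentTable`, its methods, and the in-memory dicts are hypothetical stand-ins for database tables, not anything in the fatcat schema. It tries to show why lookups never recurse (each identifier maps straight to a revision) and where the cost lands (every merge or later update has to rewrite all identifiers that share the revision).

```python
# Hypothetical sketch: names and structure are illustrative, not fatcat's schema.

class IdentTable:
    """Maps public identifiers directly to revision ids (no redirect chain)."""

    def __init__(self):
        self.rev_of = {}     # ident -> revision id
        self.changelog = []  # append-only (action, ident, old_rev, new_rev)

    def create(self, ident, rev):
        self.rev_of[ident] = rev
        self.changelog.append(("create", ident, None, rev))

    def _repoint(self, action, old_rev, new_rev):
        # Rewrite every ident that currently shares old_rev. This is the
        # part that could scale badly when large groups get merged.
        for ident, rev in list(self.rev_of.items()):
            if rev == old_rev:
                self.rev_of[ident] = new_rev
                self.changelog.append((action, ident, old_rev, new_rev))

    def merge(self, source, target):
        """Point source (and everything merged into it) at target's revision."""
        self._repoint("merge", self.rev_of[source], self.rev_of[target])

    def update(self, ident, new_rev):
        """A later edit moves all idents that share the revision together."""
        self._repoint("update", self.rev_of[ident], new_rev)

    def lookup(self, ident):
        """Single hop: no redirect-of-redirect to follow."""
        return self.rev_of[ident]


table = IdentTable()
for ident, rev in [("a", "rev1"), ("b", "rev2"), ("c", "rev3")]:
    table.create(ident, rev)

table.merge("a", "b")     # a now shares b's revision
table.merge("b", "c")     # a and b both move to c's revision
table.update("c", "rev4") # all three follow the new revision
print(table.lookup("a"))
```

The changelog list plays the role of the "fallen through" history tables mentioned in the notes; a real backend would presumably do the repointing inside one transaction.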