From 084e476957ce80b456dcf0575de4efc7331d34f9 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 4 Jan 2019 17:41:27 -0800
Subject: clean up notes a tiny bit

---
 notes/UNSORTED.txt                | 40 ++++++++++++++++++++++++++++++++++
 notes/bot_tools.txt               | 17 ---------------
 notes/domains.txt                 |  5 -----
 notes/golang.txt                  | 45 ---------------------------------------
 notes/ideas/bot_tools.txt         | 17 +++++++++++++++
 notes/ideas/domains.txt           |  5 +++++
 notes/ideas/more_api_patterns.txt | 15 +++++++++++++
 notes/ideas/thoughts.txt          | 32 ++++++++++++++++++++++++++++
 notes/more_api_patterns.txt       | 15 -------------
 notes/thoughts.txt                | 32 ----------------------------
 10 files changed, 109 insertions(+), 114 deletions(-)
 create mode 100644 notes/UNSORTED.txt
 delete mode 100644 notes/bot_tools.txt
 delete mode 100644 notes/domains.txt
 delete mode 100644 notes/golang.txt
 create mode 100644 notes/ideas/bot_tools.txt
 create mode 100644 notes/ideas/domains.txt
 create mode 100644 notes/ideas/more_api_patterns.txt
 create mode 100644 notes/ideas/thoughts.txt
 delete mode 100644 notes/more_api_patterns.txt
 delete mode 100644 notes/thoughts.txt

(limited to 'notes')

diff --git a/notes/UNSORTED.txt b/notes/UNSORTED.txt
new file mode 100644
index 00000000..3960f5eb
--- /dev/null
+++ b/notes/UNSORTED.txt
@@ -0,0 +1,40 @@
+
+Not allowed to PUT edits to the same entity in the same editgroup. If you want
+to update an edit, need to delete the old one first.
+
+The state depends only on the current entity state, not any redirect. This
+means that if the target of a redirect is deleted, the redirecting entity is
+still "redirect", not "deleted".
+
+Redirects-to-redirects are not allowed; this is enforced when the editgroup is
+accepted, to prevent race conditions.
+
+Redirects to "work-in-progress" (WIP) rows are disallowed at update time (and
+not re-checked at accept time).
+
+"ident table" parameters are ignored for entity updates. This is so clients can
+simply re-use object instantiations.
+
+The "state" parameter of an entity body is used as a flag when deciding whether
+to do non-normal updates (eg, redirect or undelete, as opposed to inserting a
+new revision).
+
+In the API, if you, eg, expand=files on a redirected release, you will get
+files that point to the *target* release entity. If you use the /files endpoint
+(instead of expand), you will get the files pointing to the redirected entity
+(which probably need updating!). Also, if you expand=files on the target
+entity, you *won't* get the files pointing to the redirected release. A
+high-level merge process might make these changes at the same time? Or at least
+tag at edit review time. A sweeper task can look for and auto-correct such
+redirects after some delay period.
+
+=> it would not be too hard to update get_release_files to check for such
+   redirects; could be handled by request flag?
+
+`prev_rev` is naively set to the most-recent previous state. If the current
+state was deleted or a redirect, it is set to null.
+
+This parameter is not checked/enforced at edit accept time (but could be, and
+maybe introduce `prev_redirect`, for race detection). Or, could have ident
+point to most-recent edit, and have edits point to prev, for firmer control.
+
diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt
deleted file mode 100644
index cf465bde..00000000
--- a/notes/bot_tools.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-
-Could be helpful for writing bots for import:
-
-metafacture: large/popular java framework for pipelines and munging library
-metadata.
-
-    https://github.com/metafacture/metafacture-core/wiki
-
-catmandu: large/popular set of perl libraries for munging bibliographic
-metadata, including a DSL ("Fix"). Can also push/pull to backends.
-
-miku/siskin: luigi and higher-level tool for running regular tasks.
-
-    https://github.com/miku/span
-
-miku/span: golang lower-level tools for parsing and normalizing specific
-formats (including KBART, DOAJ).
diff --git a/notes/domains.txt b/notes/domains.txt
deleted file mode 100644
index 8556494e..00000000
--- a/notes/domains.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-
-Many obvious domains and hacks are taken. Would love to get fatcat.org; for now
-registered fatcat.wiki.
-
-fatca.tt is available.
diff --git a/notes/golang.txt b/notes/golang.txt
deleted file mode 100644
index 404741e8..00000000
--- a/notes/golang.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-
-## Database Schema / ORM / Generation
-
-start simple, with pg (or sqlx if we wanted to be DB-agnostic):
-- pq: basic postgres driver and ORM (similar to sqlalchemy?)
-- sqlx: small extensions to builtin sql; row to struct mapping
-
-debug postgres with gocmdpev
-
-later, if code is too duplicated, look in to sqlboiler (first) or xo (second):
-- https://github.com/xo/xo
-- https://github.com/volatiletech/sqlboiler
-
-later, to do migrations, use goose, or consider alembic (python) for
-auto-generation
-- https://github.com/steinbacher/goose
-- possibly auto-generate with python alembic
-
-for identifiers, consider either built-in postgres UUID, or:
-- https://github.com/rs/xid
-- https://github.com/oklog/ulid
-  like a UUID, but base32 and "sortable" (timestamp + random)
-
-## API In General
-
-Hope to use Kong for authentication.
-
-start with oauth2... orcid?
-
-## OpenAPI/Swagger
-
-go-swagger (OpenAPI 2.0):
-- generate initial API server skeleton from a yaml definition
-- export updated yaml from code after changes
-- web UI for documentation
-- templating/references
-- auto-generate client (in golang)
-
-also look at ReDoc as a UI; all in-brower generated from JSON (react)
-
-## Non-API stuff
-
-- logrus structured logging (or zap?)
-- testify tests (and assert?)
-- viper config
diff --git a/notes/ideas/bot_tools.txt b/notes/ideas/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/ideas/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+    https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+    https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/ideas/domains.txt b/notes/ideas/domains.txt
new file mode 100644
index 00000000..8556494e
--- /dev/null
+++ b/notes/ideas/domains.txt
@@ -0,0 +1,5 @@
+
+Many obvious domains and hacks are taken. Would love to get fatcat.org; for now
+registered fatcat.wiki.
+
+fatca.tt is available.
diff --git a/notes/ideas/more_api_patterns.txt b/notes/ideas/more_api_patterns.txt
new file mode 100644
index 00000000..ca61ac81
--- /dev/null
+++ b/notes/ideas/more_api_patterns.txt
@@ -0,0 +1,15 @@
+
+If returning a long list (eg, all releases for a container):
+
+    "releases": {
+        "data": [
+            ,
+            ,
+            ...
+        ],
+        "has_mode": true,
+        "total_count": 100,
+        "url": "/v0/container/asdf/releases"
+    }
+
+This pattern from the Stripe API.
diff --git a/notes/ideas/thoughts.txt b/notes/ideas/thoughts.txt
new file mode 100644
index 00000000..c01c0d37
--- /dev/null
+++ b/notes/ideas/thoughts.txt
@@ -0,0 +1,32 @@
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
+## Scaling database
+
+Two scaling issues: size of database due to edits (likely billions of rows) and
+desire to do complex queries/reports ("analytics"). The later is probably not a
+concern, and could be handled by dumping and working on a cluster (or secondary
+views, etc). So just a distraction? Simpler to have all rolled up.
+
+Cockroach is postgres-like; might be able to use that for HA and scaling?
+Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
+import performance (one-time?).
+
+Using elastic for most (eg, non-logged-in) views could keep things fast.
+
+Cockroach seems more resourced/polished than TiDB?
diff --git a/notes/more_api_patterns.txt b/notes/more_api_patterns.txt
deleted file mode 100644
index ca61ac81..00000000
--- a/notes/more_api_patterns.txt
+++ /dev/null
@@ -1,15 +0,0 @@
-
-If returning a long list (eg, all releases for a container):
-
-    "releases": {
-        "data": [
-            ,
-            ,
-            ...
-        ],
-        "has_mode": true,
-        "total_count": 100,
-        "url": "/v0/container/asdf/releases"
-    }
-
-This pattern from the Stripe API.
diff --git a/notes/thoughts.txt b/notes/thoughts.txt
deleted file mode 100644
index c01c0d37..00000000
--- a/notes/thoughts.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-
-Instead of having a separate id pointer table, could have an extra "mutable"
-public ID column (unique, indexed) on entity rows. Backend would ensure the
-right thing happens. Changelog tables (or special redirect/deletion tables)
-would record changes and be "fallen through" to.
-
-Instead of having merge redirects, could just point all identifiers to the same
-revision (and update them all in the future). Don't need to recurse! Need to
-keep this forever though, could scale badly if "aggregations" get merged.
-
-Redirections of redirections should probably simply be disallowed.
-
-"Deletion" is really just pointing to a special or null entity.
-
-Trade-off: easy querying for common case (wanting "active" rows) vs. robust
-handling of redirects (likely to be pretty common). Also, having UUID handling
-across more than one table.
-
-## Scaling database
-
-Two scaling issues: size of database due to edits (likely billions of rows) and
-desire to do complex queries/reports ("analytics"). The later is probably not a
-concern, and could be handled by dumping and working on a cluster (or secondary
-views, etc). So just a distraction? Simpler to have all rolled up.
-
-Cockroach is postgres-like; might be able to use that for HA and scaling?
-Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk
-import performance (one-time?).
-
-Using elastic for most (eg, non-logged-in) views could keep things fast.
-
-Cockroach seems more resourced/polished than TiDB?
--
cgit v1.2.3
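
The redirect rules recorded in notes/UNSORTED.txt in this patch can be sketched roughly as below. This is a minimal illustration and not fatcat's actual implementation: the `Ident` class, its field names, the function names, and the in-memory dict of rows are all hypothetical. Only the three rules come from the notes: state depends on the entity's own row, redirects to WIP rows are rejected at update time, and redirects-to-redirects are rejected at accept time.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Ident:
    """Hypothetical ident-table row; the field names are assumptions."""
    is_live: bool                      # False while the row is work-in-progress (WIP)
    rev_id: Optional[str] = None       # current revision, None if deleted
    redirect_id: Optional[str] = None  # target ident if this row is a redirect


def entity_state(ident: Ident) -> str:
    """State depends only on this row, never on the redirect target."""
    if not ident.is_live:
        return "wip"
    if ident.redirect_id is not None:
        return "redirect"
    if ident.rev_id is None:
        return "deleted"
    return "active"


def check_redirect_at_update(idents: Dict[str, Ident], target_id: str) -> None:
    """Update-time check: redirects to WIP rows are disallowed."""
    if entity_state(idents[target_id]) == "wip":
        raise ValueError("cannot redirect to a work-in-progress entity")


def check_redirect_at_accept(idents: Dict[str, Ident], target_id: str) -> None:
    """Accept-time check: redirects-to-redirects are disallowed."""
    if entity_state(idents[target_id]) == "redirect":
        raise ValueError("cannot redirect to a redirect")
```

Under these assumptions, a row that redirects to a since-deleted target still reports "redirect" rather than "deleted", which matches the behavior described in the note.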