From 0dc872921023030f6ffd320eb038e5379b47fa53 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Tue, 11 Sep 2018 13:56:53 -0700 Subject: update TODO lists (september plan) --- TODO | 120 ++++++++++++++++++++++++------------------------------------ python/TODO | 9 ++++- rust/TODO | 47 ++++++++++++++++-------- 3 files changed, 86 insertions(+), 90 deletions(-) diff --git a/TODO b/TODO index 765f6a3a..900e8eda 100644 --- a/TODO +++ b/TODO @@ -1,29 +1,31 @@ ## Next Up -- some significant slow-down has happened? transactions, or regexes? -summer roadmap: -- PUT/UPDATE, DELETE, and merge code paths -- faster UPDATE-free bulk import code path -- container import (extra?): lang, region, subject -- basic API+webface creation, editing, merging, editgroup approval +- basic webface creation, editing, merging, editgroup approval - elastic schema/transform for releases; bulk and continuous scripts -features: -- fast database dump command: both changelog-based and entity-based (rust) - => lighter, more complete dumps for each entity type? -- guide skeleton (mdbook; guide.fatcat.wiki) +## QA Blockers + +- refactors and correctness in rust/TODO +- importers have editor accounts and include editgroup metadata +- crossref importer uses extids + +## Production blockers + +- enforce single-ident-edit-per-editgroup + => entity_edit: entity_ident/entity_editgroup should be UNIQ index + => UPDATE/REPLACE edits? 
+- crossref importer sets release_type as "stub" when appropriate +- re-implement old python tests +- real auth +- metrics, jwt, config, sentry + +## Metadata Import -importers: -- CORE -- wikidata cross-ref (if they have a dump) - manifest: multiple URLs per SHA1 -- pubmed (medline), if not in CORE - => and/or, use pubmed ID lookups on crossref import -- core -- semantic scholar (up to 39 million; author de-dupe) -- wikidata (if they have a dump) - crossref: relations ("is-preprint-of") +- crossref: two phase: no citations, then matched citations (via DOI table) +- container import (extra?): lang, region, subject - crossref: filter works => content-type whitelist => title length and title/slug blacklist @@ -31,61 +33,43 @@ importers: => make this a method on Release object => or just set release_stub as "stub"? -bugs: +new importers: +- pubmed (medline) (filtered) + => and/or, use pubmed ID lookups on crossref import +- CORE (filtered) +- semantic scholar (up to 39 million; author de-dupe) + +## Entity/Edit Lifecycle + +- redirects and merges (API, webface, etc) - test: release pointing to a collection that has been deleted/redirected => UI crash? +- commenting and accepting editgroups +- editgroup state machine? +- enforce "single ident edit per editgroup" + => how to "edit an edit"? clobber existing? -july roadmap: -- complete and test this round of schema changes -- container import (extra?): lang, region, subject -- re-run imports -- basic API+webface creation, editing, merging, editgroup approval -- elastic schema/transform for releases; bulk and continuous scripts - -## Schema / Alignment / Scope +## Guide / Book / Style -- "container" -> "venue"? -- release_type, release_status, url.rel write-time schema(and others?) +- release_type, release_status, url.rel schemas (and enforce in API?) name ref: https://www.w3.org/International/questions/qa-personal-names -## API - -- how to send edit "extra" metadata? -- hydrate entities in API - ? 
"expand" query param - -## High-Level Priorities - -- full database dump (export) -- manual editing of containers and releases (web interface) - -## Web UI - -- changelog more like a https://semantic-ui.com/views/feed.html ? -- instead of grid, maybe https://semantic-ui.com/elements/rail.html +## Fun Features -## Performance - -- write pure-rust "benchmark" scripts that hit, eg, lookups and batch - endpoints. run these with auto_explain on, then look in logs on dev machine -- batch inserts automerge: create editgroup and changelog, mark all edits as - accepted, all in a single transaction - -## API - -- hydrate entities in API - ? "expand" query param -- don't include abstracts by default? -- "stub" mode for lookups, returning only the ident (or maybe whole row)? - -## Database - -- test using hash indexes for some UUID column indexes, or at least sha1 and - other hashes (abstracts, file lookups) +- "save paper now" + => is it in GWB? if not, SPN + => get hash + url from GWB, verify mimetype acceptable + => is file in fatcat? + => what about HBase? GROBID? + => create edit, redirect user to editgroup submit page +- python client tool and library in pypi + => or maybe rust? +- bibtex (etc) export ## Other +- consider using "HTTP 202: Accepted" for entity-mutating calls - basic python hbase/elastic matcher => takes sha1 keys => checks fatcat API + hbase @@ -94,19 +78,11 @@ name ref: https://www.w3.org/International/questions/qa-personal-names => proof-of-concept, no tests - add_header Strict-Transport-Security "max-age=3600"; => 12 hours? 24? -- criterion.rs benchmarking -- schema.org metadata in webface -- bulk endpoint auto-merge mode (huge postgres speedup on import) - elastic pipeline - kong or oauth2_proxy for auth, rate-limit, etc +- feature flags: consul? +- secrets: vault? - "authn" microservice: https://keratin.tech/ -- PUT for mid-edit revisions -- 'parent rev' for revisions (vs. container parent) -- "submit" status for editgroups? 
- -review -- what does openlibrary API look like? -x add a 'live' (or 'immutable') flag to revision tables better API docs - https://sourcey.com/spectacle/ diff --git a/python/TODO b/python/TODO index 3e8ba6ff..708b8aa8 100644 --- a/python/TODO +++ b/python/TODO @@ -1,7 +1,7 @@ -- make debugbar really optional (don't import unless we're in debug mode) +- schema.org metadata for releases -tests +additional tests - full object fields actually getting passed e2e (for rich_app) - implicit editor.active_edit_group behavior - modify existing release via edit mechanism (and commit) @@ -13,3 +13,8 @@ tests views - oldest un-merged edits/edit-groups +- changelog more like a https://semantic-ui.com/views/feed.html ? +- instead of grid, maybe https://semantic-ui.com/elements/rail.html + +backlog +- make debugbar really optional (don't import unless we're in debug mode) diff --git a/rust/TODO b/rust/TODO index ac378961..c922d5df 100644 --- a/rust/TODO +++ b/rust/TODO @@ -1,24 +1,40 @@ -verbs: +refactors +- fatcatd -> fatcat-api-server +- fatcat_api -> fatcat_api_schema (or spec? models? types?) +- standardize "mutating"/"edit" actions + => have editgroup_id be a request-level param everywhere (not entity-level; + for batch) + => editgroup_id as query param + => editor_id from auth (header) +- consistent "expand"/"stub" flags + +correctness - enforce "previous_rev" required in updates +- reread/review editgroup accept code +- enforce "no editing if editgroup accepted" behavior +- changelog sequence without gaps +- batch insert editgroup behavior; always a new editgroup? + +edit lifecycle +- editgroup: state to track review status? +- per-edit extra JSON + +account helper tool +- set admin bit +- create editors +- create keypairs +- generate tokens +- test/validate tokens -- review editgroup accept code (?) -- fatcat_api -> fatcat_api_schema (or spec? models? types?) 
-- generally, standardize "edit" actions -- fatcat -> fatcat-api-server -- editgroup param to update - => also for creation? for consistency -- editor_id vs. editor username; return editor_id (in addition to name?) later: -- have editgroup_id be a request-level param everywhere (not entity-level; for batch) -- editgroup: state to track review status? -- re-implement old python tests -- enforce "no editing if editgroup accepted" behavior -- real auth -- metrics, jwt, config, sentry -- ansible/deployment/DNS story +- pure-rust "benchmark" scripts that hit, eg, lookups and batch endpoints + => criterion.rs benchmarking +- try new actix/openapi3 codegen branch - refactor logging; use slog +- test using hash indexes for some UUID column indexes, or at least sha1 and + other hashes (abstracts, file lookups) schema/api questions: - url table (for files) @@ -26,4 +42,3 @@ schema/api questions: - "types" - define release field stuff - what should entity POST return? include both the entity and the edit? - -- cgit v1.2.3