| -rw-r--r-- | TODO | 120 |
| -rw-r--r-- | python/TODO | 9 |
| -rw-r--r-- | rust/TODO | 47 |
3 files changed, 86 insertions, 90 deletions
@@ -1,29 +1,31 @@
 ## Next Up
-- some significant slow-down has happened? transactions, or regexes?
-summer roadmap:
-- PUT/UPDATE, DELETE, and merge code paths
-- faster UPDATE-free bulk import code path
-- container import (extra?): lang, region, subject
-- basic API+webface creation, editing, merging, editgroup approval
+- basic webface creation, editing, merging, editgroup approval
 - elastic schema/transform for releases; bulk and continuous scripts
-features:
-- fast database dump command: both changelog-based and entity-based (rust)
-    => lighter, more complete dumps for each entity type?
-- guide skeleton (mdbook; guide.fatcat.wiki)
+## QA Blockers
+
+- refactors and correctness in rust/TODO
+- importers have editor accounts and include editgroup metadata
+- crossref importer uses extids
+
+## Production blockers
+
+- enforce single-ident-edit-per-editgroup
+    => entity_edit: entity_ident/entity_editgroup should be UNIQ index
+    => UPDATE/REPLACE edits?
+- crossref importer sets release_type as "stub" when appropriate
+- re-implement old python tests
+- real auth
+- metrics, jwt, config, sentry
+
+## Metadata Import
-importers:
-- CORE
-- wikidata cross-ref (if they have a dump)
 - manifest: multiple URLs per SHA1
-- pubmed (medline), if not in CORE
-    => and/or, use pubmed ID lookups on crossref import
-- core
-- semantic scholar (up to 39 million; author de-dupe)
-- wikidata (if they have a dump)
 - crossref: relations ("is-preprint-of")
+- crossref: two phse: no citations, then matched citations (via DOI table)
+- container import (extra?): lang, region, subject
 - crossref: filter works
     => content-type whitelist
     => title length and title/slug blacklist
@@ -31,61 +33,43 @@ importers:
     => make this a method on Release object
     => or just set release_stub as "stub"?
-bugs:
+new importers:
+- pubmed (medline) (filtered)
+    => and/or, use pubmed ID lookups on crossref import
+- CORE (filtered)
+- semantic scholar (up to 39 million; author de-dupe)
+
+## Entity/Edit Lifecycle
+
+- redirects and merges (API, webface, etc)
 - test: release pointing to a collection that has been deleted/redirected
   => UI crash?
+- commenting and accepting editgroups
+- editgroup state machine?
+- enforce "single ident edit per editgroup"
+    => how to "edit an edit"? clobber existing?
-july roadmap:
-- complete and test this round of schema changes
-- container import (extra?): lang, region, subject
-- re-run imports
-- basic API+webface creation, editing, merging, editgroup approval
-- elastic schema/transform for releases; bulk and continuous scripts
-
-## Schema / Alignment / Scope
+## Guide / Book / Style
-- "container" -> "venue"?
-- release_type, release_status, url.rel write-time schema(and others?)
+- release_type, release_status, url.rel schemas (and enforce in API?)
 name ref: https://www.w3.org/International/questions/qa-personal-names
-## API
-
-- how to send edit "extra" metadata?
-- hydrate entities in API
-    ? "expand" query param
-
-## High-Level Priorities
-
-- full database dump (export)
-- manual editing of containers and releases (web interface)
-
-## Web UI
-
-- changelog more like a https://semantic-ui.com/views/feed.html ?
-- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+## Fun Features
-## Performance
-
-- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
-  endpoints. run these with auto_explain on, then look in logs on dev machine
-- batch inserts automerge: create editgroup and changelog, mark all edits as
-  accepted, all in a single transaction
-
-## API
-
-- hydrate entities in API
-    ? "expand" query param
-- don't include abstracts by default?
-- "stub" mode for lookups, returning only the ident (or maybe whole row)?
-
-## Database
-
-- test using hash indexes for some UUID column indexes, or at least sha1 and
-  other hashes (abstracts, file lookups)
+- "save paper now"
+    => is it in GWB? if not, SPN
+    => get hash + url from GWB, verify mimetype acceptable
+    => is file in fatcat?
+    => what about HBase? GROBID?
+    => create edit, redirect user to editgroup submit page
+- python client tool and library in pypi
+    => or maybe rust?
+- bibtext (etc) export
 ## Other
+- consider using "HTTP 202: Accepted" for entity-mutating calls
 - basic python hbase/elastic matcher
   => takes sha1 keys
   => checks fatcat API + hbase
@@ -94,19 +78,11 @@ name ref: https://www.w3.org/International/questions/qa-personal-names
   => proof-of-concept, no tests
 - add_header Strict-Transport-Security "max-age=3600";
     => 12 hours? 24?
-- criterion.rs benchmarking
-- schema.org metadata in webface
-- bulk endpoint auto-merge mode (huge postgres speedup on import)
 - elastic pipeline
 - kong or oauth2_proxy for auth, rate-limit, etc
+- feature flags: consul?
+- secrets: vault?
 - "authn" microservice: https://keratin.tech/
-- PUT for mid-edit revisions
-- 'parent rev' for revisions (vs. container parent)
-- "submit" status for editgroups?
-
-review
-- what does openlibrary API look like?
-x add a 'live' (or 'immutable') flag to revision tables
 better API docs
 - https://sourcey.com/spectacle/
diff --git a/python/TODO b/python/TODO
index 3e8ba6ff..708b8aa8 100644
--- a/python/TODO
+++ b/python/TODO
@@ -1,7 +1,7 @@
-- make debugbar really optional (don't import unless we're in debug mode)
+- schema.org metadata for releases
-tests
+additional tests
 - full object fields actually getting passed e2e (for rich_app)
 - implicit editor.active_edit_group behavior
 - modify existing release via edit mechanism (and commit)
@@ -13,3 +13,8 @@ tests
 views
 - oldest un-merged edits/edit-groups
+- changelog more like a https://semantic-ui.com/views/feed.html ?
+- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+
+backlog
+- make debugbar really optional (don't import unless we're in debug mode)
@@ -1,24 +1,40 @@
-verbs:
+refactors
+- fatcatd -> fatcat-api-server
+- fatcat_api -> fatcat_api_schema (or spec? models? types?)
+- standardize "mutating"/"edit" actions
+    => have editgroup_id be a request-level param everywhere (not entity-level;
+       for batch)
+    => editgroup_id as query param
+    => editor_id from auth (header)
+- consistent "expand"/"stub" flags
+
+correctness
 - enforce "previous_rev" required in updates
+- reread/review editgroup accept code
+- enforce "no editing if editgroup accepted" behavior
+- changelog sequence without gaps
+- batch insert editgroup behavior; always a new editgroup?
+
+edit lifecycle
+- editgroup: state to track review status?
+- per-edit extra JSON
+
+account helper tool
+- set admin bit
+- create editors
+- create keypairs
+- generate tokens
+- test/validate tokens
-- review editgroup accept code (?)
-- fatcat_api -> fatcat_api_schema (or spec? models? types?)
-- generally, standardize "edit" actions
-- fatcat -> fatcat-api-server
-- editgroup param to update
-    => also for creation? for consistency
-- editor_id vs. editor username; return editor_id (in addition to name?)
 later:
-- have editgroup_id be a request-level param everywhere (not entity-level; for batch)
-- editgroup: state to track review status?
-- re-implement old python tests
-- enforce "no editing if editgroup accepted" behavior
-- real auth
-- metrics, jwt, config, sentry
-- ansible/deployment/DNS story
+- pure-rust "benchmark" scripts that hit, eg, lookups and batch endpoints
+    => criterion.rs benchmarking
+- try new actix/openapi3 codegen branch
 - refactor logging; use slog
+- test using hash indexes for some UUID column indexes, or at least sha1 and
+  other hashes (abstracts, file lookups)
 schema/api questions:
 - url table (for files)
@@ -26,4 +42,3 @@ schema/api questions:
 - "types"
 - define release field stuff
 - what should entity POST return? include both the entity and the edit?
-
