From 0dc872921023030f6ffd320eb038e5379b47fa53 Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@robocracy.org>
Date: Tue, 11 Sep 2018 13:56:53 -0700
Subject: update TODO lists (september plan)

---
 TODO        | 120 ++++++++++++++++++++++++------------------------------------
 python/TODO |   9 ++++-
 rust/TODO   |  47 ++++++++++++++++--------
 3 files changed, 86 insertions(+), 90 deletions(-)

diff --git a/TODO b/TODO
index 765f6a3a..900e8eda 100644
--- a/TODO
+++ b/TODO
@@ -1,29 +1,31 @@
 
 ## Next Up
 
-- some significant slow-down has happened? transactions, or regexes?
-summer roadmap:
-- PUT/UPDATE, DELETE, and merge code paths
-- faster UPDATE-free bulk import code path
-- container import (extra?): lang, region, subject
-- basic API+webface creation, editing, merging, editgroup approval
+- basic webface creation, editing, merging, editgroup approval
 - elastic schema/transform for releases; bulk and continuous scripts
 
-features:
-- fast database dump command: both changelog-based and entity-based (rust)
-    => lighter, more complete dumps for each entity type?
-- guide skeleton (mdbook; guide.fatcat.wiki)
+## QA Blockers
+
+- refactors and correctness in rust/TODO
+- importers have editor accounts and include editgroup metadata
+- crossref importer uses extids
+
+## Production blockers
+
+- enforce single-ident-edit-per-editgroup
+    => entity_edit: entity_ident/entity_editgroup should be UNIQ index
+    => UPDATE/REPLACE edits?
+- crossref importer sets release_type as "stub" when appropriate
+- re-implement old python tests
+- real auth
+- metrics, jwt, config, sentry
+
+## Metadata Import
 
-importers:
-- CORE
-- wikidata cross-ref (if they have a dump)
 - manifest: multiple URLs per SHA1
-- pubmed (medline), if not in CORE
-    => and/or, use pubmed ID lookups on crossref import
-- core
-- semantic scholar (up to 39 million; author de-dupe)
-- wikidata (if they have a dump)
 - crossref: relations ("is-preprint-of")
+- crossref: two phase: no citations, then matched citations (via DOI table)
+- container import (extra?): lang, region, subject
 - crossref: filter works
     => content-type whitelist
     => title length and title/slug blacklist
@@ -31,61 +33,43 @@ importers:
     => make this a method on Release object
     => or just set release_stub as "stub"?
 
-bugs:
+new importers:
+- pubmed (medline) (filtered)
+    => and/or, use pubmed ID lookups on crossref import
+- CORE (filtered)
+- semantic scholar (up to 39 million; author de-dupe)
+
+## Entity/Edit Lifecycle
+
+- redirects and merges (API, webface, etc)
 - test: release pointing to a collection that has been deleted/redirected
   => UI crash?
+- commenting and accepting editgroups
+- editgroup state machine?
+- enforce "single ident edit per editgroup"
+    => how to "edit an edit"? clobber existing?
 
-july roadmap:
-- complete and test this round of schema changes
-- container import (extra?): lang, region, subject
-- re-run imports
-- basic API+webface creation, editing, merging, editgroup approval
-- elastic schema/transform for releases; bulk and continuous scripts
-
-## Schema / Alignment / Scope
+## Guide / Book / Style
 
-- "container" -> "venue"?
-- release_type, release_status, url.rel write-time schema(and others?)
+- release_type, release_status, url.rel schemas (and enforce in API?)
 
 name ref: https://www.w3.org/International/questions/qa-personal-names
 
-## API
-
-- how to send edit "extra" metadata?
-- hydrate entities in API
-    ? "expand" query param
-
-## High-Level Priorities
-
-- full database dump (export)
-- manual editing of containers and releases (web interface)
-
-## Web UI
-
-- changelog more like a https://semantic-ui.com/views/feed.html ?
-- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+## Fun Features
 
-## Performance
-
-- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
-  endpoints. run these with auto_explain on, then look in logs on dev machine
-- batch inserts automerge: create editgroup and changelog, mark all edits as
-  accepted, all in a single transaction
-
-## API
-
-- hydrate entities in API
-    ? "expand" query param
-- don't include abstracts by default?
-- "stub" mode for lookups, returning only the ident (or maybe whole row)?
-
-## Database
-
-- test using hash indexes for some UUID column indexes, or at least sha1 and
-  other hashes (abstracts, file lookups)
+- "save paper now"
+    => is it in GWB? if not, SPN
+    => get hash + url from GWB, verify mimetype acceptable
+    => is file in fatcat?
+    => what about HBase? GROBID?
+    => create edit, redirect user to editgroup submit page
+- python client tool and library in pypi
+    => or maybe rust?
+- bibtex (etc) export
 
 ## Other
 
+- consider using "HTTP 202: Accepted" for entity-mutating calls
 - basic python hbase/elastic matcher
   => takes sha1 keys
   => checks fatcat API + hbase
@@ -94,19 +78,11 @@ name ref: https://www.w3.org/International/questions/qa-personal-names
   => proof-of-concept, no tests
 - add_header Strict-Transport-Security "max-age=3600";
     => 12 hours? 24?
-- criterion.rs benchmarking
-- schema.org metadata in webface
-- bulk endpoint auto-merge mode (huge postgres speedup on import)
 - elastic pipeline
 - kong or oauth2_proxy for auth, rate-limit, etc
+- feature flags: consul?
+- secrets: vault?
 - "authn" microservice: https://keratin.tech/
-- PUT for mid-edit revisions
-- 'parent rev' for revisions (vs. container parent)
-- "submit" status for editgroups?
-
-review
-- what does openlibrary API look like?
-x add a 'live' (or 'immutable') flag to revision tables
 
 better API docs
 - https://sourcey.com/spectacle/
diff --git a/python/TODO b/python/TODO
index 3e8ba6ff..708b8aa8 100644
--- a/python/TODO
+++ b/python/TODO
@@ -1,7 +1,7 @@
 
-- make debugbar really optional (don't import unless we're in debug mode)
+- schema.org metadata for releases
 
-tests
+additional tests
 - full object fields actually getting passed e2e (for rich_app)
 - implicit editor.active_edit_group behavior
 - modify existing release via edit mechanism (and commit)
@@ -13,3 +13,8 @@ tests
 
 views
 - oldest un-merged edits/edit-groups
+- changelog more like a https://semantic-ui.com/views/feed.html ?
+- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+
+backlog
+- make debugbar really optional (don't import unless we're in debug mode)
diff --git a/rust/TODO b/rust/TODO
index ac378961..c922d5df 100644
--- a/rust/TODO
+++ b/rust/TODO
@@ -1,24 +1,40 @@
 
-verbs:
+refactors
+- fatcatd -> fatcat-api-server
+- fatcat_api -> fatcat_api_schema (or spec? models? types?)
+- standardize "mutating"/"edit" actions
+    => have editgroup_id be a request-level param everywhere (not entity-level;
+       for batch)
+    => editgroup_id as query param
+    => editor_id from auth (header)
+- consistent "expand"/"stub" flags
+
+correctness
 - enforce "previous_rev" required in updates
+- reread/review editgroup accept code
+- enforce "no editing if editgroup accepted" behavior
+- changelog sequence without gaps
+- batch insert editgroup behavior; always a new editgroup?
+
+edit lifecycle
+- editgroup: state to track review status?
+- per-edit extra JSON
+
+account helper tool
+- set admin bit
+- create editors
+- create keypairs
+- generate tokens
+- test/validate tokens
 
-- review editgroup accept code (?)
-- fatcat_api -> fatcat_api_schema (or spec? models? types?)
-- generally, standardize "edit" actions
-- fatcat -> fatcat-api-server
-- editgroup param to update
-    => also for creation? for consistency
-- editor_id vs. editor username; return editor_id (in addition to name?)
 
 later:
-- have editgroup_id be a request-level param everywhere (not entity-level; for batch)
-- editgroup: state to track review status?
-- re-implement old python tests
-- enforce "no editing if editgroup accepted" behavior
-- real auth
-- metrics, jwt, config, sentry
-- ansible/deployment/DNS story
+- pure-rust "benchmark" scripts that hit, eg, lookups and batch endpoints
+    => criterion.rs benchmarking
+- try new actix/openapi3 codegen branch
 - refactor logging; use slog
+- test using hash indexes for some UUID column indexes, or at least sha1 and
+  other hashes (abstracts, file lookups)
 
 schema/api questions:
 - url table (for files)
@@ -26,4 +42,3 @@ schema/api questions:
 - "types"
 - define release field stuff
 - what should entity POST return? include both the entity and the edit?
-
-- 
cgit v1.2.3