author Bryan Newbold <bnewbold@robocracy.org> 2018-09-11 13:56:53 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2018-09-11 13:56:53 -0700
commit 0dc872921023030f6ffd320eb038e5379b47fa53
tree 7069b0a3c914431d4e0e7f05d2526592b0c4e3cf
parent 98f21fe69e0361db00e5fbceb7a3168dcb926d32
update TODO lists (september plan)
-rw-r--r-- TODO 120
-rw-r--r-- python/TODO 9
-rw-r--r-- rust/TODO 47
3 files changed, 86 insertions, 90 deletions
diff --git a/TODO b/TODO
index 765f6a3a..900e8eda 100644
--- a/TODO
+++ b/TODO
@@ -1,29 +1,31 @@
## Next Up
-- some significant slow-down has happened? transactions, or regexes?
-summer roadmap:
-- PUT/UPDATE, DELETE, and merge code paths
-- faster UPDATE-free bulk import code path
-- container import (extra?): lang, region, subject
-- basic API+webface creation, editing, merging, editgroup approval
+- basic webface creation, editing, merging, editgroup approval
- elastic schema/transform for releases; bulk and continuous scripts
-features:
-- fast database dump command: both changelog-based and entity-based (rust)
- => lighter, more complete dumps for each entity type?
-- guide skeleton (mdbook; guide.fatcat.wiki)
+## QA Blockers
+
+- refactors and correctness in rust/TODO
+- importers have editor accounts and include editgroup metadata
+- crossref importer uses extids
+
+## Production blockers
+
+- enforce single-ident-edit-per-editgroup
+ => entity_edit: entity_ident/entity_editgroup should be UNIQ index
+ => UPDATE/REPLACE edits?
+- crossref importer sets release_type as "stub" when appropriate
+- re-implement old python tests
+- real auth
+- metrics, jwt, config, sentry
+
+## Metadata Import
-importers:
-- CORE
-- wikidata cross-ref (if they have a dump)
- manifest: multiple URLs per SHA1
-- pubmed (medline), if not in CORE
- => and/or, use pubmed ID lookups on crossref import
-- core
-- semantic scholar (up to 39 million; author de-dupe)
-- wikidata (if they have a dump)
- crossref: relations ("is-preprint-of")
+- crossref: two phases: no citations, then matched citations (via DOI table)
+- container import (extra?): lang, region, subject
- crossref: filter works
=> content-type whitelist
=> title length and title/slug blacklist
@@ -31,61 +33,43 @@ importers:
=> make this a method on Release object
=> or just set release_stub as "stub"?
-bugs:
+new importers:
+- pubmed (medline) (filtered)
+ => and/or, use pubmed ID lookups on crossref import
+- CORE (filtered)
+- semantic scholar (up to 39 million; author de-dupe)
+
+## Entity/Edit Lifecycle
+
+- redirects and merges (API, webface, etc)
- test: release pointing to a collection that has been deleted/redirected
=> UI crash?
+- commenting and accepting editgroups
+- editgroup state machine?
+- enforce "single ident edit per editgroup"
+ => how to "edit an edit"? clobber existing?
-july roadmap:
-- complete and test this round of schema changes
-- container import (extra?): lang, region, subject
-- re-run imports
-- basic API+webface creation, editing, merging, editgroup approval
-- elastic schema/transform for releases; bulk and continuous scripts
-
-## Schema / Alignment / Scope
+## Guide / Book / Style
-- "container" -> "venue"?
-- release_type, release_status, url.rel write-time schema(and others?)
+- release_type, release_status, url.rel schemas (and enforce in API?)
name ref: https://www.w3.org/International/questions/qa-personal-names
-## API
-
-- how to send edit "extra" metadata?
-- hydrate entities in API
- ? "expand" query param
-
-## High-Level Priorities
-
-- full database dump (export)
-- manual editing of containers and releases (web interface)
-
-## Web UI
-
-- changelog more like a https://semantic-ui.com/views/feed.html ?
-- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+## Fun Features
-## Performance
-
-- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
- endpoints. run these with auto_explain on, then look in logs on dev machine
-- batch inserts automerge: create editgroup and changelog, mark all edits as
- accepted, all in a single transaction
-
-## API
-
-- hydrate entities in API
- ? "expand" query param
-- don't include abstracts by default?
-- "stub" mode for lookups, returning only the ident (or maybe whole row)?
-
-## Database
-
-- test using hash indexes for some UUID column indexes, or at least sha1 and
- other hashes (abstracts, file lookups)
+- "save paper now"
+ => is it in GWB? if not, SPN
+ => get hash + url from GWB, verify mimetype acceptable
+ => is file in fatcat?
+ => what about HBase? GROBID?
+ => create edit, redirect user to editgroup submit page
+- python client tool and library in pypi
+ => or maybe rust?
+- bibtex (etc) export
## Other
+- consider using "HTTP 202: Accepted" for entity-mutating calls
- basic python hbase/elastic matcher
=> takes sha1 keys
=> checks fatcat API + hbase
@@ -94,19 +78,11 @@ name ref: https://www.w3.org/International/questions/qa-personal-names
=> proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
=> 12 hours? 24?
-- criterion.rs benchmarking
-- schema.org metadata in webface
-- bulk endpoint auto-merge mode (huge postgres speedup on import)
- elastic pipeline
- kong or oauth2_proxy for auth, rate-limit, etc
+- feature flags: consul?
+- secrets: vault?
- "authn" microservice: https://keratin.tech/
-- PUT for mid-edit revisions
-- 'parent rev' for revisions (vs. container parent)
-- "submit" status for editgroups?
-
-review
-- what does openlibrary API look like?
-x add a 'live' (or 'immutable') flag to revision tables
better API docs
- https://sourcey.com/spectacle/
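The "enforce single-ident-edit-per-editgroup" item (an `entity_ident`/`entity_editgroup` UNIQ index on `entity_edit`) can be prototyped in application code before the index lands. A minimal Python sketch, assuming a hypothetical in-memory list of `(editgroup_id, ident_id)` edit rows rather than fatcat's actual tables: it reports every edit after the first to the same ident within one editgroup, which is exactly what a unique index would refuse. Whether to reject or clobber (the open "edit an edit" question) is left to the caller.

```python
from collections import namedtuple

# Hypothetical, simplified edit row; the real entity_edit tables have more columns.
Edit = namedtuple("Edit", ["editgroup_id", "ident_id", "rev_id"])


def find_duplicate_edits(edits):
    """Return edits that touch an ident already edited in the same editgroup.

    Mirrors what a UNIQUE (editgroup_id, ident_id) index would reject.
    """
    seen = set()
    duplicates = []
    for edit in edits:
        key = (edit.editgroup_id, edit.ident_id)
        if key in seen:
            duplicates.append(edit)
        else:
            seen.add(key)
    return duplicates


edits = [
    Edit("eg1", "ident-a", "rev-1"),
    Edit("eg1", "ident-b", "rev-2"),
    Edit("eg1", "ident-a", "rev-3"),  # second edit to ident-a in eg1: flagged
    Edit("eg2", "ident-a", "rev-4"),  # different editgroup: allowed
]
print(find_duplicate_edits(edits))
```

An application-level check like this can race with concurrent requests, so the database index remains the authoritative enforcement; the sketch is mainly useful for auditing existing data before adding the index.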
diff --git a/python/TODO b/python/TODO
index 3e8ba6ff..708b8aa8 100644
--- a/python/TODO
+++ b/python/TODO
@@ -1,7 +1,7 @@
-- make debugbar really optional (don't import unless we're in debug mode)
+- schema.org metadata for releases
-tests
+additional tests
- full object fields actually getting passed e2e (for rich_app)
- implicit editor.active_edit_group behavior
- modify existing release via edit mechanism (and commit)
@@ -13,3 +13,8 @@ tests
views
- oldest un-merged edits/edit-groups
+- changelog more like a https://semantic-ui.com/views/feed.html ?
+- instead of grid, maybe https://semantic-ui.com/elements/rail.html
+
+backlog
+- make debugbar really optional (don't import unless we're in debug mode)
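For the new "schema.org metadata for releases" item, one common approach is embedding JSON-LD using the schema.org `ScholarlyArticle` type. A minimal Python sketch, assuming a release-like dict whose keys (`title`, `release_date`, `doi`, `contribs[].raw_name`, `container_name`) are chosen for illustration and not confirmed against the webface's actual data; `release_to_jsonld` and `jsonld_script_tag` are hypothetical helper names.

```python
import json


def release_to_jsonld(release):
    """Build a schema.org ScholarlyArticle JSON-LD dict from a release-like dict.

    Input keys are assumptions for illustration, not a confirmed fatcat shape.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": release["title"],
    }
    if release.get("release_date"):
        doc["datePublished"] = release["release_date"]
    if release.get("doi"):
        # Expose the DOI as a resolvable URL via the generic "sameAs" property.
        doc["sameAs"] = "https://doi.org/" + release["doi"]
    names = [c["raw_name"] for c in release.get("contribs", []) if c.get("raw_name")]
    if names:
        doc["author"] = [{"@type": "Person", "name": n} for n in names]
    if release.get("container_name"):
        doc["isPartOf"] = {"@type": "Periodical", "name": release["container_name"]}
    return doc


def jsonld_script_tag(doc):
    """Serialize to a <script> tag; escaping "</" stops a title from closing the tag early."""
    payload = json.dumps(doc, sort_keys=True).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


release = {
    "title": "An Example Paper",
    "release_date": "2018-09-11",
    "doi": "10.1234/example",
    "contribs": [{"raw_name": "Jane Doe"}, {"index": 1}],  # second contrib has no name
}
print(jsonld_script_tag(release_to_jsonld(release)))
```

Building the dict in Python and serializing once keeps template code simple; the `"<\/"` escape is still valid JSON, so consumers parse the original string back.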
diff --git a/rust/TODO b/rust/TODO
index ac378961..c922d5df 100644
--- a/rust/TODO
+++ b/rust/TODO
@@ -1,24 +1,40 @@
-verbs:
+refactors
+- fatcatd -> fatcat-api-server
+- fatcat_api -> fatcat_api_schema (or spec? models? types?)
+- standardize "mutating"/"edit" actions
+ => have editgroup_id be a request-level param everywhere (not entity-level;
+ for batch)
+ => editgroup_id as query param
+ => editor_id from auth (header)
+- consistent "expand"/"stub" flags
+
+correctness
- enforce "previous_rev" required in updates
+- reread/review editgroup accept code
+- enforce "no editing if editgroup accepted" behavior
+- changelog sequence without gaps
+- batch insert editgroup behavior; always a new editgroup?
+
+edit lifecycle
+- editgroup: state to track review status?
+- per-edit extra JSON
+
+account helper tool
+- set admin bit
+- create editors
+- create keypairs
+- generate tokens
+- test/validate tokens
-- review editgroup accept code (?)
-- fatcat_api -> fatcat_api_schema (or spec? models? types?)
-- generally, standardize "edit" actions
-- fatcat -> fatcat-api-server
-- editgroup param to update
- => also for creation? for consistency
-- editor_id vs. editor username; return editor_id (in addition to name?)
later:
-- have editgroup_id be a request-level param everywhere (not entity-level; for batch)
-- editgroup: state to track review status?
-- re-implement old python tests
-- enforce "no editing if editgroup accepted" behavior
-- real auth
-- metrics, jwt, config, sentry
-- ansible/deployment/DNS story
+- pure-rust "benchmark" scripts that hit, e.g., lookups and batch endpoints
+ => criterion.rs benchmarking
+- try new actix/openapi3 codegen branch
- refactor logging; use slog
+- test using hash indexes for some UUID column indexes, or at least sha1 and
+ other hashes (abstracts, file lookups)
schema/api questions:
- url table (for files)
@@ -26,4 +42,3 @@ schema/api questions:
- "types"
- define release field stuff
- what should entity POST return? include both the entity and the edit?
-
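The "changelog sequence without gaps" item can at least be audited after the fact. If the changelog index is backed by a PostgreSQL sequence (an assumption about the schema here), gaps are expected whenever a transaction that called `nextval()` later rolls back, because PostgreSQL never rolls back `nextval`. A minimal Python sketch of an audit check; `changelog_gaps` is a hypothetical helper that takes the indices however they were fetched and reports the missing values.

```python
def changelog_gaps(indices):
    """Return changelog indices missing between the smallest and largest present.

    Duplicates and ordering in the input are ignored; empty or contiguous
    input returns [].
    """
    present = sorted(set(indices))
    gaps = []
    for prev, cur in zip(present, present[1:]):
        # Every integer strictly between two consecutive present indices is missing.
        gaps.extend(range(prev + 1, cur))
    return gaps


print(changelog_gaps([1, 2, 3, 5, 8]))  # → [4, 6, 7]
```

Detecting gaps is the easy half; actually preventing them would mean assigning the index only at editgroup-accept time under serialization, which trades some concurrency for a dense sequence.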