update README and TODOs a bit

author: Bryan Newbold <bnewbold@robocracy.org> 2018-11-07 11:36:30 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-11-07 11:36:30 -0800
commit: 0565516ce64297cf83f4cab23454f017c0fb3515 (patch)
tree: ddfe861fea166ffbb6181a855a3c54534c9e5fcb
parent: 6d0d172d7e680731cbaf24f625e2cd19e79c9ed1 (diff)
download: fatcat-0565516ce64297cf83f4cab23454f017c0fb3515.tar.gz fatcat-0565516ce64297cf83f4cab23454f017c0fb3515.zip
3 files changed, 29 insertions, 40 deletions
diff --git a/README.md b/README.md
index 3ef66edf..7355e626 100644
--- a/README.md
+++ b/README.md
@@ -8,23 +8,20 @@
                                         ... catalog all the things!
 
 
+This repository contains source code for 'fatcat', an editable catalog of
+published written works (mostly journal articles), with a focus on tracking
+the location and status of full-text copies to ensure "perpetual access".
+
 The [RFC](./fatcat-rfc.md) is the original design document, and the best place
 to start for background. There is a work-in-progress "guide" at
 <https://guide.fatcat.wiki>; the canonical public location of this repository
 is <https://github.com/internetarchive/fatcat>.
 
-There are four main components:
-
-- backend API server and database
-- elasticsearch index
-- API client libraries and bots (eg, ingesters)
-- front-end web interface (built on API and library)
+There are three main components:
 
-The API server was prototyped in python. "Real" implementation started in
-golang, but shifted to Rust, and is work-in-progress. The beginings of a client
-library, web interface, and data ingesters exist in python. Elasticsearch index
-is currently just a Crossref metadata dump and doesn't match entities in the
-database/API (but is useful for paper lookups).
+- backend API server and database (in Rust)
+- API client libraries and bots (in Python)
+- front-end web interface (in Python; built on API and library)
 
 See the LICENSE file for details permissions and licensing of both python and
 rust code. In short, the auto-generated client libraries are permissively
@@ -32,26 +29,28 @@ released, while the API server and web interface are strong copyleft (AGPLv3).
 
 ## Status
 
-- HTTP API
-    - [x] base32 encoding of UUID identifiers
-    - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
-- SQL Schema
+- SQL and HTTP API schemas
     - [x] Basic entities
     - [x] one-to-many and many-to-many entities
     - [x] JSON(B) "extra" metadata fields
     - [x] full rev1 schema for all entities
     - [ ] editgroup review: comments? actions?
+    - [ ] file sets and web captures
+- HTTP API Server
+    - [x] base32 encoding of UUID identifiers
+    - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
+    - [ ] Authentication (eg, accounts, OAuth2, JWT)
+    - [ ] Authorization (aka, roles)
 - Web Interface
     - [x] Migrate Python codebase
     - [ ] Creation and editing of all entities
 - Other
+    - [x] Elasticsearch schema
     - [x] Basic logging
     - [x] Swagger-UI 
+    - [x] Bulk metadata exports
     - [ ] Sentry (error reporting)
     - [ ] Metrics
-    - [ ] Authentication (eg, accounts, OAuth2, JWT)
-    - [ ] Authorization (aka, roles)
-    - [ ] bot vs. editor
 
 ## Identifiers
 
diff --git a/TODO b/TODO
index 506c2d2a..c09764d3 100644
--- a/TODO
+++ b/TODO
@@ -2,28 +2,24 @@
 ## Next Up
 
 - basic webface creation, editing, merging, editgroup approval
-- elastic schema/transform for releases; bulk and continuous scripts
 
-## QA Blockers
+## Production blockers
 
 - refactors and correctness in rust/TODO
 - importers have editor accounts and include editgroup metadata
-
-## Production blockers
-
 - enforce single-ident-edit-per-editgroup
     => entity_edit: entity_ident/entity_editgroup should be UNIQ index
     => UPDATE/REPLACE edits?
 - crossref importer sets release_type as "stub" when appropriate
 - re-implement old python tests
-- real auth
+- real authentication and authorization
 - metrics, jwt, config, sentry
 
 ## Metadata Import
 
 - manifest: multiple URLs per SHA1
 - crossref: relations ("is-preprint-of")
-- crossref: two phse: no citations, then matched citations (via DOI table)
+- crossref: two phase: no citations, then matched citations (via DOI table)
 - container import (extra?): lang, region, subject
 - crossref: filter works
     => content-type whitelist
@@ -35,8 +31,10 @@
 new importers:
 - pubmed (medline) (filtered)
     => and/or, use pubmed ID lookups on crossref import
+- arxiv.org
+- DOAJ
 - CORE (filtered)
-- semantic scholar (up to 39 million; author de-dupe)
+- semantic scholar (up to 39 million; includes author de-dupe)
 
 ## Entity/Edit Lifecycle
 
@@ -50,7 +48,7 @@ new importers:
 
 ## Guide / Book / Style
 
-- release_type, release_status, url.rel schemas (and enforce in API?)
+- release_type, release_status, url.rel schemas (enforced in API)
 - more+better terms+policies: https://tosdr.org/index.html
 
 ## Fun Features
@@ -67,12 +65,15 @@ new importers:
 
 ## Schema / Entity Fields
 
+- FileSet and WebSnapshot entities
 - `doi` field for containers (at least for "journal" type; maybe for "series"
   as well?)
 - `retracted`, `translation`, and perhaps `corrected` as flags on releases,
   instead of release_status?
+- 'part-of' relation for releases (release to release) and possibly containers
+- `container-type` field for containers (journal, conference, book series, etc)
 
-## Other
+## Other / Backburner
 
 - refactor openapi schema to use shared response types
 - consider using "HTTP 202: Accepted" for entity-mutating calls
@@ -84,8 +85,7 @@ new importers:
   => proof-of-concept, no tests
 - add_header Strict-Transport-Security "max-age=3600";
     => 12 hours? 24?
-- elastic pipeline
-- kong or oauth2_proxy for auth, rate-limit, etc
+- haproxy for rate-limiting
 - feature flags: consul?
 - secrets: vault?
 - "authn" microservice: https://keratin.tech/
diff --git a/guide/TODO b/guide/TODO
deleted file mode 100644
index 1c9b7110..00000000
--- a/guide/TODO
+++ /dev/null
@@ -1,10 +0,0 @@
-- scope
-
-- quick passes: spellcheck, " I ", "would/will"
-
-TODO
-- roadmap
-- revise 'implementation' page with details (hosting costs, etc)
-
-DONE
-- policies