-rw-r--r-- README.md 35
-rw-r--r-- TODO 24
-rw-r--r-- guide/TODO 10
3 files changed, 29 insertions, 40 deletions
diff --git a/README.md b/README.md
index 3ef66edf..7355e626 100644
--- a/README.md
+++ b/README.md
@@ -8,23 +8,20 @@
... catalog all the things!
+This repository contains source code for 'fatcat', an editable catalog of
+published written works (mostly journal articles), with a focus on tracking
+the location and status of full-text copies to ensure "perpetual access".
+
The [RFC](./fatcat-rfc.md) is the original design document, and the best place
to start for background. There is a work-in-progress "guide" at
<https://guide.fatcat.wiki>; the canonical public location of this repository
is <https://github.com/internetarchive/fatcat>.
-There are four main components:
-
-- backend API server and database
-- elasticsearch index
-- API client libraries and bots (eg, ingesters)
-- front-end web interface (built on API and library)
+There are three main components:
-The API server was prototyped in python. "Real" implementation started in
-golang, but shifted to Rust, and is work-in-progress. The beginings of a client
-library, web interface, and data ingesters exist in python. Elasticsearch index
-is currently just a Crossref metadata dump and doesn't match entities in the
-database/API (but is useful for paper lookups).
+- backend API server and database (in Rust)
+- API client libraries and bots (in Python)
+- front-end web interface (in Python; built on API and library)
See the LICENSE file for details permissions and licensing of both python and
rust code. In short, the auto-generated client libraries are permissively
@@ -32,26 +29,28 @@ released, while the API server and web interface are strong copyleft (AGPLv3).
## Status
-- HTTP API
- - [x] base32 encoding of UUID identifiers
- - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
-- SQL Schema
+- SQL and HTTP API schemas
- [x] Basic entities
- [x] one-to-many and many-to-many entities
- [x] JSON(B) "extra" metadata fields
- [x] full rev1 schema for all entities
- [ ] editgroup review: comments? actions?
+ - [ ] file sets and web captures
+- HTTP API Server
+ - [x] base32 encoding of UUID identifiers
+ - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
+ - [ ] Authentication (eg, accounts, OAuth2, JWT)
+ - [ ] Authorization (aka, roles)
- Web Interface
- [x] Migrate Python codebase
- [ ] Creation and editing of all entities
- Other
+ - [x] Elasticsearch schema
- [x] Basic logging
- [x] Swagger-UI
+ - [x] Bulk metadata exports
- [ ] Sentry (error reporting)
- [ ] Metrics
- - [ ] Authentication (eg, accounts, OAuth2, JWT)
- - [ ] Authorization (aka, roles)
- - [ ] bot vs. editor
## Identifiers
diff --git a/TODO b/TODO
index 506c2d2a..c09764d3 100644
--- a/TODO
+++ b/TODO
@@ -2,28 +2,24 @@
## Next Up
- basic webface creation, editing, merging, editgroup approval
-- elastic schema/transform for releases; bulk and continuous scripts
-## QA Blockers
+## Production blockers
- refactors and correctness in rust/TODO
- importers have editor accounts and include editgroup metadata
-
-## Production blockers
-
- enforce single-ident-edit-per-editgroup
=> entity_edit: entity_ident/entity_editgroup should be UNIQ index
=> UPDATE/REPLACE edits?
- crossref importer sets release_type as "stub" when appropriate
- re-implement old python tests
-- real auth
+- real authentication and authorization
- metrics, jwt, config, sentry
## Metadata Import
- manifest: multiple URLs per SHA1
- crossref: relations ("is-preprint-of")
-- crossref: two phse: no citations, then matched citations (via DOI table)
+- crossref: two phase: no citations, then matched citations (via DOI table)
- container import (extra?): lang, region, subject
- crossref: filter works
=> content-type whitelist
@@ -35,8 +31,10 @@
new importers:
- pubmed (medline) (filtered)
=> and/or, use pubmed ID lookups on crossref import
+- arxiv.org
+- DOAJ
- CORE (filtered)
-- semantic scholar (up to 39 million; author de-dupe)
+- semantic scholar (up to 39 million; includes author de-dupe)
## Entity/Edit Lifecycle
@@ -50,7 +48,7 @@ new importers:
## Guide / Book / Style
-- release_type, release_status, url.rel schemas (and enforce in API?)
+- release_type, release_status, url.rel schemas (enforced in API)
- more+better terms+policies: https://tosdr.org/index.html
## Fun Features
@@ -67,12 +65,15 @@ new importers:
## Schema / Entity Fields
+- FileSet and WebSnapshot entities
- `doi` field for containers (at least for "journal" type; maybe for "series"
as well?)
- `retracted`, `translation`, and perhaps `corrected` as flags on releases,
instead of release_status?
+- 'part-of' relation for releases (release to release) and possibly containers
+- `container-type` field for containers (journal, conference, book series, etc)
-## Other
+## Other / Backburner
- refactor openapi schema to use shared response types
- consider using "HTTP 202: Accepted" for entity-mutating calls
@@ -84,8 +85,7 @@ new importers:
=> proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
=> 12 hours? 24?
-- elastic pipeline
-- kong or oauth2_proxy for auth, rate-limit, etc
+- haproxy for rate-limiting
- feature flags: consul?
- secrets: vault?
- "authn" microservice: https://keratin.tech/
diff --git a/guide/TODO b/guide/TODO
deleted file mode 100644
index 1c9b7110..00000000
--- a/guide/TODO
+++ /dev/null
@@ -1,10 +0,0 @@
-- scope
-
-- quick passes: spellcheck, " I ", "would/will"
-
-TODO
-- roadmap
-- revise 'implementation' page with details (hosting costs, etc)
-
-DONE
-- policies