## Next Up

PLAN:
x update openapi schema with all the below
x python tests (that fail)
x rust stubs (to compile)
- rust tests
    => get and delete edits
    => redirect an entity (via PUT)
    => un-delete an entity (via PUT)
- implement until rust tests pass
- implement until python tests pass

- nomination form/thing for long-tail fatcat targets
- check on LOCKSS
- ask adam for nagios/ansible setup
- arxiv.org mirror

DOCUMENT/TEST EDGE CASES:
- PUT updating entities in an editgroup: overwrite edit, or require previous
  edit to be deleted first?
- prev_revision flag: must always be set as a sanity check for edits? what
  about when previous state was deleted?
- when a redirect has the redirect target deleted, what happens? decision:
  "state" shouldn't change without an update to the entity, though revision_id
  and redirect_id *can* change
- "redirect to redirect" condition. decision: leaf should have redirect_id set
  to end of chain... but need to worry about the delete/undelete states in
  that case
- edit to redirect A to B is started. B is updated and/or redirected and/or
  deleted. then A edit is merged. ensure that editgroup accept process handles
  this correctly
- auto-accept behavior when there were other edits in same group already (or
  should we just disallow that?)
- "current editgroup" behavior, which should probably just disallow
- use of "state" in entities as a flag for redirects and direct revision
  updates
- reverting to current version isn't allowed
- get of "wip" entities; make sure to check status? hrmpf.
- consider dropping CORE identifier

NOTE: maybe in the future we can make it easier on ourselves by just saying
that if an entity has redirects to it, it can't be deleted or redirected

x TODO: don't allow redirect to "wip" rows
    => needs test (python)
- TODO: fix returned error messages; should return type (shortname), and then
  actual message/description
- TODO: maybe better success return message?
- allow 'expand' in lookups (particularly for releases/files)
    => needs test (python or rust)
- idea: allow users to generate their own editgroup UUIDs, to reduce round
  trips and "hanging" editgroups (created but never edited)
- API: deletion of empty, un-accepted editgroups
- TODO: elastic inserter should handle deletions and redirects; if state isn't
  active, delete the document
    => and an end-to-end test of this behavior. hoo-boy.
- test/read: fetching deleted and redirected entities via API and web
  interface
- large refactor: make many endpoints entity-agnostic (passing entity-type as
  a param)
- redirecting and reverting endpoints
    => in PUT, path for handling redirect_ident or revision
    => this means can't have any required fields for any entities in API schema
- insertion of entity edit rows should be postgres upserts based on ident and
  editgroup
    => need UNIQ constraint?
- python test: re-deleting a deleted entity should be 4xx, not 5xx
- python test: can't delete an accepted edit
x API endpoints (GET, DELETE) for entity edits
    => to allow removing individual edit from editgroup
x API endpoints (GET) for entity revisions
x API endpoints to find entities that redirect to an ident
- what to do with redirect-to-redirect, or deletion of redirect?
    => for redirect-to-redirect, point to new redirect
    => for deletion of redirect, keep redirect, but remove revision
x API endpoints additional lookup params
- enforce "no editing if editgroup accepted" behavior
- require and enforce "previous_rev" in updates
- redirect rev_id needs to be updated when primary changes
- redirect/delete/update/lifecycle tests and completeness
- basic webface creation, editing, merging, editgroup approval
- refactor API schema for some entity-generic methods (eg, history, edit
  operations) to take entity type as a URL path param.
  greatly reduce macro foolery and method count/complexity, and ease creation
  of new entities
    => /{entity}/edit/{edit_id}
    => /{entity}/{ident}/redirects
    => /{entity}/{ident}/history

## Production blockers

- refactors and correctness in rust/TODO
- importers have editor accounts and include editgroup metadata
- enforce single-ident-edit-per-editgroup
    => entity_edit: entity_ident/entity_editgroup should be UNIQ index
    => UPDATE/REPLACE edits?
- crossref importer sets release_type as "stub" when appropriate
- re-implement old python tests
- real authentication and authorization
- metrics, jwt, config, sentry

## Metadata Import

- manifest: multiple URLs per SHA1
- crossref: relations ("is-preprint-of")
- crossref: two phase: no citations, then matched citations (via DOI table)
- container import (extra?): lang, region, subject
- crossref: filter works
    => content-type whitelist
    => title length and title/slug blacklist
    => at least one author (?)
    => make this a method on Release object
    => or just set release_stub as "stub"?

new importers:
- pubmed (medline) (filtered)
    => and/or, use pubmed ID lookups on crossref import
- arxiv.org
- DOAJ
- CORE (filtered)
- semantic scholar (up to 39 million; includes author de-dupe)

## Entity/Edit Lifecycle

- redirects and merges (API, webface, etc)
- test: release pointing to a collection that has been deleted/redirected
    => UI crash?
- commenting and accepting editgroups
- editgroup state machine?
- enforce "single ident edit per editgroup"
    => how to "edit an edit"? clobber existing?

## Guide / Book / Style

- release_type, release_status, url.rel schemas (enforced in API)
- more+better terms+policies: https://tosdr.org/index.html

## Fun Features

- "save paper now"
    => is it in GWB? if not, SPN
    => get hash + url from GWB, verify mimetype acceptable
    => is file in fatcat?
    => what about HBase? GROBID?
    => create edit, redirect user to editgroup submit page
- python client tool and library in pypi
    => or maybe rust?
- bibtex (etc) export

## Schema / Entity Fields

- arxiv_id field (keep flip-flopping)
- original_title field (?)
- FileSet and WebSnapshot entities
- `doi` field for containers (at least for "journal" type; maybe for "series"
  as well?)
- `retracted`, `translation`, and perhaps `corrected` as flags on releases,
  instead of release_status?
- 'part-of' relation for releases (release to release) and possibly containers
- `container-type` field for containers (journal, conference, book series, etc)

## Other / Backburner

- look at: https://ftfy.readthedocs.io/en/latest/
- refactor openapi schema to use shared response types
- consider using "HTTP 202: Accepted" for entity-mutating calls
- basic python hbase/elastic matcher
    => takes sha1 keys
    => checks fatcat API + hbase
    => if not matched yet, tries elastic search
    => simple ~exact match heuristic
    => proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
    => 12 hours? 24?
- haproxy for rate-limiting
- feature flags: consul?
- secrets: vault?
- "authn" microservice: https://keratin.tech/

better API docs
- readme.io has a free open source plan (or at least used to)
- https://github.com/readmeio/api-explorer
- https://github.com/lord/slate
- https://sourcey.com/spectacle/
- https://github.com/DapperDox/dapperdox

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?
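
SKETCH (redirect-to-redirect): a rough Python illustration of the "leaf should
have redirect_id set to end of chain" decision from the edge cases under Next
Up, plus the "don't allow redirect to wip rows" and "redirect rev_id needs to
be updated when primary changes" items. `IdentRow`, `end_of_chain`, and
`redirect` are hypothetical names over an in-memory dict, not the real schema
or API, and the delete/undelete question is deliberately left open here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IdentRow:
    # Hypothetical stand-in for an entity ident row; not the real schema.
    rev_id: Optional[str] = None       # current revision, if any
    redirect_id: Optional[str] = None  # ident this row redirects to, if any
    is_live: bool = True               # False stands for a "wip" row


def end_of_chain(rows: Dict[str, IdentRow], ident: str) -> str:
    """Follow redirect_id links starting at ident; return the final ident."""
    seen = set()
    while True:
        if ident in seen:
            raise ValueError(f"redirect loop at {ident}")
        seen.add(ident)
        row = rows[ident]
        if not row.is_live:
            raise ValueError(f"can't redirect to wip row {ident}")
        if row.redirect_id is None:
            return ident
        ident = row.redirect_id


def redirect(rows: Dict[str, IdentRow], source: str, target: str) -> None:
    """Redirect source to the end of target's chain.

    Rows that already redirected to source are re-pointed as well, so no
    redirect-to-redirect is left behind, and their rev_id is refreshed.
    """
    end = end_of_chain(rows, target)
    if end == source:
        raise ValueError(f"{source} would redirect to itself")
    for ident, row in rows.items():
        if ident == source or row.redirect_id == source:
            row.redirect_id = end
            row.rev_id = rows[end].rev_id
```

Usage, under the same assumptions: after `redirect(rows, "a", "b")` and then
`redirect(rows, "b", "c")`, both "a" and "b" point directly at "c" and carry
its rev_id. A chain ending at a deleted entity (no rev_id) is not rejected by
this sketch.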