## Next Up

bugs:
- UI: handle file null size field
- not pulling in orcid given names correctly (?)
- test: release pointing to a collection that has been deleted/redirected
  => UI crash?
- multiple URLs per file

schema:
- encoding of ident UUIDs (but not other UUIDs) (no schema change)
- revisions, edits, editgroups, editor_id as UUIDs
- external idents: citation IDs, medline (PMID), pubmed (PMCID), wikidata, CORE
  => http://opencitations.net/index/coci
  => but should just appear in regular dumps? shrug

features:
- fast database dump command: both changelog-based and entity-based (rust)

importers:
- citations
- medline
- core
- wikidata (if they have a dump)

other:
- update RFC
- basic python hbase/elastic matcher
  => takes sha1 keys
  => checks fatcat API + hbase
  => if not matched yet, tries elastic search
  => simple ~exact match heuristic
  => proof-of-concept, no tests

## Schema / Alignment / Scope

- add Open Citation Identifiers... and COCI importer script
  instead of refs during crossref import?
- wikidata IDs are first-class identifiers (release, container, creator)
- switch a bunch more primary keys to UUID: revs, editor ids, edit numbers
- multiple URLs
- make "raw" fields in release_ref/release_contrib JSON?
- abstracts! as files? separate table? format (latex, html, etc)?
- other identifiers (just in extra?)
- work_type, release_type, release_status

name ref: https://www.w3.org/International/questions/qa-personal-names

## High-Level Priorities

- full database dump and reload (import/export)
- manual editing of containers and releases (web interface)

## Web UI

- changelog more like a https://semantic-ui.com/views/feed.html ?
- instead of grid, maybe https://semantic-ui.com/elements/rail.html

## Performance

- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
  endpoints. run these with auto_explain on, then look in logs on dev machine
- batch inserts automerge: create editgroup and changelog, mark all edits as
  accepted, all in a single transaction

## API

- hydrate entities in API
  ? "expand" query param
  ? "full entity" field
  ? refactor file_releases to have objects as type

## Other

- bulk endpoint auto-merge mode (huge postgres speedup on import)
- elastic pipeline
- kong or oauth2_proxy for auth, rate-limit, etc
- "authn" microservice: https://keratin.tech/
- PUT for mid-edit revisions
- 'parent rev' for revisions (vs. container parent)
- "submit" status for editgroups? review
- what does openlibrary API look like?
- add a 'live' (or 'immutable') flag to revision tables

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?
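
Matcher sketch (for the "basic python hbase/elastic matcher" item under Next Up): a rough, untested-in-production outline of the lookup order described there. Everything here is an assumption for illustration: the function names, the idea that hbase yields an extracted title for a sha1, and the shape of the search results. The three backends are passed in as plain callables so no real fatcat, hbase, or elastic client API is implied.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def normalize_title(title: str) -> str:
    """Lowercase and drop everything except letters and digits (~exact match key)."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def match_sha1(
    sha1: str,
    fatcat_lookup: Callable[[str], Optional[str]],
    hbase_title: Callable[[str], Optional[str]],
    elastic_search: Callable[[str], List[Tuple[str, str]]],
) -> Optional[str]:
    """Return a release id for a file sha1, or None if nothing matches.

    All three callables are hypothetical stand-ins:
      fatcat_lookup(sha1)   -> release id if the file is already matched
      hbase_title(sha1)     -> extracted title for the file, if any
      elastic_search(title) -> list of (release_id, title) candidates
    """
    # Step 1: already known to fatcat? Then there is nothing to do.
    existing = fatcat_lookup(sha1)
    if existing is not None:
        return existing

    # Step 2: fetch metadata for the file from hbase.
    title = hbase_title(sha1)
    if not title:
        return None
    wanted = normalize_title(title)
    if not wanted:
        return None

    # Step 3: fall back to search, accepting only a ~exact title match.
    for release_id, candidate_title in elastic_search(title):
        if normalize_title(candidate_title) == wanted:
            return release_id
    return None


def match_all(
    sha1_keys: Iterable[str],
    fatcat_lookup: Callable[[str], Optional[str]],
    hbase_title: Callable[[str], Optional[str]],
    elastic_search: Callable[[str], List[Tuple[str, str]]],
) -> Dict[str, Optional[str]]:
    """Run match_sha1 over a batch of sha1 keys."""
    return {
        key: match_sha1(key, fatcat_lookup, hbase_title, elastic_search)
        for key in sha1_keys
    }
```

With in-memory dicts standing in for the backends (e.g. `fatcat.get` and `hbase.get`), this can be exercised without any services running, which fits the "proof-of-concept, no tests" scope; real clients would replace the callables later.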