## Next Up

bugs:
- test: release pointing to a collection that has been deleted/redirected
  => UI crash?

schema:
- primary key types
    => idents as base32
    => editor_id and editgroup as idents
    => revisions as UUID
- multiple URLs per file
    => {type, url} table; display code to choose "best"
    => web, repo, webarchive, shadow (?)
- external idents (as columns)
    => pm_id
    => pmc_id
    => wikidata_id (creator, release, container)
    => oclc_id
    => viaf_id (creator)
- release_ref
    => 'raw'/'extra' json column
        => title
        => url
        => doi
        => etc...
    => citation ID (`oci_id`)
    => release_id
- release_contrib
    => add 'raw' json column? or just extra?
- abstracts
    => new table; primary key SHA-1
    => release has multiple: {markup, lang, abstract_sha1}
- other changes (see notebook)
    => parent rev in edit table
    => timestamp columns
- "container" -> "venue"?

features:
- fast database dump command: both changelog-based and entity-based (rust)

importers:
- pubmed (medline)
- core
- semantic scholar (up to 39 million; author de-dupe)
- wikidata (if they have a dump)

other:
- update RFC
- basic python hbase/elastic matcher
  => takes sha1 keys
  => checks fatcat API + hbase
  => if not matched yet, tries elastic search
  => simple ~exact match heuristic
  => proof-of-concept, no tests

## Schema / Alignment / Scope

- abstracts! as files? separate table? format (latex, html, etc)?
    => crossref has ~13% as JATS; plus pubmed, plus arxiv
- work_type, release_type, release_status

name ref: https://www.w3.org/International/questions/qa-personal-names

## High-Level Priorities

- full database dump and reload (import/export)
- manual editing of containers and releases (web interface)

## Web UI

- changelog more like a https://semantic-ui.com/views/feed.html ?
- instead of grid, maybe https://semantic-ui.com/elements/rail.html

## Performance

- write pure-rust "benchmark" scripts that hit, e.g., lookups and batch endpoints.
  run these with auto_explain on, then look in logs on dev machine
- batch inserts automerge: create editgroup and changelog, mark all edits as
  accepted, all in a single transaction

## API

- hydrate entities in API
    ? "expand" query param
    ? "full entity" field
    ? refactor file_releases to have objects as type

## Other

- schema.org metadata in webface
- bulk endpoint auto-merge mode (huge postgres speedup on import)
- elastic pipeline
- kong or oauth2_proxy for auth, rate-limit, etc
- "authn" microservice: https://keratin.tech/
- PUT for mid-edit revisions
- 'parent rev' for revisions (vs. container parent)
- "submit" status for editgroups? review
- what does openlibrary API look like?

x add a 'live' (or 'immutable') flag to revision tables

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?
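
Sketch for the "idents as base32" schema item: one possible encoding, assuming an ident is a 128-bit UUID stored in the database and shown to users as an unpadded, lowercase base32 string (26 characters). That mapping is an assumption, not something decided in these notes, and the function names `uuid_to_ident` / `ident_to_uuid` are hypothetical.

```python
import base64
import uuid


def uuid_to_ident(u: uuid.UUID) -> str:
    """Encode a UUID as a 26-character lowercase base32 ident (assumed format)."""
    # 16 bytes encode to 26 base32 characters plus 6 '=' padding characters;
    # the padding is dropped to keep the ident short and URL-friendly.
    return base64.b32encode(u.bytes).decode("ascii").rstrip("=").lower()


def ident_to_uuid(ident: str) -> uuid.UUID:
    """Decode a 26-character base32 ident back into a UUID."""
    if len(ident) != 26:
        raise ValueError("ident must be exactly 26 characters")
    # Restore the padding and upper-case the string before decoding.
    # b32decode raises binascii.Error (a ValueError subclass) on bad characters.
    raw = base64.b32decode(ident.upper() + "======")
    return uuid.UUID(bytes=raw)
```

Usage would look like `ident_to_uuid(uuid_to_ident(some_uuid)) == some_uuid`; revisions could stay as plain UUIDs while idents use the shorter base32 form.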
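
Sketch for the "basic python hbase/elastic matcher" item: the lookup cascade could be structured roughly as below. Everything here is an assumption for illustration: the three lookups are passed in as plain callables (`fatcat_lookup`, `hbase_lookup`, `elastic_search`) rather than real client calls, the reading of "checks fatcat API + hbase" as "already matched in fatcat, else fetch extracted metadata from hbase" is a guess, and the "~exact match" heuristic is reduced to comparing normalized titles.

```python
import re
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical lookup signatures; real clients would be wrapped to fit these.
FatcatLookup = Callable[[str], Optional[str]]            # sha1 -> release_id
HbaseLookup = Callable[[str], Optional[Dict[str, str]]]  # sha1 -> metadata
ElasticSearch = Callable[[str], Iterable[Dict[str, str]]]  # title -> hits


def normalize_title(title: str) -> str:
    """Lower-case and collapse everything except ASCII letters/digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(a: str, b: str) -> bool:
    """Simple ~exact heuristic: normalized titles must be non-empty and equal."""
    na, nb = normalize_title(a), normalize_title(b)
    return bool(na) and na == nb


def match_sha1(
    sha1: str,
    fatcat_lookup: FatcatLookup,
    hbase_lookup: HbaseLookup,
    elastic_search: ElasticSearch,
) -> Optional[Tuple[str, str]]:
    """Return (source, release_id) for a file sha1, or None if unmatched."""
    # 1. Already matched in fatcat? Then there is nothing more to do.
    release_id = fatcat_lookup(sha1)
    if release_id:
        return ("fatcat", release_id)
    # 2. Fetch extracted metadata for the file (assumed to live in hbase).
    meta = hbase_lookup(sha1)
    if not meta or not meta.get("title"):
        return None
    # 3. Not matched yet: search by title and accept only a ~exact hit.
    for hit in elastic_search(meta["title"]):
        if titles_match(hit.get("title", ""), meta["title"]):
            return ("elastic", hit["release_id"])
    return None
```

Passing the lookups in as callables keeps the proof-of-concept easy to try against stubs before wiring up the real fatcat API, hbase, and elastic clients.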