## Next Up

- some significant slow-down has happened? transactions, or regexes?

summer roadmap:
- PUT/UPDATE, DELETE, and merge code paths
- faster UPDATE-free bulk import code path
- container import (extra?): lang, region, subject
- basic API+webface creation, editing, merging, editgroup approval
- elastic schema/transform for releases; bulk and continuous scripts

features:
- fast database dump command: both changelog-based and entity-based (rust)
    => lighter, more complete dumps for each entity type?
- guide skeleton (mdbook; guide.fatcat.wiki)

importers:
- CORE
- wikidata cross-ref (if they have a dump)
- manifest: multiple URLs per SHA1
- pubmed (medline), if not in CORE
    => and/or, use pubmed ID lookups on crossref import
- core
- semantic scholar (up to 39 million; author de-dupe)
- wikidata (if they have a dump)
- crossref: relations ("is-preprint-of")
- crossref: filter works
    => content-type whitelist
    => title length and title/slug blacklist
    => at least one author (?)
    => make this a method on Release object
    => or just set release_stub as "stub"?

bugs:
- test: release pointing to a collection that has been deleted/redirected
    => UI crash?

july roadmap:
- complete and test this round of schema changes
- container import (extra?): lang, region, subject
- re-run imports
- basic API+webface creation, editing, merging, editgroup approval
- elastic schema/transform for releases; bulk and continuous scripts

## Schema / Alignment / Scope

- "container" -> "venue"?
- release_type, release_status, url.rel write-time schema (and others?)

name ref: https://www.w3.org/International/questions/qa-personal-names

## API

- how to send edit "extra" metadata?
- hydrate entities in API
    ? "expand" query param

## High-Level Priorities

- full database dump (export)
- manual editing of containers and releases (web interface)

## Web UI

- changelog more like a https://semantic-ui.com/views/feed.html ?
- instead of grid, maybe https://semantic-ui.com/elements/rail.html

## Performance

- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
  endpoints. run these with auto_explain on, then look in logs on dev machine
- batch inserts automerge: create editgroup and changelog, mark all edits as
  accepted, all in a single transaction

## API

- hydrate entities in API
    ? "expand" query param

## Other

- basic python hbase/elastic matcher
    => takes sha1 keys
    => checks fatcat API + hbase
    => if not matched yet, tries elastic search
    => simple ~exact match heuristic
    => proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
    => 12 hours? 24?
- criterion.rs benchmarking
- schema.org metadata in webface
- bulk endpoint auto-merge mode (huge postgres speedup on import)
- elastic pipeline
- kong or oauth2_proxy for auth, rate-limit, etc
- "authn" microservice: https://keratin.tech/
- PUT for mid-edit revisions
- 'parent rev' for revisions (vs. container parent)
- "submit" status for editgroups?

review
- what does openlibrary API look like?

x add a 'live' (or 'immutable') flag to revision tables

better API docs
- https://sourcey.com/spectacle/
- https://github.com/DapperDox/dapperdox

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?
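
matcher sketch (for the "basic python hbase/elastic matcher" item above): a rough outline of how the lookup cascade might hang together. Everything named here is an assumption: the three lookups are passed in as plain callables so nothing is claimed about the real fatcat, hbase, or elastic clients, and the `release_id` and `title` keys are hypothetical. It also assumes the hbase row carries a title to search on, which the notes do not say.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical signatures: each lookup takes a key and returns a dict or None;
# the search takes a title and returns a list of candidate dicts.
Lookup = Callable[[str], Optional[dict]]
Search = Callable[[str], list]


def normalize_title(title: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def titles_match(a: str, b: str) -> bool:
    """Simple ~exact heuristic: equal after normalization, and non-empty.

    Titles with no ASCII letters or digits normalize to "" and never match.
    """
    na, nb = normalize_title(a), normalize_title(b)
    return bool(na) and na == nb


def match_sha1(
    sha1: str,
    lookup_fatcat: Lookup,
    lookup_hbase: Lookup,
    search_elastic: Search,
) -> Optional[Tuple[str, str]]:
    """Return (release_id, source) for a sha1 key, or None if unmatched."""
    # Step 1: the file may already be matched in fatcat.
    hit = lookup_fatcat(sha1)
    if hit is not None:
        return (hit["release_id"], "fatcat")

    # Step 2: fetch metadata for this sha1 from hbase (assumed to hold a title).
    row = lookup_hbase(sha1)
    if row is None or not row.get("title"):
        return None

    # Step 3: not matched yet, so search elastic and accept the first
    # candidate whose title is ~exactly the same.
    for candidate in search_elastic(row["title"]):
        if titles_match(row["title"], candidate.get("title", "")):
            return (candidate["release_id"], "elastic")
    return None
```

Injecting the lookups keeps the proof-of-concept runnable against in-memory dicts (eg, `match_sha1(key, fatcat_dict.get, hbase_dict.get, fake_search)`) before any real client is wired in.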