From 249c6131621b9cdd83e98421cbd4f885c30abadb Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Tue, 12 Oct 2021 16:42:06 -0700
Subject: update proposals for v0.4 and (hypothetical) v0.5

---
 proposals/20190911_v04_schema_tweaks.md | 41 --------------------------------
 proposals/20190911_v05_schema_tweaks.md | 42 +++++++++++++++++++++++++++++++++
 proposals/20211012_v04_schema_tweaks.md | 30 +++++++++++++++++++++++
 3 files changed, 72 insertions(+), 41 deletions(-)
 delete mode 100644 proposals/20190911_v04_schema_tweaks.md
 create mode 100644 proposals/20190911_v05_schema_tweaks.md
 create mode 100644 proposals/20211012_v04_schema_tweaks.md

diff --git a/proposals/20190911_v04_schema_tweaks.md b/proposals/20190911_v04_schema_tweaks.md
deleted file mode 100644
index 8e61bf36..00000000
--- a/proposals/20190911_v04_schema_tweaks.md
+++ /dev/null
@@ -1,41 +0,0 @@
-
-Status: planned
-
-## Schema Changes for v0.4 Release
-
-Proposed schema changes for next fatcat iteration (v0.4? v0.5?).
-
-SQL (and API, and elasticsearch):
-
-- container:`container_status` as a string enum: eg, "stub",
-  "out-of-print"/"ended" (?), "active", "new"/"small" (?). Particularly to
-  deal with disambiguation of multiple containers by the same title but
-  separate ISSN-L. For example, "The Lancet".
-- release: `release_month` (to complement `release_date` and `release_year`)
-- file: `file_scope` as a string enum indicating how much content this file
-  includes. Eg, `book`, `chapter`, `article`/`work`, `issue`, `volume`,
-  `abstract`, `component`. Unclear how to initialize this field; default to
-  `article`/`work`?
-- TODO: webcapture: lookup by primary URL sha1?
-- TODO: release: switch how pages work? first/last?
-- TODO: indication of peer-review process? at release or container level?
-- TODO: container: separate canonical and disambiguating titles (?)
-- TODO: container: "imprint" field?
-- TODO: release inter-references using SCHOLIX/Datacite schema
-    https://zenodo.org/record/1120265
-    https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
-
-API tweaks:
-
-- add regex restrictions on more `ext_ids`, especially `wikidata_qid`
-- add explicit enums for more keyword fields
-
-API endpoints:
-
-- `GET /auth/token/` endpoint to generate new API token for given
-  editor. Used by web interface, or bot wranglers.
-- create editor endpoint, to allow bot account creation
-- `GET /editor//bots` (?) endpoint to enumerate bots wrangled by a
-  specific editor
-
-See `2020_search_improvements` for elasticsearch-only schema updates.
diff --git a/proposals/20190911_v05_schema_tweaks.md b/proposals/20190911_v05_schema_tweaks.md
new file mode 100644
index 00000000..46d7c489
--- /dev/null
+++ b/proposals/20190911_v05_schema_tweaks.md
@@ -0,0 +1,42 @@
+
+Status: planned
+
+## Schema Changes for v0.4 Release
+
+Proposed schema changes for next fatcat iteration (v0.4? v0.5?).
+
+SQL (and API, and elasticsearch):
+
+- `db_get_range_for_editor` is slow when there are many editgroups for editor; add sorted index? meh.
+- release: `release_month` (to complement `release_date` and `release_year`)
+- file: `file_scope` as a string enum indicating how much content this file
+  includes. Eg, `book`, `chapter`, `article`/`work`, `issue`, `volume`,
+  `abstract`, `component`. Unclear how to initialize this field; default to
+  `article`/`work`?
+- file: some way of marking bad/bogus files... by scope? type? status?
+- TODO: webcapture: lookup by primary URL sha1?
+- TODO: release: switch how pages work? first/last?
+- TODO: indication of peer-review process? at release or container level?
+- TODO: container: separate canonical and disambiguating titles (?)
+- TODO: container: "imprint" field?
+- TODO: container: "series" field? eg for conferences
+- TODO: release inter-references using SCHOLIX/Datacite schema
+    https://zenodo.org/record/1120265
+    https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
+- TODO: fileset: some sort of lookup; hashes of hashes?
+- TODO: fileset: some indication/handling of git repositories
+
+API tweaks:
+
+- add regex restrictions on more `ext_ids`, especially `wikidata_qid`
+- add explicit enums for more keyword fields
+
+API endpoints:
+
+- `GET /auth/token/` endpoint to generate new API token for given
+  editor. Used by web interface, or bot wranglers.
+- create editor endpoint, to allow bot account creation
+- `GET /editor//bots` (?) endpoint to enumerate bots wrangled by a
+  specific editor
+
+See `2020_search_improvements` for elasticsearch-only schema updates.
diff --git a/proposals/20211012_v04_schema_tweaks.md b/proposals/20211012_v04_schema_tweaks.md
new file mode 100644
index 00000000..15ca489e
--- /dev/null
+++ b/proposals/20211012_v04_schema_tweaks.md
@@ -0,0 +1,30 @@
+
+Status: implemented
+
+## Schema Changes for v0.4
+
+Small SQL and API changes. Calling these a minor-level API version increment.
+
+API Schema Changes:
+
+- release `ext_ids`: `hdl` (handle) identifier
+- fileset: `mimetype` for manifest files as a field. This is a SQL schema change as well.
+- container: `issne` and `issnp` as top-level fields, indexed for lookup. SQL
+  schema change.
+- container: `publication_status` as a top-level field, to indicate "active",
+  "discontinued", etc. SQL schema change.
+
+API Endpoints:
+
+- `GET /editor/lookup`: editor lookup by username
+
+Elasticsearch Schemas:
+
+- release: 'hdl' identifier
+- release: `container_publication_status` and `container_issns`
+- release: add missing `version` field (not related to any API change)
+- release: add `tags` for future extensibility
+- release: `is_work_alias` boolean flag for unversioned releases which point
+  to the overall work, or the latest published version of the work. Included
+  from field with the same name in release `extra`.
+- container: `publication_status`
--
cgit v1.2.3
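
Reviewer note: the v0.5 proposal asks for "regex restrictions on more `ext_ids`, especially `wikidata_qid`", and the v0.4 proposal adds `issne` and `issnp` lookup fields. Below is a minimal sketch of what such format checks could look like. The function names and the exact patterns are assumptions for illustration, not taken from the fatcat codebase: the QID pattern assumes "Q" followed by a positive integer, and the ISSN pattern checks only the `NNNN-NNNC` shape, not the check digit.

```python
import re

# Assumed pattern: "Q" followed by a positive integer without leading zeros.
WIKIDATA_QID_RE = re.compile(r"Q[1-9][0-9]*")

# Assumed pattern: four digits, a hyphen, three digits, then a digit or "X".
# This checks the shape only; it does not verify the ISSN check digit.
ISSN_RE = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_wikidata_qid(value: str) -> bool:
    """Return True if value looks like a Wikidata QID (hypothetical helper)."""
    # fullmatch avoids accepting a trailing newline, which "$" would allow.
    return WIKIDATA_QID_RE.fullmatch(value) is not None


def is_valid_issn(value: str) -> bool:
    """Return True if value has the ISSN shape (hypothetical helper)."""
    return ISSN_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Q42", "q42", "Q042", "1234-567X", "12345678"]:
        print(candidate, is_valid_wikidata_qid(candidate), is_valid_issn(candidate))
```

Rejecting malformed identifiers at the API boundary keeps lookups by these fields from silently missing records that were stored with stray whitespace or the wrong case.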