From 249c6131621b9cdd83e98421cbd4f885c30abadb Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@robocracy.org>
Date: Tue, 12 Oct 2021 16:42:06 -0700
Subject: update proposals for v0.4 and (hypothetical) v0.5

---
 proposals/20190911_v04_schema_tweaks.md | 41 --------------------------------
 proposals/20190911_v05_schema_tweaks.md | 42 +++++++++++++++++++++++++++++++++
 proposals/20211012_v04_schema_tweaks.md | 30 +++++++++++++++++++++++
 3 files changed, 72 insertions(+), 41 deletions(-)
 delete mode 100644 proposals/20190911_v04_schema_tweaks.md
 create mode 100644 proposals/20190911_v05_schema_tweaks.md
 create mode 100644 proposals/20211012_v04_schema_tweaks.md
diff --git a/proposals/20190911_v04_schema_tweaks.md b/proposals/20190911_v04_schema_tweaks.md
deleted file mode 100644
index 8e61bf36..00000000
--- a/proposals/20190911_v04_schema_tweaks.md
+++ /dev/null
@@ -1,41 +0,0 @@
-
-Status: planned
-
-## Schema Changes for v0.4 Release
-
-Proposed schema changes for next fatcat iteration (v0.4? v0.5?).
-
-SQL (and API, and elasticsearch):
-
-- container:`container_status` as a string enum: eg, "stub",
-  "out-of-print"/"ended" (?), "active", "new"/"small" (?).  Particularly to
-  deal with disambiguation of multiple containers by the same title but
-  separate ISSN-L. For example, "The Lancet".
-- release: `release_month` (to complement `release_date` and `release_year`)
-- file: `file_scope` as a string enum indicating how much content this file
-  includes. Eg, `book`, `chapter`, `article`/`work`, `issue`, `volume`,
-  `abstract`, `component`. Unclear how to initialize this field; default to
-  `article`/`work`?
-- TODO: webcapture: lookup by primary URL sha1?
-- TODO: release: switch how pages work? first/last?
-- TODO: indication of peer-review process? at release or container level?
-- TODO: container: separate canonical and disambiguating titles (?)
-- TODO: container: "imprint" field?
-- TODO: release inter-references using SCHOLIX/Datacite schema
-    https://zenodo.org/record/1120265
-    https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
-
-API tweaks:
-
-- add regex restrictions on more `ext_ids`, especially `wikidata_qid`
-- add explicit enums for more keyword fields
-
-API endpoints:
-
-- `GET /auth/token/<editor_id>` endpoint to generate new API token for given
-  editor. Used by web interface, or bot wranglers.
-- create editor endpoint, to allow bot account creation
-- `GET /editor/<ident>/bots` (?) endpoint to enumerate bots wrangled by a
-  specific editor
-
-See `2020_search_improvements` for elasticsearch-only schema updates.
diff --git a/proposals/20190911_v05_schema_tweaks.md b/proposals/20190911_v05_schema_tweaks.md
new file mode 100644
index 00000000..46d7c489
--- /dev/null
+++ b/proposals/20190911_v05_schema_tweaks.md
@@ -0,0 +1,42 @@
+
+Status: planned
+
+## Schema Changes for v0.5 Release
+
+Proposed schema changes for next fatcat iteration (v0.4? v0.5?).
+
+SQL (and API, and elasticsearch):
+
+- `db_get_range_for_editor` is slow when an editor has many editgroups; add a sorted index? meh.
+- release: `release_month` (to complement `release_date` and `release_year`)
+- file: `file_scope` as a string enum indicating how much content this file
+  includes. Eg, `book`, `chapter`, `article`/`work`, `issue`, `volume`,
+  `abstract`, `component`. Unclear how to initialize this field; default to
+  `article`/`work`?
+- file: some way of marking bad/bogus files... by scope? type? status?
+- TODO: webcapture: lookup by primary URL sha1?
+- TODO: release: switch how pages work? first/last?
+- TODO: indication of peer-review process? at release or container level?
+- TODO: container: separate canonical and disambiguating titles (?)
+- TODO: container: "imprint" field?
+- TODO: container: "series" field? eg for conferences
+- TODO: release inter-references using SCHOLIX/Datacite schema
+    https://zenodo.org/record/1120265
+    https://support.datacite.org/docs/connecting-research-outputs#section-related-identifiers
+- TODO: fileset: some sort of lookup; hashes of hashes?
+- TODO: fileset: some indication/handling of git repositories
+
+API tweaks:
+
+- add regex restrictions on more `ext_ids`, especially `wikidata_qid`
+- add explicit enums for more keyword fields
+
+API endpoints:
+
+- `GET /auth/token/<editor_id>` endpoint to generate new API token for given
+  editor. Used by web interface, or bot wranglers.
+- create editor endpoint, to allow bot account creation
+- `GET /editor/<ident>/bots` (?) endpoint to enumerate bots wrangled by a
+  specific editor
+
+See `2020_search_improvements` for elasticsearch-only schema updates.
diff --git a/proposals/20211012_v04_schema_tweaks.md b/proposals/20211012_v04_schema_tweaks.md
new file mode 100644
index 00000000..15ca489e
--- /dev/null
+++ b/proposals/20211012_v04_schema_tweaks.md
@@ -0,0 +1,30 @@
+
+Status: implemented
+
+## Schema Changes for v0.4
+
+Small SQL and API changes. Calling these a minor-level API version increment.
+
+API Schema Changes:
+
+- release `ext_ids`: `hdl` (handle) identifier
+- fileset: `mimetype` for manifest files as a field. This is a SQL schema change as well.
+- container: `issne` and `issnp` as top-level fields, indexed for lookup. SQL
+  schema change.
+- container: `publication_status` as a top-level field, to indicate "active",
+  "discontinued", etc. SQL schema change.
+
+API Endpoints:
+
+- `GET /editor/lookup`: editor lookup by username
+
+Elasticsearch Schemas:
+
+- release: `hdl` identifier
+- release: `container_publication_status` and `container_issns`
+- release: add missing `version` field (not related to any API change)
+- release: add `tags` for future extensibility
+- release: `is_work_alias` boolean flag for unversioned releases which point
+  to the overall work, or the latest published version of the work. Copied
+  from the field of the same name in release `extra`.
+- container: `publication_status`
-- 
cgit v1.2.3