| field | value | date |
|---|---|---|
| author | Bryan Newbold <bnewbold@archive.org> | 2021-03-23 21:42:32 -0700 |
| committer | Bryan Newbold <bnewbold@archive.org> | 2021-03-23 21:42:32 -0700 |
| commit | 5defd444135bc4adb0748b0d2b8c9b88708bdc1a | |
| tree | 599498f0a9ae5a3177d9702c3a7e8b70e39b2b4a /proposals/work_schema.md | |
| parent | e70e7cff4b5c910405694fb297330507b49937b1 | |
proposals: add 2021 UI updates, and rename all to have a date in filename
Diffstat (limited to 'proposals/work_schema.md')
-rw-r--r-- | proposals/work_schema.md | 108 |
1 files changed, 0 insertions, 108 deletions
diff --git a/proposals/work_schema.md b/proposals/work_schema.md
deleted file mode 100644
index 97d60ac..0000000
--- a/proposals/work_schema.md
+++ /dev/null
@@ -1,108 +0,0 @@
-
-## Top-Level
-
-- type: `_doc` (aka, no type, `include_type_name=false`)
-- key: keyword (same as `_id`)
-- `collapse_key`: work ident, or SIM issue item (for collapsing/grouping search hits)
-- `doc_type`: keyword (work or page)
-- `doc_index_ts`: timestamp when document indexed
-- `work_ident`: fatcat work ident (optional)
-
-- `biblio`: obj
-- `fulltext`: obj
-- `ia_sim`: obj
-- `abstracts`: nested
-    body
-    lang
-- `releases`: nested (TBD)
-- `access`
-- `tags`: array of keywords
-
-TODO:
-- summary fields to index "everything" into?
-
-## Biblio
-
-Mostly matches existing `fatcat_release` schema.
-
-- `release_id`
-- `release_revision`
-- `title`
-- `subtitle`
-- `original_title`
-- `release_date`
-- `release_year`
-- `withdrawn_status`
-- `language`
-- `country_code`
-- `volume` (etc)
-- `volume_int` (etc)
-- `first_page`
-- `first_page_int`
-- `pages`
-- `doi` etc
-- `number` (etc)
-
-NEW:
-- `preservation_status`
-
-[etc]
-
-- `license_slug`
-- `publisher` (etc)
-- `container_name` (etc)
-- `container_id`
-- `container_issnl`
-- `container_wikidata_qid`
-- `issns` (array)
-- `contrib_names`
-- `affiliations`
-- `creator_ids`
-
-TODO: should all external identifiers go under `releases` instead of `biblio`? Or some duplicated?
-
-## Fulltext
-
-- `status`: web, sim, shadow
-- `body`
-- `lang`
-- `file_mimetype`
-- `file_sha1`
-- `file_id`
-- `thumbnail_url`
-
-## Abstracts
-
-Nested object with:
-
-- body
-- lang
-
-For prototyping, perhaps just make it an object with `body` as an array.
-
-Only index one abstract per language.
-
-## SIM (Microfilm)
-
-Enough details to construct a link or do a lookup or whatever. Note that might
-be doing CDL status lookups on SERP pages.
- -- `issue_item`: str -- `pub_collection`: str -- `sim_pubid`: str -- `first_page`: str - - -Also pass-through archive.org metadata here (collection-level and item-level) - -## Access - -Start with obj, but maybe later nested? - -- `status`: direct, cdl, repository, publisher, loginwall, paywall, etc -- `mimetype` -- `access_url` -- `file_url` -- `file_id` -- `release_id` - |
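The deleted proposal lists the top-level fields but does not show how they fit together at index and query time. The Python sketch below is a hypothetical illustration only; none of these helpers come from this commit or from fatcat-scholar. It assembles a document using a subset of the proposed top-level fields, keeps only the first abstract seen for each language (one reading of "Only index one abstract per language"), and builds an Elasticsearch request body that groups hits on `collapse_key` and searches the `abstracts` nested field. The function names, the "first abstract wins" rule, and the choice of fields to match on are assumptions. The query assumes `collapse_key` is mapped as a keyword field and `abstracts` as a `nested` type, as the proposal suggests.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def make_collapse_key(work_ident: Optional[str], sim_issue_item: Optional[str]) -> str:
    """Hypothetical helper: use the fatcat work ident if present, else the SIM issue item."""
    if work_ident:
        return work_ident
    if sim_issue_item:
        return sim_issue_item
    raise ValueError("document needs a work ident or a SIM issue item to collapse on")


def dedupe_abstracts(abstracts: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Keep the first abstract seen per language; a missing lang counts as one language."""
    seen = set()
    kept = []
    for abstract in abstracts:
        lang = abstract.get("lang")
        if lang in seen:
            continue
        seen.add(lang)
        kept.append({"body": abstract["body"], "lang": lang})
    return kept


def build_doc(
    key: str,
    doc_type: str,
    work_ident: Optional[str],
    sim_issue_item: Optional[str],
    biblio: Dict[str, Any],
    abstracts: List[Dict[str, str]],
    tags: List[str],
    indexed_at: datetime,
) -> Dict[str, Any]:
    """Assemble a top-level document; `key` is meant to mirror the Elasticsearch `_id`."""
    return {
        "key": key,
        "collapse_key": make_collapse_key(work_ident, sim_issue_item),
        "doc_type": doc_type,  # "work" or "page" per the proposal
        "doc_index_ts": indexed_at.isoformat(),
        "work_ident": work_ident,
        "biblio": biblio,
        "abstracts": dedupe_abstracts(abstracts),
        "tags": tags,
    }


def search_body(query: str, size: int = 15) -> Dict[str, Any]:
    """Match the title or any nested abstract body, returning one hit per collapse_key."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"biblio.title": query}},
                    {
                        "nested": {
                            "path": "abstracts",
                            "query": {"match": {"abstracts.body": query}},
                        }
                    },
                ]
            }
        },
        # Field collapsing needs a keyword (or numeric) field with doc_values.
        "collapse": {"field": "collapse_key"},
        "size": size,
    }


if __name__ == "__main__":
    doc = build_doc(
        key="hypothetical-key",
        doc_type="work",
        work_ident="hypothetical-work-ident",
        sim_issue_item=None,
        biblio={"title": "An Example Title"},
        abstracts=[{"body": "First.", "lang": "en"}, {"body": "Second.", "lang": "en"}],
        tags=[],
        indexed_at=datetime.now(timezone.utc),
    )
    print(doc["collapse_key"], len(doc["abstracts"]))
    print(search_body("example"))
```

Keeping the collapse choice in one helper means "work ident, or SIM issue item" is decided in a single place. Page-level SIM hits and work-level hits then collapse consistently regardless of which indexing path produced them.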