author | Bryan Newbold <bnewbold@robocracy.org> | 2021-10-13 12:44:10 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-10-13 16:21:31 -0700 |
commit | 9a333ff02d6b0eb26adb963934557529353de9a4 (patch) | |
tree | c3ac52f716c736e46591c5dec8a34c3fd77c1453 /guide/src | |
parent | 6fc3cb3bd7dc8a1b40c65fc0ea609a8521aa8daf (diff) | |
download | fatcat-9a333ff02d6b0eb26adb963934557529353de9a4.tar.gz fatcat-9a333ff02d6b0eb26adb963934557529353de9a4.zip |
guide updates for v0.4 schema changes
Diffstat (limited to 'guide/src')
-rw-r--r-- | guide/src/entity_container.md | 20 |
-rw-r--r-- | guide/src/entity_fileset.md | 23 |
-rw-r--r-- | guide/src/entity_release.md | 26 |
3 files changed, 57 insertions, 12 deletions
diff --git a/guide/src/entity_container.md b/guide/src/entity_container.md
index ebfcc9dc..94201d90 100644
--- a/guide/src/entity_container.md
+++ b/guide/src/entity_container.md
@@ -10,16 +10,20 @@
   below)
 - `container_type` (string): eg, journal vs. conference vs. book series.
   Controlled vocabulary is described below.
+- `publication_status` (string): whether actively publishing, never published
+  anything, or discontinued. Controlled vocabulary is described below.
 - `publisher` (string): The name of the publishing organization. Eg, "Society
   of Curious Students".
 - `issnl` (string): an external identifier, with registration controlled by
   the [ISSN organization](http://www.issn.org/). Registration is relatively
   inexpensive and easy to obtain (depending on world region), so almost all
   serial publications have one. The ISSN-L ("linking ISSN") is one of either
-  the print ("ISSNp") or electronic ("ISSNe") identifiers for a serial
+  the print (`issnp`) or electronic (`issne`) identifiers for a serial
   publication; not all publications have both types of ISSN, but many do, which
   can cause confusion. The ISSN master list is not gratis/public, but the
   ISSN-L mapping is.
+- `issne` (string): Electronic ISSN ("ISSN-E")
+- `issnp` (string): Print ISSN ("ISSN-P")
 - `wikidata_qid` (string): external linking identifier to a Wikidata entity.
 
 #### `extra` Fields
@@ -31,8 +35,6 @@
   sometimes a very terse, single-word truncated form of the name (eg, a pun).
 - `coden` (string): an external identifier, the [CODEN code][]. 6 characters,
   all upper-case.
-- `issnp` (string): Print ISSN
-- `issne` (string): Electronic ISSN
 - `default_license` (string, slug): short name (eg, "CC-BY-SA") for the
   default/recommended license for works published in this container
 - `original_name` (string): native name (if `name` is translated)
@@ -50,6 +52,8 @@
 - `region` (string, slug): continent/world-region (vocabulary is TODO)
 - `discipline` (string, slug): highest-level subject area (vocabulary is TODO)
 - `urls` (array of strings): known homepage URLs for this container (first in array is default)
+- `issnp` (deprecated; string): Print ISSN; deprecated now that there is a top-level field
+- `issne` (deprecated; string): Electronic ISSN; deprecated now that there is a top-level field
 
 Additional fields used in analytics and "curation" tracking:
 
@@ -98,3 +102,13 @@
   preserved).
 - `trade`
 - `test`
+#### `publication_status` Vocabulary
+
+- `active`: ongoing publication of new releases
+- `suspended`: publication has stopped, but may continue in the future
+- `discontinued`: publication has permanently ceased
+- `vanished`: publication has stopped, and public traces have vanished (eg,
+  publisher website has disappeared with no notice)
+- `never`: no works were ever published under this container
+- `one-time`: releases were all published as a one-time event. For example, a
+  single instance of a conference, or a fixed-size book series
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 7e5ac757..e1ac3e67 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -3,15 +3,17 @@
 
 ## Fields
 
-Warning: This schema is not yet stable.
-
 - `manifest` (array of objects): each entry represents a file
   - `path` (string, required): relative path to file (including filename)
   - `size` (integer, required): in bytes
   - `md5` (string): MD5 hash in lower-case hex
   - `sha1` (string): SHA-1 hash in lower-case hex
   - `sha256` (string): SHA-256 hash in lower-case hex
+  - `mimetype` (string): Content type in MIME type schema
   - `extra` (object): any extra metadata about this specific file
+    - `original_url`: live web canonical URL to download this file (optional)
+    - `webarchive_url`: web archive capture of this file (optional)
+    - `platform_id`: platform-specific identifier for this file
 - `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
   preserved.
   - `url` (string, required):
@@ -19,3 +21,20 @@ Warning: This schema is not yet stable.
   - `rel` (string, required): Eg: "webarchive".
 - `release_ids` (array of string identifiers): references to `release`
   entities
+- `extra` (object with string keys): additional metadata about this group of
+  files, including upstream platform-specific metadata and identifiers
+
+#### URL `rel` types
+
+- `repository`: URL of a live-web landing page or other location where content can be
+  found. May not be machine-reachable.
+- `webarchive`: web archive version of `repository`
+- `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
+  which contains all of the individual files in this fileset
+- `webarchive-bundle`: web archive version of `repository-bundle`
+- `archive-bundle`: file archive version of `repository-bundle`
+- `repository-base`: live-web base URL/directory to which file `path` can be
+  appended to fetch individual files
+- `archive-base`: base URL/directory to which file `path` can be appended to fetch
+  individual files
+
diff --git a/guide/src/entity_release.md b/guide/src/entity_release.md
index 028d99fc..dd09b30b 100644
--- a/guide/src/entity_release.md
+++ b/guide/src/entity_release.md
@@ -144,11 +144,18 @@ complete or correct in more obscure cases.
   should *always* be stored will be needed.
 - `core` (string): external identifier for the [CORE] open access aggregator.
   These identifiers are integers, but stored in string format.
-- `arxiv` (string) external identifier to a (version-specific) [arxiv.org][]
+- `arxiv` (string): external identifier to a (version-specific) [arxiv.org][]
   work. For releases, must always include the `vN` suffix (eg, `v3`).
-- `jstor` (string) external identifier for works in JSTOR.
-- `ark` (string) ARK identifier
-- `mag` (string) Microsoft Academic Graph identifier
+- `jstor` (string): external identifier for works in JSTOR.
+- `ark` (string): ARK identifier
+- `mag` (deprecated; string): Microsoft Academic Graph identifier. Never used,
+  may be deleted in the future
+- `doaj` (string): [DOAJ](https://doaj.org) article-level identifier
+- `dblp` (string): [dblp](https://dblp.org) article-level identifier
+- `oai` (string): OAI-PMH record id. Only use if no other identifier is available
+- `hdl` (string): [handle.net](https://handle.net) identifier. While DOIs are
+  technically handles, do not put DOIs in this field. Handles are transformed
+  to lower-case in the database.
 
 [arxiv.org]: https://arxiv.org
 
@@ -170,6 +177,10 @@ complete or correct in more obscure cases.
   should be referenced/indicated instead. Intended as a temporary hint until
   proper work-based search is implemented. As an example use, all arxiv release
   versions except for the most recent get this set.
+- `is_work_alias` (boolean): if true, then this release is an alias or pointer to
+  the entire work, or the most recent version of the work. For example, some
+  data repositories have separate DOIs for each version of the dataset, then an
+  additional DOI that points to the "latest" version/DOI.
 
 #### `release_type` Vocabulary
 
@@ -199,7 +210,8 @@ with a small number of (proposed) extensions:
 - `post-weblog` for blog entries
 - `report`
 - `review`, for things like book reviews, not the "literature review" form of
-  `article-journal`, nor peer reviews (see `peer_review`)
+  `article-journal`, nor peer reviews (see `peer_review`). Note `review-book`
+  for book reviews specifically.
 - `speech` can be used for eg, slides and recorded conference presentations
   themselves, as distinct from `paper-conference`
 - `thesis`
@@ -216,8 +228,8 @@ with a small number of (proposed) extensions:
 - `stub` (fatcat extension) for releases which have notable external
   identifiers, and thus are included "for completeness", but don't seem to
   represent a "full work".
-- `component` (fatcat extension) for sub-components of a full paper (or other
-  work). Eg, figures or tables.
+- `component` (fatcat extension) for sub-components of a full paper or other
+  work. Eg, tables, or individual files as part of a dataset.
 
 An example of a `stub` might be a paper that gets an extra DOI by accident; the
 primary DOI should be a full release, and the accidental DOI can be a `stub`
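
As an illustration of the fileset fields documented in this commit, the sketch below shows one way a client might turn a manifest `path` into candidate download URLs using the per-file `extra` URLs and the `archive-base` / `repository-base` `rel` types. It is a minimal sketch, not fatcat code: it assumes a fileset is available as a plain dict shaped like the fields above, and the helper name `candidate_file_urls`, the preference order, and the `example.org` URLs are all hypothetical.

```python
from urllib.parse import quote

# `rel` types whose URL is a base directory to which a manifest `path` is
# appended. The ordering (archive before live web) is an assumption.
BASE_RELS = ("archive-base", "repository-base")


def candidate_file_urls(fileset: dict, path: str) -> list:
    """Return possible download URLs for one manifest entry, in preference order."""
    entry = next(
        (f for f in fileset.get("manifest", []) if f["path"] == path), None
    )
    if entry is None:
        raise KeyError(f"path not in manifest: {path}")

    candidates = []
    extra = entry.get("extra") or {}
    # Per-file URLs stored in the manifest entry's `extra` object.
    for key in ("webarchive_url", "original_url"):
        if extra.get(key):
            candidates.append(extra[key])
    # Base URLs: append the relative path, percent-encoded but keeping "/".
    for rel in BASE_RELS:
        for url in fileset.get("urls", []):
            if url["rel"] == rel:
                candidates.append(url["url"].rstrip("/") + "/" + quote(path))
    return candidates


# Hypothetical example data, for illustration only.
EXAMPLE_FILESET = {
    "manifest": [
        {
            "path": "data/table 1.csv",
            "size": 1234,
            "mimetype": "text/csv",
            "extra": {"original_url": "https://repo.example.org/files/42"},
        },
        {"path": "README.md", "size": 56},
    ],
    "urls": [
        {"url": "https://repo.example.org/dataset/7/", "rel": "repository-base"},
        {"url": "https://repo.example.org/dataset/7", "rel": "repository"},
    ],
}
```

The `repository` URL is deliberately ignored here, since the guide text notes that such landing pages may not be machine-reachable.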