author Bryan Newbold <bnewbold@robocracy.org> 2021-10-13 12:44:10 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2021-10-13 16:21:31 -0700
commit 9a333ff02d6b0eb26adb963934557529353de9a4
tree c3ac52f716c736e46591c5dec8a34c3fd77c1453
parent 6fc3cb3bd7dc8a1b40c65fc0ea609a8521aa8daf
guide updates for v0.4 schema changes
-rw-r--r-- guide/src/entity_container.md | 20
-rw-r--r-- guide/src/entity_fileset.md | 23
-rw-r--r-- guide/src/entity_release.md | 26
3 files changed, 57 insertions, 12 deletions
diff --git a/guide/src/entity_container.md b/guide/src/entity_container.md
index ebfcc9dc..94201d90 100644
--- a/guide/src/entity_container.md
+++ b/guide/src/entity_container.md
@@ -10,16 +10,20 @@
below)
- `container_type` (string): eg, journal vs. conference vs. book series.
Controlled vocabulary is described below.
+- `publication_status` (string): whether actively publishing, never published
+ anything, or discontinued. Controlled vocabulary is described below.
- `publisher` (string): The name of the publishing organization. Eg, "Society
of Curious Students".
- `issnl` (string): an external identifier, with registration controlled by the
[ISSN organization](http://www.issn.org/). Registration is relatively
inexpensive and easy to obtain (depending on world region), so almost all
serial publications have one. The ISSN-L ("linking ISSN") is one of either
- the print ("ISSNp") or electronic ("ISSNe") identifiers for a serial
+ the print (`issnp`) or electronic (`issne`) identifiers for a serial
publication; not all publications have both types of ISSN, but many do, which
can cause confusion. The ISSN master list is not gratis/public, but the
ISSN-L mapping is.
+- `issne` (string): Electronic ISSN ("ISSN-E")
+- `issnp` (string): Print ISSN ("ISSN-P")
- `wikidata_qid` (string): external linking identifier to a Wikidata entity.
#### `extra` Fields
@@ -31,8 +35,6 @@
sometimes a very terse, single-word truncated form of the name (eg, a pun).
- `coden` (string): an external identifier, the [CODEN code][]. 6 characters,
all upper-case.
-- `issnp` (string): Print ISSN
-- `issne` (string): Electronic ISSN
- `default_license` (string, slug): short name (eg, "CC-BY-SA") for the
default/recommended license for works published in this container
- `original_name` (string): native name (if `name` is translated)
@@ -50,6 +52,8 @@
- `region` (string, slug): continent/world-region (vocabulary is TODO)
- `discipline` (string, slug): highest-level subject area (vocabulary is TODO)
- `urls` (array of strings): known homepage URLs for this container (first in array is default)
+- `issnp` (deprecated; string): Print ISSN; deprecated now that there is a top-level field
+- `issne` (deprecated; string): Electronic ISSN; deprecated now that there is a top-level field
Additional fields used in analytics and "curation" tracking:
@@ -98,3 +102,13 @@ preserved).
- `trade`
- `test`
+#### `publication_status` Vocabulary
+
+- `active`: ongoing publication of new releases
+- `suspended`: publication has stopped, but may continue in the future
+- `discontinued`: publication has permanently ceased
+- `vanished`: publication has stopped, and public traces have vanished (eg,
+ publisher website has disappeared with no notice)
+- `never`: no works were ever published under this container
+- `one-time`: releases were all published as a one-time event. For example, a
+ single instance of a conference, or a fixed-size book series
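
The container changes above can be illustrated with a minimal Python sketch. It assumes a container entity is handled as a plain `dict`; the helper names `is_valid_publication_status` and `promote_issn_fields` are hypothetical and are not part of fatcat. The sketch checks a value against the `publication_status` vocabulary and moves the deprecated `extra.issnp` / `extra.issne` values into the new top-level fields.

```python
import copy

# Vocabulary copied from the `publication_status` list in this commit.
PUBLICATION_STATUS_VALUES = frozenset(
    {"active", "suspended", "discontinued", "vanished", "never", "one-time"}
)


def is_valid_publication_status(value):
    """Return True if the value is unset (None) or in the vocabulary."""
    return value is None or value in PUBLICATION_STATUS_VALUES


def promote_issn_fields(container):
    """Return a copy with deprecated extra.issnp/issne moved to the top level.

    Hypothetical helper: an existing top-level value is assumed to take
    precedence over the deprecated `extra` value.
    """
    result = copy.deepcopy(container)
    extra = result.get("extra")
    if not isinstance(extra, dict):
        return result
    for key in ("issnp", "issne"):
        if key in extra:
            legacy = extra.pop(key)
            if not result.get(key):
                result[key] = legacy
    return result
```

The input is deep-copied so that a caller can compare the entity before and after the change.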
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 7e5ac757..e1ac3e67 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -3,15 +3,17 @@
## Fields
-Warning: This schema is not yet stable.
-
- `manifest` (array of objects): each entry represents a file
- `path` (string, required): relative path to file (including filename)
- `size` (integer, required): in bytes
- `md5` (string): MD5 hash in lower-case hex
- `sha1` (string): SHA-1 hash in lower-case hex
- `sha256` (string): SHA-256 hash in lower-case hex
+ - `mimetype` (string): Content type in MIME type schema
- `extra` (object): any extra metadata about this specific file
+ - `original_url`: live web canonical URL to download this file (optional)
+ - `webarchive_url`: web archive capture of this file (optional)
+ - `platform_id`: platform-specific identifier for this file
- `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
preserved.
- `url` (string, required):
@@ -19,3 +21,20 @@ Warning: This schema is not yet stable.
- `rel` (string, required):
Eg: "webarchive".
- `release_ids` (array of string identifiers): references to `release` entities
+- `extra` (object with string keys): additional metadata about this group of
+ files, including upstream platform-specific metadata and identifiers
+
+#### URL `rel` types
+
+- `repository`: URL of a live-web landing page or other location where content can be
+ found. May not be machine-reachable.
+- `webarchive`: web archive version of `repository`
+- `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
+ which contains all of the individual files in this fileset
+- `webarchive-bundle`: web archive version of `repository-bundle`
+- `archive-bundle`: file archive version of `repository-bundle`
+- `repository-base`: live-web base URL/directory from which file `path` can be
+ appended to fetch individual files
+- `archive-base`: base URL/directory from which file `path` can be appended to fetch
+ individual files
+
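
As a rough illustration of the `repository-base` and `archive-base` types above, the following Python sketch appends each manifest `path` to every base URL. It assumes a fileset is handled as a plain `dict` shaped like the fields documented in this file; the function name `per_file_urls` and the `example.org` URLs in the usage note are hypothetical.

```python
from urllib.parse import quote

# The two `rel` types documented as bases to which a file `path` is appended.
BASE_RELS = ("repository-base", "archive-base")


def per_file_urls(fileset):
    """Map each manifest path to the URLs built from every base-type URL."""
    bases = [
        u["url"].rstrip("/")
        for u in fileset.get("urls", [])
        if u.get("rel") in BASE_RELS
    ]
    result = {}
    for entry in fileset.get("manifest", []):
        path = entry["path"]
        # quote() keeps "/" as-is and percent-encodes spaces and similar.
        result[path] = [base + "/" + quote(path) for base in bases]
    return result
```

For example, a manifest path `data/table 1.csv` combined with a `repository-base` of `https://example.org/ds/` yields `https://example.org/ds/data/table%201.csv`. Bundle-type URLs are ignored, since they point at a single archive file and not at a directory.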
diff --git a/guide/src/entity_release.md b/guide/src/entity_release.md
index 028d99fc..dd09b30b 100644
--- a/guide/src/entity_release.md
+++ b/guide/src/entity_release.md
@@ -144,11 +144,18 @@ complete or correct in more obscure cases.
should *always* be stored will be needed.
- `core` (string): external identifier for the [CORE] open access
aggregator. These identifiers are integers, but stored in string format.
-- `arxiv` (string) external identifier to a (version-specific) [arxiv.org][]
+- `arxiv` (string): external identifier to a (version-specific) [arxiv.org][]
work. For releases, must always include the `vN` suffix (eg, `v3`).
-- `jstor` (string) external identifier for works in JSTOR.
-- `ark` (string) ARK identifer
-- `mag` (string) Microsoft Academic Graph identifier
+- `jstor` (string): external identifier for works in JSTOR.
+- `ark` (string): ARK identifier
+- `mag` (deprecated; string): Microsoft Academic Graph identifier. Never used,
+ may be deleted in the future
+- `doaj` (string): [DOAJ](https://doaj.org) article-level identifier
+- `dblp` (string): [dblp](https://dblp.org) article-level identifier
+- `oai` (string): OAI-PMH record id. Only use if no other identifier is available
+- `hdl` (string): [handle.net](https://handle.net) identifier. While DOIs are
+ technically handles, do not put DOIs in this field. Handles are transformed
+ to lower-case in database.
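
Two of the identifier rules above (the `vN` suffix for `arxiv`, and lower-casing and DOI exclusion for `hdl`) can be sketched in Python. The function names are hypothetical, and treating any handle that starts with `10.` as a DOI is an assumption of this sketch, not a rule stated by the guide.

```python
import re

# Assumption: a version suffix is "v" followed by one or more digits.
ARXIV_VERSION_RE = re.compile(r"v\d+$")


def has_arxiv_version(arxiv_id):
    """Return True if the arXiv identifier ends with a `vN` suffix."""
    return bool(ARXIV_VERSION_RE.search(arxiv_id))


def normalize_hdl(raw):
    """Lower-case a handle and reject values that look like DOIs."""
    handle = raw.strip().lower()
    # Assumption: DOI prefixes start with "10.", so such values belong
    # in the DOI field, not in `hdl`.
    if handle.startswith("10."):
        raise ValueError("DOIs should not be stored in the hdl field")
    return handle
```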
[arxiv.org]: https://arxiv.org
@@ -170,6 +177,10 @@ complete or correct in more obscure cases.
should be referenced/indicated instead. Intended as a temporary hint until
proper work-based search is implemented. As an example use, all arxiv release
versions except for the most recent get this set.
+- `is_work_alias` (boolean): if true, then this release is an alias or pointer to
+ the entire work, or the most recent version of the work. For example, some
+ data repositories have separate DOIs for each version of the dataset, then an
+ additional DOI that points to the "latest" version/DOI.
#### `release_type` Vocabulary
@@ -199,7 +210,8 @@ with a small number of (proposed) extensions:
- `post-weblog` for blog entries
- `report`
- `review`, for things like book reviews, not the "literature review" form of
- `article-journal`, nor peer reviews (see `peer_review`)
+ `article-journal`, nor peer reviews (see `peer_review`). Note `review-book`
+ for book reviews specifically.
- `speech` can be used for eg, slides and recorded conference presentations
themselves, as distinct from `paper-conference`
- `thesis`
@@ -216,8 +228,8 @@ with a small number of (proposed) extensions:
- `stub` (fatcat extension) for releases which have notable external
identifiers, and thus are included "for completeness", but don't seem to
represent a "full work".
-- `component` (fatcat extension) for sub-components of a full paper (or other
- work). Eg, figures or tables.
+- `component` (fatcat extension) for sub-components of a full paper or other
+ work. Eg, tables, or individual files as part of a dataset.
An example of a `stub` might be a paper that gets an extra DOI by accident; the
primary DOI should be a full release, and the accidental DOI can be a `stub`