diff options
author | Bryan Newbold <bnewbold@robocracy.org> | 2019-05-20 16:57:57 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-05-20 16:57:57 -0700 |
commit | cd829eedb5bfc7328ab5266650a625a6c88db6fa (patch) | |
tree | 90aa164cbd7f4e86aadc25dbd036dab680c30e80 /guide/src/entity_release.md | |
parent | eb31be2172264091e192bcb4f17ffd571253fffa (diff) | |
download | fatcat-cd829eedb5bfc7328ab5266650a625a6c88db6fa.tar.gz fatcat-cd829eedb5bfc7328ab5266650a625a6c88db6fa.zip |
start refactoring guide (per-entity pages)
Diffstat (limited to 'guide/src/entity_release.md')
-rw-r--r-- | guide/src/entity_release.md | 303 |
1 files changed, 303 insertions, 0 deletions
diff --git a/guide/src/entity_release.md b/guide/src/entity_release.md new file mode 100644 index 00000000..709a020c --- /dev/null +++ b/guide/src/entity_release.md @@ -0,0 +1,303 @@ + +# Release Entity Reference + +## Fields + +- `title` (string, required): the display title of the release. May include subtitle. +- `subtitle` (string): intended only to be used primarily with books, not + journal articles. Subtitle may also be appended to the `title` instead of + populating this field. +- `original_title` (string): the full original language title, if `title` is translated +- `work_id` (fatcat identifier; required): the (single) work that this release + is grouped under. If not specified in a creation (`POST`) action, the API + will auto-generate a work. +- `container_id` (fatcat identifier): a (single) container that this release is + part of. When expanded the `container` field contains the full `container` + entity. +- `release_type` (string, controlled set): represents the medium or form-factor + of this release; eg, "book" versus "journal article". Not necessarily + the same across all releases of a work. See definitions below. +- `release_state` (string, controlled set): represents the publishing/review + lifecycle status of this particular release of the work. See definitions + below. +- `release_date` (string, ISO date format): when this release was first made + publicly available. Blank if only year is known. +- `release_year` (integer): year when this release was first made + publicly available; should match `release_date` if both are known. +- `ext_ids` (key/value object of string-to-string mappings): external + identifiers. At least an empty `ext_ids` object is always required for + release entities, so individual identifiers can be accessed directly. +- `volume` (string): optionally, stores the specific volume of a serial + publication this release was published in. + type: string +- `issue` (string): optionally, stores the specific issue of a serial + publication this release was published in. +- `pages` (string): the pages (within a volume/issue of a publication) that + this release can be looked up under. This is a free-form string, and could + represent the first page, a range of pages, or even prefix pages (like + "xii-xxx"). +- `publisher` (string): name of the publishing entity. This does not need to be + populated if the associated `container` entity has the publisher field set, + though it is acceptable to duplicate, as the publishing entity of a container + may differ over time. Should be set for singleton releases, like books. +- `language` (string, slug): the primary language used in this particular release of + the work. Only a single language can be specified; additional languages can + be stored in "extra" metadata (TODO: which field?). This field should be a + valid RFC1766/ISO639 language code (two letters). AKA, a controlled + vocabulary, not a free-form name of the language. +- `license_slug` (string, slug): the license of this release. Usually a + creative commons short code (eg, `CC-BY`), though a small number of other + short names for publisher-specific licenses are included (TODO: list these). +- `contribs` (array of objects): an array of authorship and other `creator` contributions to this + release. Contribution fields include: + - `index` (integer, optional): the (zero-indexed) order of this + author. Authorship order has significance in many fields. Non-author + contributions (illustration, translation, editorship) may or may not be + ordered, depending on context, but index numbers should be unique per + release (aka, there should not be "first author" and "first translator") + - `creator_id` (identifier): if known, a reference to a specific `creator` + - `raw_name` (string): the name of the contributor, as attributed in the + text of this work. If the `creator_id` is linked, this may be different + from the `display_name`; if a creator is not linked, this field is + particularly important. Syntax and name order is not specified, but most + often will be "display order", not index/alphabetical (in Western + tradition, surname followed by given name). + - `role` (string, of a set): the type of contribution, from a controlled + vocabulary. TODO: vocabulary needs review. + - `extra` (string): additional context can go here. For example, author + affiliation, "this is the corresponding author", etc. +- `refs` (array of ident strings): references (aka, citations) to other releases. References + can only be linked to a specific target release (not a work), though it may + be ambiguous which release of a work is being referenced if the citation is + not specific enough. Reference fields include: + - `index` (integer, optional): reference lists and bibliographies almost + always have an implicit order. Zero-indexed. Note that this is distinct + from the `key` field. + - `target_release_id` (fatcat identifier): if known, and the release + exists, a cross-reference to the Fatcat entity + - `extra` (JSON, optional): additional citation format metadata can be + stored here, particularly if the citation schema does not align. Common + fields might be "volume", "authors", "issue", "publisher", "url", and + external identifiers ("doi", "isbn13"). + - `key` (string): works often reference works with a short slug or index + number, which can be captured here. For example, "[BROWN2017]". Keys + generally supersede the `index` field, though both can/should be + supplied. + - `year` (integer): year of publication of the cited release. + - `container_title` (string): if applicable, the name of the container of + the release being cited, as written in the citation (usually an + abbreviation). + - `title` (string): the title of the work/release being cited, as written. + - `locator` (string): a more specific reference into the work/release being + cited, for example the page number(s). For web reference, store the URL + in "extra", not here. +- `abstracts` (array of objects): see below + - `sha1` (string, hex, required): reference to the abstract content (string). + Example: "3f242a192acc258bdfdb151943419437f440c313" + - `content` (string): The abstract raw content itself. Example: `<jats:p>Some + abstract thing goes here</jats:p>` + - `mimetype` (string): not formally required, but should effectively always get + set. `text/plain` if the abstract doesn't have a structured format + - `lang` (string, controlled set): the human language this abstract is in. See + the `lang` field of release for format and vocabulary. + +#### External Identifiers (`ext_ids`) + +The `ext_ids` object name-spaces external identifiers and makes it easier to +add new identifiers to the schema in the future. + +- `doi` (string): full DOI number, lower-case. Example: "10.1234/abcde.789". + See the "External Identifiers" section of style guide for more notes + about DOIs specifically. +- `wikidata_qid` (string): external identifier for Wikidata entities. These are + integers prefixed with "Q", like "Q4321". Each `release` entity can be + associated with at most one Wikidata entity (this field is not an array), and + Wikidata entities should be associated with at most a single `release`. In + the future it may be possible to associate Wikidata entities with `work` + entities instead. +- `isbn13` (string): external identifier for books. ISBN-9 and other formats + should be converted to canonical ISBN-13. +- `pmid` (string): external identifier for PubMed database. These are bare + integers, but stored in a string format. +- `pmcid` (string): external identifier for PubMed Central database. These are + integers prefixed with "PMC" (upper case), like "PMC4321". Versioned PMCIDs + can also be stored (eg, "PMC4321.1"; future clarification of whether versions + should *always* be stored will be needed. +- `core` (string): external identifier for the [CORE] open access + aggregator. These identifiers are integers, but stored in string format. +- `arxiv` (string) external identifier to a (version-specific) [arxiv.org]() + work. For releases, must always include the `vN` suffix (eg, `v3`). +- `jstor` (string) external identifier for works in JSTOR. +- `ark` (string) ARK identifer +- `mag` (string) Microsoft Academic Graph identifier + +[arxiv.org]: https://arxiv.org + +#### `extra` Fields + +- `crossref` (object), for extra crossref-specific metadata + - `subject` (array of strings) for subject/category of content + - `type` (string) raw/original Crossref type + - `alternative-id` (array of strings) + - `archive` (array of strings), indicating preservation services deposited + - `funder` (object/dictionary) +- `aliases` (array of strings) for additional titles this release might be + known by +- `container_name` (string) if not matched to a container entity +- `subtitle` (string) +- `group-title` (string) for releases within an collection/group +- `translation_of` (release identifier) if this release is a translation of + another (usually under the same work) +- `withdrawn_date` (string, ISO date format): if this release has been + retracted (post-publication) or withdrawn (pre- or post-publication), this is + the datetime of that event. Retractions also result in a `retraction` release + under the same `work` entity. This is intended to migrate from "extra" to a + full release entity field. + +#### `release_type` Vocabulary + +This vocabulary is based on the +[CSL types](http://docs.citationstyles.org/en/stable/specification.html#appendix-iii-types), +with a small number of (proposed) extensions: + +- `article-magazine` +- `article-journal`, including pre-prints and working papers +- `book` +- `chapter` is allowed as they are frequently referenced and read independent + of the entire book. The data model does not currently support linking a + subset of a release to an entity representing the entire release. The + release/work/file distinctions should not be used to group multiple chapters under + a single work; a book chapter can be it's own work. A paper which is + republished as a chapter (eg, in a collection, or "edited" book) can have + both releases under one work. The criteria of whether to "split" a book and + have release entities for each chapter is whether the chapter has been + cited/reference as such. +- `dataset` +- `entry`, which can be used for generic web resources like question/answer + site entries. +- `entry-encyclopedia` +- `manuscript` +- `paper-conference` +- `patent` +- `post-weblog` for blog entries +- `report` +- `review`, for things like book reviews, not the "literature review" form of + `article-journal`, nor peer reviews (see `peer_review`) +- `speech` can be used for eg, slides and recorded conference presentations + themselves, as distinct from `paper-conference` +- `thesis` +- `webpage` +- `peer_review` (fatcat extension) +- `software` (fatcat extension) +- `standard` (fatcat extension), for technical standards like RFCs +- `abstract` (fatcat extension), for releases that are only an abstract of a + larger work. In particular, translations. Many are granted DOIs. +- `editorial` (custom extension) for columns, "in this issue", and other + content published along peer-reviewed content in journals. Many are granted DOIs. +- `letter` for "letters to the editor", "authors respond", and + sub-article-length published content. Many are granted DOIs. +- `stub` (fatcat extension) for releases which have notable external + identifiers, and thus are included "for completeness", but don't seem to + represent a "full work". + +An example of a `stub` might be a paper that gets an extra DOI by accident; the +primary DOI should be a full release, and the accidental DOI can be a `stub` +release under the same work. `stub` releases shouldn't be considered full +releases when counting or aggregating (though if technically difficult this may +not always be implemented). Other things that can be categorized as stubs +(which seem to often end up mis-categorized as full articles in bibliographic +databases): + +- commercial advertisements +- "trap" or "honey pot" works, which are fakes included in databases to + detect re-publishing without attribution +- "This page is intentionally blank" +- "About the author", "About the editors", "About the cover" +- "Acknowledgments" +- "Notices" + +All other CSL types are also allowed, though they are mostly out of scope: + +- `article` (generic; should usually be some other type) +- `article-newspaper` +- `bill` +- `broadcast` +- `entry-dictionary` +- `figure` +- `graphic` +- `interview` +- `legislation` +- `legal_case` +- `map` +- `motion_picture` +- `musical_score` +- `pamphlet` +- `personal_communication` +- `post` +- `review-book` +- `song` +- `treaty` + +For the purpose of statistics, the following release types are considered +"papers": + +- `article-journal` +- `chapter` +- `paper-conference` +- `thesis` + +#### `release_state` Vocabulary + +These roughly follow the [DRIVER](http://web.archive.org/web/20091109125137/http://www2.lse.ac.uk/library/versions/VERSIONS_Toolkit_v1_final.pdf) publication version guidelines, with the addition of a `retracted` status. + +- `draft` is an early version of a work which is not considered for peer + review. Sometimes these are posted to websites or repositories for early + comments and feedback. +- `submitted` is the version that was submitted for publication. Also known as + "pre-print", "pre-review", "under review". Note that this doesn't imply that + the work was every actually submitted, reviewed, or accepted for publication, + just that this is the version that "would be". Most versions in pre-print + repositories are likely to have this status. +- `accepted` is a version that has undergone peer review and accepted for + published, but has not gone through any publisher copy editing or + re-formatting. Also known as "post-print", "author's manuscript", + "publisher's proof". +- `published` is the version that the publisher distributes. May include minor + (gramatical, typographical, broken link, aesthetic) corrections. Also known + as "version of record", "final publication version", "archival copy". +- `updated`: post-publication significant updates (considered a separate release + in Fatcat). Also known as "correction" (in the context of either a published + "correction notice", or the full new version) +- `retraction` for post-publication retraction notices (should be a release + under the same work as the `published` release) + +Note that in the case of a retraction, the original publication does not get +state `retracted`, only the retraction notice does. The original publication +does get a `withdrawn_status` metadata field set. + +When blank, indicates status isn't known, and wasn't inferred at creation time. +Can often be interpreted as `published`, but be careful! + +#### `contribs.role` Vocabulary + +- `author` +- `translator` +- `illustrator` +- `editor` + +All other CSL role types are also allowed, though are mostly out of scope for +Fatcat: + +- `collection-editor` +- `composer` +- `container-author` +- `director` +- `editorial-director` +- `editortranslator` +- `interviewer` +- `original-author` +- `recipient` +- `reviewed-author` + +If blank, indicates that type of contribution is not known; this can often be +interpreted as authorship. |