summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorBryan Newbold <bnewbold@robocracy.org>2021-11-17 16:15:57 -0800
committerBryan Newbold <bnewbold@robocracy.org>2021-11-17 16:16:25 -0800
commit2c8a266ec7155e159c8e64979bfe572a9bd84c01 (patch)
tree516c86155a127a14d451bb896a6d73ccc8a8c9b3
parent56a7306f1cebf5833238bc4d894261a050c8e3c9 (diff)
downloadfatcat-2c8a266ec7155e159c8e64979bfe572a9bd84c01.tar.gz
fatcat-2c8a266ec7155e159c8e64979bfe572a9bd84c01.zip
proposal: content_scope field
-rw-r--r--proposals/2021-11-17_content_scope.md84
1 files changed, 84 insertions, 0 deletions
diff --git a/proposals/2021-11-17_content_scope.md b/proposals/2021-11-17_content_scope.md
new file mode 100644
index 00000000..8d04808e
--- /dev/null
+++ b/proposals/2021-11-17_content_scope.md
@@ -0,0 +1,84 @@
+
+status: planned
+
+Content Scope Fields
+======================
+
+Usually, "artifact" entities (file, fileset, webcapture) should not contain
+bibliographic metadata about their contents. For example, a file entity
+describing a PDF of a journal article should not indicate the publication
+stage, retraction status, publication type, journal ISSN, or other metadata
+about that article; the `release` entity should contain that information.
+Additionally, it is usually assumed that a single "artifact" entity is a
+complete representation of any associated release entities: the complete
+dataset, or complete article.
+
+This document describes a new metadata field to handle some special cases that
+go against this principle: the `content_scope` of a file, fileset, or
+webcapture. It is intended to be used when there is an exception to the
+assumption that a single "artifact" is a complete representation of a release.
+It is particularly useful when there is a problem with the artifact, resulting
+with it being disassociated with all releases.
+
+
+## Values
+
+This section will get copied to the guide.
+
+- if not set, assume that the artifact entity is valid and represents a
+ complete copy of the release
+- `issue`: artifact contains an entire issue of a serial publication (eg, issue
+ of a journal), representing several releases in full
+- `abstract`: contains only an abstract (short description) of the release, not
+ the release itself (unless the `release_type` itself is `abstract`, in which
+ case it is the entire release)
+- `index`: index of a journal, or series of abstracts from a conference (TODO:
+ separate value for conference abstract lists?)
+- `slides`: slide deck (usually in "landscape" orientation)
+- `front-matter`: non-article content from a journal, such as editorial policies
+- `supplement`: usually a file entity which is a supplement or appendix, not
+ the entire work
+- `component`: a sub-component of a release, which may or may not be associated
+ with a `component` release entity. For example, a single figure or table as
+ part of an article
+- `poster`: digital copy of a poster, eg as displayed at conference poster sessions
+- `sample`: a partial sample of the entire work. eg, just the first page of an
+ article. distinct from `truncated`
+- `truncated`: the file has been truncated at a binary level, and may also be
+ corrupt or invalid. distinct from `sample`
+- `corrupt`: broken, mangled, or corrupt file (at the binary level)
+- `stub`: any other out-of-scope artifact situations, where the artifact
+ represents something which would not link to any possible in-scope release in
+ the catalog (except a `stub` release)
+- `landing-page`: for webcapture, the landing page of a work, as opposed to the
+ work itself
+- `spam`: content is spam. articles, webpages, or issues which include
+ incidental advertisements within them are not counted as `spam`
+
+
+## Implementation
+
+The string field `content_scope` will be added to file, fileset, and webcapture
+entities.
+
+By default, this field does not need to be set. If it is empty, it can be
+assumed that the artifact represents an appropriate copy of the full release.
+If it is set, and the artifact is associated with one or more releases,
+downstream users/code may want to verify that the `content_scope` and
+`release_type` values are consistent. For example, `slides` is not consistent
+with `article-journal`, so such a file should be marked for review, and not
+considered a valid access option or preservation copy for the purposes of
+coverage analysis.
+
+
+## Removing Release Linkage
+
+In cases where the "artifact" entity is not an acceptable representation of any
+release (eg, truncation, corruption, spam), the entity should have the
+`release_ids` field cleared.
+
+Optionally, the new `extra` field `related_release_ids` can be used to indicate
+that an artifact entity has something to do with specific releases, but is not
+a full representation of them. This can be useful for corrupt or partial
+content to link to releases it is a partial representation of.
+