path: root/guide/src/entity_webcapture.md
author: Bryan Newbold <bnewbold@robocracy.org> 2021-11-17 16:23:09 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2021-11-17 16:23:09 -0800
commit: 1e0bf431fbd1ab00f27a305ff3492de8eac90ba6
tree: 0dbeffe9eef5882eb3ced5b15d1137c569241b90 /guide/src/entity_webcapture.md
parent: f64a469b8a8aa9319013d6099ad38e7cde495e18
guide: document content_scope field
Diffstat (limited to 'guide/src/entity_webcapture.md')
-rw-r--r-- guide/src/entity_webcapture.md | 6
1 file changed, 6 insertions, 0 deletions
diff --git a/guide/src/entity_webcapture.md b/guide/src/entity_webcapture.md
index 8c5615fb..1b3cac55 100644
--- a/guide/src/entity_webcapture.md
+++ b/guide/src/entity_webcapture.md
@@ -29,4 +29,10 @@ Warning: This schema is not yet stable.
- `timestamp` (string, datetime): same format as CDX line timestamp (UTC, etc).
Corresponds to the overall capture timestamp. Can be the earliest of CDX
timestamps if that makes sense
+- `content_scope` (string): for situations where the webcapture does not simply
+ contain the full representation of a work (eg, HTML fulltext, for an
+ `article-journal` release), describes what the scope of coverage is. Eg,
+ `landing-page` if it doesn't contain the full content. Landing pages are
+ out-of-scope for fatcat, but if they were accidentally imported, they should be
+ marked as such so they aren't re-imported. Uses same vocabulary as File entity.
- `release_ids` (array of string identifiers): references to `release` entities
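
The `content_scope` field added above can be illustrated with a minimal Python sketch. It treats webcapture entities as plain dicts and assumes that an absent `content_scope` means the capture holds the full representation of the work. The helper name `is_landing_page` and the sample records are hypothetical; they are not part of fatcat or its client library.

```python
# Scope value named in the field description above.
LANDING_PAGE_SCOPE = "landing-page"


def is_landing_page(webcapture: dict) -> bool:
    """Return True if the capture is explicitly marked as a landing page.

    Assumption: a missing `content_scope` key means the capture is a
    full representation of the work, so it is not treated as a landing page.
    """
    return webcapture.get("content_scope") == LANDING_PAGE_SCOPE


# Hypothetical sample entities; the release identifiers are placeholders.
captures = [
    {"release_ids": ["placeholder-release-a"], "content_scope": "landing-page"},
    {"release_ids": ["placeholder-release-b"]},
]

# Keep only captures that are not flagged as landing pages.
in_scope = [c for c in captures if not is_landing_page(c)]
print(len(in_scope))
```

An import pipeline could apply the same check to captures that already exist, so that pages previously marked `landing-page` are skipped instead of being imported again.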