author | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-17 16:23:09 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-17 16:23:09 -0800 |
commit | 1e0bf431fbd1ab00f27a305ff3492de8eac90ba6 (patch) | |
tree | 0dbeffe9eef5882eb3ced5b15d1137c569241b90 /guide/src/entity_webcapture.md | |
parent | f64a469b8a8aa9319013d6099ad38e7cde495e18 (diff) | |
guide: document content_scope field
Diffstat (limited to 'guide/src/entity_webcapture.md')
-rw-r--r-- | guide/src/entity_webcapture.md | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/guide/src/entity_webcapture.md b/guide/src/entity_webcapture.md
index 8c5615fb..1b3cac55 100644
--- a/guide/src/entity_webcapture.md
+++ b/guide/src/entity_webcapture.md
@@ -29,4 +29,10 @@ Warning: This schema is not yet stable.
 - `timestamp` (string, datetime): same format as CDX line timestamp (UTC, etc).
   Corresponds to the overall capture timestamp. Can be the earliest of CDX
   timestamps if that makes sense
+- `content_scope` (string): for situations where the webcapture does not simply
+  contain the full representation of a work (eg, HTML fulltext, for an
+  `article-journal` release), describes what that scope of coverage is. Eg,
+  `landing-page` it doesn't contain the full content. Landing pages are
+  out-of-scope for fatcat, but if they were accidentally imported, should mark
+  them as such so they aren't re-imported. Uses same vocabulary as File entity.
 - `release_ids` (array of string identifiers): references to `release` entities
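
As a rough illustration of how the `content_scope` field documented in this patch might be consumed, the sketch below filters out webcaptures marked as `landing-page`. Only the field names `content_scope` and `release_ids` and the value `landing-page` come from the guide text. The sample records, the identifiers in them, the helper names, and the rule that a missing `content_scope` means "full representation" are all assumptions made for the example, not fatcat's actual client code.

```python
from typing import Dict, List

# Hypothetical, hand-written records; real webcapture entities carry more
# fields and use fatcat identifiers rather than these placeholder strings.
webcaptures: List[Dict] = [
    {"release_ids": ["release-a"]},  # no content_scope set
    {"release_ids": ["release-b"], "content_scope": "landing-page"},
]


def is_full_capture(webcapture: Dict) -> bool:
    """Return True when no narrower scope is recorded.

    Assumption: an absent or empty `content_scope` means the capture holds
    the full representation of the work.
    """
    return not webcapture.get("content_scope")


def full_captures(records: List[Dict]) -> List[Dict]:
    """Keep only the captures that are not marked with a narrower scope."""
    return [record for record in records if is_full_capture(record)]


if __name__ == "__main__":
    for record in full_captures(webcaptures):
        print(record["release_ids"])
```

An importer could apply the same check before re-fetching a URL, so that a capture already marked `landing-page` is not imported a second time.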