From 1e0bf431fbd1ab00f27a305ff3492de8eac90ba6 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 17 Nov 2021 16:23:09 -0800
Subject: guide: document content_scope field

---
 guide/src/entity_webcapture.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/guide/src/entity_webcapture.md b/guide/src/entity_webcapture.md
index 8c5615fb..1b3cac55 100644
--- a/guide/src/entity_webcapture.md
+++ b/guide/src/entity_webcapture.md
@@ -29,4 +29,10 @@ Warning: This schema is not yet stable.
 - `timestamp` (string, datetime): same format as CDX line timestamp (UTC, etc).
   Corresponds to the overall capture timestamp. Can be the earliest of CDX
   timestamps if that makes sense
+- `content_scope` (string): for situations where the webcapture does not simply
+  contain the full representation of a work (eg, HTML fulltext, for an
+  `article-journal` release), describes what that scope of coverage is. Eg,
+  `landing-page` if it doesn't contain the full content. Landing pages are
+  out-of-scope for fatcat, but any that were accidentally imported should be
+  marked as such so they aren't re-imported. Uses same vocabulary as File entity.
 - `release_ids` (array of string identifiers): references to `release` entities
-- 
cgit v1.2.3
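
The intent of `content_scope` can be illustrated with a minimal sketch. It assumes webcapture entities are handled as plain dicts, that an absent `content_scope` means the capture holds the full representation of the work, and that `landing-page` is the only scope value in play (it is the only one the patch names). The helper `is_in_scope`, the sample timestamps, and the placeholder release identifier are hypothetical and not part of fatcat.

```python
from typing import Optional

# The only content_scope value named in the patch; others are not assumed here.
LANDING_PAGE = "landing-page"


def is_in_scope(webcapture: dict) -> bool:
    """Return False for captures marked as landing pages, True otherwise.

    Hypothetical helper: a missing content_scope is treated as
    "full representation of the work".
    """
    scope: Optional[str] = webcapture.get("content_scope")
    return scope != LANDING_PAGE


# Hypothetical sample entities; timestamps use the 14-digit CDX style.
captures = [
    {"timestamp": "20211117162309", "release_ids": ["<release-id>"]},
    {
        "timestamp": "20211117162309",
        "content_scope": LANDING_PAGE,  # imported by accident, now marked
        "release_ids": [],
    },
]

# Keep only captures that are not flagged as landing pages.
in_scope = [wc for wc in captures if is_in_scope(wc)]
```

An importer could apply the same check in reverse: if an existing entity for a page is already marked `landing-page`, that page can be skipped instead of being imported again.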