author | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-22 16:12:01 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-22 16:12:01 -0800 |
commit | 5c7f50b2f497692493bfa54ad4741fdc573352ae (patch) | |
tree | c20cce1884076fffe210ba28e1a569f93ed22827 /guide/src/entity_webcapture.md | |
parent | f3bd82c0308948a63645538bdd9511a503625499 (diff) | |
parent | dd00cec4164c1a1c31c8d9cffb92deb2e30b2211 (diff) | |
Merge branch 'bnewbold-content-scope'
Diffstat (limited to 'guide/src/entity_webcapture.md')
-rw-r--r-- | guide/src/entity_webcapture.md | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/guide/src/entity_webcapture.md b/guide/src/entity_webcapture.md
index 8c5615fb..1b3cac55 100644
--- a/guide/src/entity_webcapture.md
+++ b/guide/src/entity_webcapture.md
@@ -29,4 +29,10 @@ Warning: This schema is not yet stable.
 - `timestamp` (string, datetime): same format as CDX line timestamp (UTC, etc).
   Corresponds to the overall capture timestamp. Can be the earliest of CDX
   timestamps if that makes sense
+- `content_scope` (string): for situations where the webcapture does not simply
+  contain the full representation of a work (eg, HTML fulltext, for an
+  `article-journal` release), describes what that scope of coverage is. Eg,
+  `landing-page` it doesn't contain the full content. Landing pages are
+  out-of-scope for fatcat, but if they were accidentally imported, should mark
+  them as such so they aren't re-imported. Uses same vocabulary as File entity.
 - `release_ids` (array of string identifiers): references to `release` entities
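The new `content_scope` field lets a consumer tell landing-page captures apart from fulltext ones. The following is a minimal sketch of that check, not fatcat's own code: plain dicts stand in for webcapture entities, the helper `is_landing_page` and the release identifiers are hypothetical, and treating an unset `content_scope` as "not a landing page" is an assumption. Only the field names and the `landing-page` value come from the guide text in the diff.

```python
# Hypothetical sketch: plain dicts stand in for webcapture entities.
# Only `content_scope`, `release_ids`, and the value "landing-page"
# come from the guide text; everything else is illustrative.
LANDING_PAGE = "landing-page"


def is_landing_page(webcapture: dict) -> bool:
    """Return True when the capture is marked as a landing page.

    Assumption: a missing or empty `content_scope` means the capture
    was not marked, so it is not treated as a landing page here.
    """
    return webcapture.get("content_scope") == LANDING_PAGE


# Illustrative data; the release identifiers are placeholders.
captures = [
    {"release_ids": ["release-a"]},
    {"release_ids": ["release-b"], "content_scope": LANDING_PAGE},
]

# Keep only captures that were not marked as landing pages.
not_landing = [c for c in captures if not is_landing_page(c)]
```

An importer could use the same check in reverse: if an existing capture for a URL is already marked `landing-page`, skip it instead of importing it again.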