author Bryan Newbold <bnewbold@robocracy.org> 2022-03-31 12:33:49 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2022-03-31 12:33:49 -0700
commit ee1195a14bde28aaf4e630046c31d0c9f5f19530
tree 930749620fe7586a7e5f4ea38770b12d4f59154a /guide
parent 9a99dcb74c13acd98cb4022cb01f28138699e180
guide: updates to file and fileset metadata
Diffstat (limited to 'guide')
-rw-r--r-- guide/src/entity_file.md 4
-rw-r--r-- guide/src/entity_fileset.md 23
2 files changed, 19 insertions, 8 deletions
diff --git a/guide/src/entity_file.md b/guide/src/entity_file.md
index 84d9eac4..6a11e945 100644
--- a/guide/src/entity_file.md
+++ b/guide/src/entity_file.md
@@ -26,6 +26,10 @@
many articles), and that a release will often have multiple files (differing
only by watermarks, or different digitizations of the same printed work, or
variant MIME/media types of the same published work).
+- `extra` (object with string keys): additional metadata about this file
+ - `path`: filename, with optional path prefix. path must be "relative", not
+ "absolute", and should use UNIX-style forward slashes, not Windows-style
+ backward slashes
#### URL `rel` Vocabulary
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 6083a09d..818bb9bd 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -10,16 +10,17 @@
- `sha1` (string): SHA-1 hash in lower-case hex
- `sha256` (string): SHA-256 hash in lower-case hex
- `mimetype` (string): Content type in MIME type schema
- - `extra` (object): any extra metadata about this specific file
- - `original_url`: live web canonical URL to download this file (optional)
- - `webarchive_url`: web archive capture of this file (optional)
- - `platform_id`: platform-specific identifier for this file
+ - `extra` (object): any extra metadata about this specific file. all are
+ optional
+ - `original_url`: live web canonical URL to download this file
+ - `webarchive_url`: web archive capture of this file
- `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
- preserved.
+ preserved. These are URLs for the entire fileset, not individual files.
- `url` (string, required):
Eg: "https://example.edu/~frau/prcding.pdf".
- `rel` (string, required):
- Eg: "webarchive".
+ Eg: "archive-base", "webarchive".
+
- `release_ids` (array of string identifiers): references to `release` entities
- `content_scope` (string): for situations where the fileset does not simply
contain the full representation of a work (eg, all files in dataset, for a
@@ -27,11 +28,17 @@
vocabulary as File entity.
- `extra` (object with string keys): additional metadata about this group of
files, including upstream platform-specific metadata and identifiers
+ - `platform_id`: platform-specific identifier for this fileset
#### URL `rel` types
-- `repository`: URL of a live-web landing page or other location where content can be
- found. May not be machine-reachable.
+Any ending in "-base" implies that a file path (from the manifest) can be
+appended to the "base" URL to get a file download URL. Any "bundle" implies a
+direct link to an archive or "bundle" (like `.zip` or `.tar`) which contains
+all the files in this fileset
+
+- `repository` or `platform`: URL of a live-web landing page or other location
+ where content can be found. May or may not be machine-reachable.
- `webarchive`: web archive version of `repository`
- `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
which contains all of the individual files in this fileset
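
The two rules this change documents (a relative, forward-slash `path`, and "-base" URLs to which a manifest path can be appended) can be illustrated with a minimal Python sketch. The function names `is_valid_manifest_path` and `file_download_url` are hypothetical and are not part of fatcat. The guide also does not say whether a `/` separator must be inserted or whether the path is percent-encoded, so both behaviors here are assumptions.

```python
from urllib.parse import quote


def is_valid_manifest_path(path: str) -> bool:
    """Check the constraints the guide states for `path`:
    relative (not absolute) and UNIX-style forward slashes only.

    Hypothetical helper, not part of fatcat.
    """
    if not path:
        return False
    if path.startswith("/"):
        # absolute paths are not allowed
        return False
    if "\\" in path:
        # Windows-style backslashes are not allowed
        return False
    return True


def file_download_url(base_url: str, path: str) -> str:
    """Append a manifest path to a "-base" URL.

    Assumptions (not stated in the guide): exactly one "/" joins the
    two parts, and the path is percent-encoded with "/" left intact.
    """
    if not is_valid_manifest_path(path):
        raise ValueError(f"not a relative forward-slash path: {path!r}")
    return base_url.rstrip("/") + "/" + quote(path)


if __name__ == "__main__":
    # example.edu is a placeholder host, as in the guide's own examples
    print(file_download_url("https://example.edu/dataset/", "data/table 1.csv"))
```

Under these assumptions, a `rel` of `archive-base` with URL `https://example.edu/dataset/` and a manifest path of `data/table 1.csv` would resolve to `https://example.edu/dataset/data/table%201.csv`.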