author | Bryan Newbold <bnewbold@robocracy.org> | 2022-03-31 12:33:49 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2022-03-31 12:33:49 -0700 |
commit | ee1195a14bde28aaf4e630046c31d0c9f5f19530 (patch) | |
tree | 930749620fe7586a7e5f4ea38770b12d4f59154a /guide/src | |
parent | 9a99dcb74c13acd98cb4022cb01f28138699e180 (diff) | |
download | fatcat-ee1195a14bde28aaf4e630046c31d0c9f5f19530.tar.gz fatcat-ee1195a14bde28aaf4e630046c31d0c9f5f19530.zip |
guide: updates to file and fileset metadata
Diffstat (limited to 'guide/src')
-rw-r--r-- | guide/src/entity_file.md | 4 |
-rw-r--r-- | guide/src/entity_fileset.md | 23 |
2 files changed, 19 insertions, 8 deletions
diff --git a/guide/src/entity_file.md b/guide/src/entity_file.md
index 84d9eac4..6a11e945 100644
--- a/guide/src/entity_file.md
+++ b/guide/src/entity_file.md
@@ -26,6 +26,10 @@
   many articles), and that a release will often have multiple files (differing
   only by watermarks, or different digitizations of the same printed work, or
   variant MIME/media types of the same published work).
+- `extra` (object with string keys): additional metadata about this file
+  - `path`: filename, with optional path prefix. path must be "relative", not
+    "absolute", and should use UNIX-style forward slashes, not Windows-style
+    backward slashes
 
 #### URL `rel` Vocabulary
 
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 6083a09d..818bb9bd 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -10,16 +10,17 @@
   - `sha1` (string): SHA-1 hash in lower-case hex
   - `sha256` (string): SHA-256 hash in lower-case hex
   - `mimetype` (string): Content type in MIME type schema
-  - `extra` (object): any extra metadata about this specific file
-    - `original_url`: live web canonical URL to download this file (optional)
-    - `webarchive_url`: web archive capture of this file (optional)
-    - `platform_id`: platform-specific identifier for this file
+  - `extra` (object): any extra metadata about this specific file. all are
+    optional
+    - `original_url`: live web canonical URL to download this file
+    - `webarchive_url`: web archive capture of this file
 - `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
-  preserved.
+  preserved. These are URLs for the entire fileset, not individual files.
   - `url` (string, required):
     Eg: "https://example.edu/~frau/prcding.pdf".
   - `rel` (string, required):
-    Eg: "webarchive".
+    Eg: "archive-base", "webarchive".
+- `release_ids` (array of string identifiers): references to `release` entities
 
 - `content_scope` (string): for situations where the fileset does not simply
   contain the full representation of a work (eg, all files in dataset, for a
@@ -27,11 +28,17 @@
   vocabulary as File entity.
 - `extra` (object with string keys): additional metadata about this group of
   files, including upstream platform-specific metadata and identifiers
+  - `platform_id`: platform-specific identifier for this fileset
 
 #### URL `rel` types
 
-- `repository`: URL of a live-web landing page or other location where content can be
-  found. May not be machine-reachable.
+Any ending in "-base" implies that a file path (from the manifest) can be
+appended to the "base" URL to get a file download URL. Any "bundle" implies a
+direct link to an archive or "bundle" (like `.zip` or `.tar`) which contains
+all the files in this fileset
+
+- `repository` or `platform`: URL of a live-web landing page or other location
+  where content can be found. May or may not be machine-reachable.
 - `webarchive`: web archive version of `repository`
 - `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
   which contains all of the individual files in this fileset
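
The two rules this commit documents (a `path` must be relative with forward slashes, and a manifest path can be appended to any "-base" URL) can be sketched in a few lines of Python. This is only an illustration, not fatcat's own code: the helper names `is_valid_manifest_path` and `file_download_url` are hypothetical, and joining with a single `/` and percent-encoding the path are assumptions that the guide text does not spell out.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import quote


def is_valid_manifest_path(path: str) -> bool:
    """Return True if `path` is relative and uses forward slashes only.

    Hypothetical helper: it checks only the rules stated in the guide.
    """
    if not path or "\\" in path:
        return False
    # Reject UNIX-style ("/a/b") and Windows-style ("C:/a/b") absolute paths.
    if PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute():
        return False
    return True


def file_download_url(base_url: str, rel: str, path: str) -> str:
    """Append a manifest path to a "-base" URL to get a per-file URL.

    Assumptions (not from the guide): exactly one "/" separates the base
    from the path, and the path is percent-encoded with slashes kept.
    """
    if not rel.endswith("-base"):
        raise ValueError(f"rel {rel!r} is not a '-base' URL type")
    if not is_valid_manifest_path(path):
        raise ValueError(f"not a relative forward-slash path: {path!r}")
    return base_url.rstrip("/") + "/" + quote(path)


if __name__ == "__main__":
    # "archive-base" is one of the rel values named in the guide;
    # the URL and path here are made-up examples.
    print(file_download_url(
        "https://example.org/dataset/", "archive-base", "data/table 1.csv"
    ))
```

A URL whose `rel` is a "bundle" type would be rejected by `file_download_url`, since the guide describes bundles as direct links to a single archive rather than a prefix for individual files.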