From ee1195a14bde28aaf4e630046c31d0c9f5f19530 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 31 Mar 2022 12:33:49 -0700
Subject: guide: updates to file and fileset metadata

---
 guide/src/entity_file.md    |  4 ++++
 guide/src/entity_fileset.md | 23 +++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/guide/src/entity_file.md b/guide/src/entity_file.md
index 84d9eac4..6a11e945 100644
--- a/guide/src/entity_file.md
+++ b/guide/src/entity_file.md
@@ -26,6 +26,10 @@
   many articles), and that a release will often have multiple files (differing
   only by watermarks, or different digitizations of the same printed work, or
   variant MIME/media types of the same published work).
+- `extra` (object with string keys): additional metadata about this file
+  - `path`: filename, with optional path prefix. path must be "relative", not
+    "absolute", and should use UNIX-style forward slashes, not Windows-style
+    backward slashes
 
 #### URL `rel` Vocabulary
 
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 6083a09d..818bb9bd 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -10,16 +10,17 @@
     - `sha1` (string): SHA-1 hash in lower-case hex
     - `sha256` (string): SHA-256 hash in lower-case hex
     - `mimetype` (string): Content type in MIME type schema
-    - `extra` (object): any extra metadata about this specific file
-        - `original_url`: live web canonical URL to download this file (optional)
-        - `webarchive_url`: web archive capture of this file (optional)
-        - `platform_id`: platform-specific identifier for this file
+    - `extra` (object): any extra metadata about this specific file. all are
+      optional
+        - `original_url`: live web canonical URL to download this file
+        - `webarchive_url`: web archive capture of this file
 - `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
-  preserved.
+  preserved. These are URLs for the entire fileset, not individual files.
     - `url` (string, required):
       Eg: "https://example.edu/~frau/prcding.pdf".
     - `rel` (string, required):
-      Eg: "webarchive".
+      Eg: "archive-base", "webarchive".
+
 - `release_ids` (array of string identifiers): references to `release` entities
 - `content_scope` (string): for situations where the fileset does not simply
   contain the full representation of a work (eg, all files in dataset, for a
@@ -27,11 +28,17 @@
   vocabulary as File entity.
 - `extra` (object with string keys): additional metadata about this group of
   files, including upstream platform-specific metadata and identifiers
+  - `platform_id`: platform-specific identifier for this fileset
 
 #### URL `rel` types
 
-- `repository`: URL of a live-web landing page or other location where content can be
-  found. May not be machine-reachable.
+Any ending in "-base" implies that a file path (from the manifest) can be
+appended to the "base" URL to get a file download URL. Any "bundle" implies a
+direct link to an archive or "bundle" (like `.zip` or `.tar`) which contains
+all the files in this fileset
+
+- `repository` or `platform`: URL of a live-web landing page or other location
+  where content can be found. May or may not be machine-reachable.
 - `webarchive`: web archive version of `repository`
 - `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
   which contains all of the individual files in this fileset
-- 
cgit v1.2.3
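
Reviewer note: the two rules this patch documents (manifest paths are relative and use forward slashes; a `rel` ending in "-base" means a manifest path can be appended to the URL) can be illustrated with a minimal sketch. The function names `is_valid_manifest_path` and `file_download_url`, the example base URL, and the percent-encoding of the path are assumptions made for illustration only; none of them come from the guide or from the project's code.

```python
from urllib.parse import quote


def is_valid_manifest_path(path: str) -> bool:
    """Check the two constraints the guide states for `path`:
    relative (not absolute) and UNIX-style forward slashes only.
    Hypothetical helper, not part of any real API."""
    if not path:
        return False
    if path.startswith("/"):  # absolute UNIX-style path
        return False
    if "\\" in path:  # Windows-style backward slashes
        return False
    return True


def file_download_url(base_url: str, path: str) -> str:
    """Append a manifest path to a "-base" URL to get a file download URL.
    Percent-encoding the path is an assumption of this sketch."""
    if not is_valid_manifest_path(path):
        raise ValueError(f"not a relative forward-slash path: {path!r}")
    # quote() keeps "/" unescaped by default, so the path structure survives
    return base_url.rstrip("/") + "/" + quote(path)


if __name__ == "__main__":
    # Hypothetical base URL of the kind a rel like "archive-base" might carry
    base = "https://example.org/download/some-item/"
    print(file_download_url(base, "data/table 1.csv"))
```

A `rel` without the "-base" suffix (for example `webarchive` or `repository-bundle`) would not be passed to a helper like this, since the guide text only promises path appending for "-base" URLs.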