From ee1195a14bde28aaf4e630046c31d0c9f5f19530 Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@robocracy.org>
Date: Thu, 31 Mar 2022 12:33:49 -0700
Subject: guide: updates to file and fileset metadata

---
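Note for reviewers: the `path` rule added to the file entity and the "-base"
`rel` convention added to the fileset guide can be sketched together as below.
This is a minimal illustration, not code from the catalog itself. The helper
names `is_valid_manifest_path` and `file_download_url` are hypothetical, as are
the choices to treat backslashes as a hard error (the guide only says "should"),
to join with a single "/", and to percent-encode the path.

```python
from urllib.parse import quote


def is_valid_manifest_path(path: str) -> bool:
    """Check a manifest `path`: relative, with forward slashes only.

    Assumption: backslashes are rejected outright, although the guide
    only says paths "should" use forward slashes.
    """
    if not path or path.startswith("/"):
        return False  # empty or absolute path
    if "\\" in path:
        return False  # Windows-style separator
    return True


def file_download_url(base_url: str, path: str) -> str:
    """Append a manifest path to a "-base" URL to get a file download URL.

    Assumption: the base URL and path are joined with exactly one "/",
    and the path is percent-encoded (slashes are left intact).
    """
    if not is_valid_manifest_path(path):
        raise ValueError(f"not a relative forward-slash path: {path!r}")
    return base_url.rstrip("/") + "/" + quote(path)
```

For example, with a hypothetical "archive-base" URL of
"https://example.edu/dataset/" and a manifest path of "data/table 1.csv", the
helper would return "https://example.edu/dataset/data/table%201.csv".
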
 guide/src/entity_file.md    |  4 ++++
 guide/src/entity_fileset.md | 23 +++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/guide/src/entity_file.md b/guide/src/entity_file.md
index 84d9eac4..6a11e945 100644
--- a/guide/src/entity_file.md
+++ b/guide/src/entity_file.md
@@ -26,6 +26,10 @@
   many articles), and that a release will often have multiple files (differing
   only by watermarks, or different digitizations of the same printed work, or
   variant MIME/media types of the same published work).
+- `extra` (object with string keys): additional metadata about this file
+    - `path`: filename, with optional path prefix. The path must be relative,
+      not absolute, and should use UNIX-style forward slashes, not
+      Windows-style backslashes.
 
 #### URL `rel` Vocabulary
 
diff --git a/guide/src/entity_fileset.md b/guide/src/entity_fileset.md
index 6083a09d..818bb9bd 100644
--- a/guide/src/entity_fileset.md
+++ b/guide/src/entity_fileset.md
@@ -10,16 +10,17 @@
   - `sha1` (string): SHA-1 hash in lower-case hex
   - `sha256` (string): SHA-256 hash in lower-case hex
   - `mimetype` (string): Content type in MIME type schema
-  - `extra` (object): any extra metadata about this specific file
-    - `original_url`: live web canonical URL to download this file (optional)
-    - `webarchive_url`: web archive capture of this file (optional)
-    - `platform_id`: platform-specific identifier for this file
+  - `extra` (object): any extra metadata about this specific file. All fields
+    are optional.
+    - `original_url`: live web canonical URL to download this file
+    - `webarchive_url`: web archive capture of this file
 - `urls`: An array of "typed" URLs. Order is not meaningful, and may not be
-  preserved.
+  preserved. These are URLs for the entire fileset, not individual files.
     - `url` (string, required):
             Eg: "https://example.edu/~frau/prcding.pdf".
     - `rel` (string, required):
-            Eg: "webarchive".
+            Eg: "archive-base", "webarchive".
+
 - `release_ids` (array of string identifiers): references to `release` entities
 - `content_scope` (string): for situations where the fileset does not simply
   contain the full representation of a work (eg, all files in dataset, for a
@@ -27,11 +28,17 @@
   vocabulary as File entity.
 - `extra` (object with string keys): additional metadata about this group of
   files, including upstream platform-specific metadata and identifiers
+  - `platform_id`: platform-specific identifier for this fileset
 
 #### URL `rel` types
 
-- `repository`: URL of a live-web landing page or other location where content can be
-  found. May not be machine-reachable.
+Any `rel` ending in "-base" implies that a file path (from the manifest) can
+be appended to the "base" URL to get a file download URL. Any "bundle" `rel`
+implies a direct link to an archive or "bundle" file (like `.zip` or `.tar`)
+which contains all the files in this fileset.
+
+- `repository` or `platform`: URL of a live-web landing page or other location
+  where content can be found. May or may not be machine-reachable.
 - `webarchive`: web archive version of `repository`
 - `repository-bundle`: direct URL to a live-web "archive" file, such as `.zip`,
   which contains all of the individual files in this fileset
-- 
cgit v1.2.3