more guide tweaks; not a full review/rewrite

author: Bryan Newbold <bnewbold@robocracy.org> 2019-02-14 16:19:26 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-02-14 16:19:26 -0800
commit: 70b4bc18b13f59c9d42c8e44ef872dfd2e1abef3 (patch)
tree: 1c4706394047bce6a086228e2efe8632d8bc1a23 /guide/src
parent: 56edebe7c2e090c4f25179f03a2d77d78ba59219 (diff)
download: fatcat-70b4bc18b13f59c9d42c8e44ef872dfd2e1abef3.tar.gz
fatcat-70b4bc18b13f59c9d42c8e44ef872dfd2e1abef3.zip
12 files changed, 129 insertions, 201 deletions
diff --git a/guide/src/container_extra.md b/guide/src/container_extra.md
deleted file mode 100644
index 224b7e8a..00000000
--- a/guide/src/container_extra.md
+++ /dev/null
@@ -1,78 +0,0 @@
-
-'extra' fields:
-
-    doaj
-        as_of: datetime of most recent check; if not set, not actually in DOAJ
-        seal: bool
-        work_level: bool (are work-level publications deposited with DOAJ?)
-        archiving: array, can include 'library' or 'other'
-    road
-        as_of: datetime of most recent check; if not set, not actually in ROAD
-    pubmed (TODO: delete?)
-        as_of: datetime of most recent check; if not set, not actually indexed in pubmed
-    norwegian (TODO: drop this?)
-        as_of: datetime of most recent check; if not set, not actually indexed in pubmed
-        id (integer)
-        level (integer; 0-2)
-    kbart
-        lockss
-            year_rle
-            volume_rle
-        portico
-            ...
-        clockss
-            ...
-    sherpa_romeo
-        color
-    jstor
-        year_rle
-        volume_rle
-    scopus
-        id
-        TODO: print/electronic distinction?
-    wos
-        id
-    doi
-        crossref_doi: DOI of the title in crossref (if exists)
-        prefixes: array of strings (DOI prefixes, up to the '/'; any registrar, not just Crossref)
-    ia
-        sim
-            nap_id
-            year_rle
-            volume_rle
-        longtail: boolean
-        homepage
-            as_of: datetime of last attempt
-            url
-            status: HTTP/heritrix status of homepage crawl
-
-    issnp: string
-    issne: string
-    coden: string
-    abbrev: string
-    oclc_id: string (TODO: lookup?)
-    lccn_id: string (TODO: lookup?)
-    dblb_id: string
-    default_license: slug
-    original_name: native name (if name is translated)
-    platform: hosting platform: OJS, wordpress, scielo, etc
-    mimetypes: array of strings (eg, 'application/pdf', 'text/html')
-    first_year: year (integer)
-    last_year: if publishing has stopped
-    primary_language: single ISO code, or 'mixed'
-    languages: array of ISO codes
-    region: TODO: continent/world-region
-    nation: shortcode of nation
-    discipline: TODO: highest-level subject; "life science", "humanities", etc
-    field: TODO: narrower description of field
-    subjects: TODO?
-    url: homepage
-    is_oa: boolean. If true, can assume all releases under this container are "Open Access"
-    TODO: domains, if exclusive?
-    TODO: fulltext_regex, if a known pattern?
-
-For KBART, etc:
-    We "over-count" on the assumption that "in-progress" status works will soon actually be preserved.
-    year and volume spans are run-length-encoded arrays, using integers:
-        - if an integer, means that year is preserved
-        - if an array of length 2, means everything between the two numbers (inclusive) is preserved
diff --git a/guide/src/entity_fields.md b/guide/src/entity_fields.md
index 7e5375b0..209b6154 100644
--- a/guide/src/entity_fields.md
+++ b/guide/src/entity_fields.md
@@ -84,6 +84,11 @@ Additional fields used in analytics and "curration" tracking:
   - `sim` (object): same format as `kbart` preservation above; coverage in microfilm collection
   - `longtail` (bool): is this considered a "long-tail" open access venue
 
+For KBART and other "coverage" fields, we "over-count" on the assumption that
+works with "in-progress" status will soon actually be preserved. Elements of
+these arrays are either an integer (means that single year is preserved), or an
+array of length two (meaning everything between the two numbers (inclusive) is
+preserved).
 
 [CODEN]: https://en.wikipedia.org/wiki/CODEN
 
@@ -258,7 +263,7 @@ Warning: This schema is not yet stable.
       always have an implicit order. Zero-indexed. Note that this is distinct
       from the `key` field.
     - `target_release_id` (fatcat identifier): if known, and the release
-      exists, a cross-reference to the fatcat entity
+      exists, a cross-reference to the Fatcat entity
     - `extra` (JSON, optional): additional citation format metadata can be
       stored here, particularly if the citation schema does not align. Common
       fields might be "volume", "authors", "issue", "publisher", "url", and
@@ -316,7 +321,6 @@ This vocabulary is based on the
 with a small number of (proposed) extensions:
 
 - `article-magazine`
-- `article-newspaper`
 - `article-journal`, including pre-prints and working papers
 - `book`
 - `chapter` is allowed as they are frequently referenced and read independent
@@ -337,42 +341,45 @@ with a small number of (proposed) extensions:
 - `patent`
 - `post-weblog` for blog entries
 - `report`
-- `review`, for things like book reviews, not the "literature review" form of `article-journal`
+- `review`, for things like book reviews, not the "literature review" form of
+  `article-journal`, nor peer reviews (see `peer_review`)
 - `speech` can be used for eg, slides and recorded conference presentations
   themselves, as distinct from `paper-conference`
 - `thesis`
 - `webpage`
 - `peer_review` (fatcat extension)
 - `software` (fatcat extension)
-- `standard` (fatcat extension)
-- `abstract` (fatcat extension)
+- `standard` (fatcat extension), for technical standards like RFCs
+- `abstract` (fatcat extension), for releases that are only an abstract of a
+  larger work. In particular, translations. Many are granted DOIs.
 - `editorial` (custom extension) for columns, "in this issue", and other
-  content published along peer-reviewed content in journals.
+  content published along peer-reviewed content in journals. Many are granted DOIs.
 - `letter` for "letters to the editor", "authors respond", and
-  sub-article-length published content
-- `example` (custom extension) for dummy or example releases that have valid
-  (registered) identifiers. Other metadata does not need to match "canonical"
-  examples.
+  sub-article-length published content. Many are granted DOIs.
 - `stub` (fatcat extension) for releases which have notable external
   identifiers, and thus are included "for completeness", but don't seem to
-  represent a "full work". An example might be a paper that gets an extra DOI
-  by accident; the primary DOI should be a full release, and the accidental DOI
-  can be a `stub` release under the same work. `stub` releases shouldn't be
-  considered full releases when counting or aggregating (though if technically
-  difficult this may not always be implemented). Other things that can be
-  categorized as stubs (which seem to often end up mis-categorized as full
-  articles in bibliographic databases):
-    - commercial advertisements
-    - "trap" or "honey pot" works, which are fakes included in databases to
-      detect re-publishing without attribution
-    - "This page is intentionally blank"
-    - "About the author", "About the editors", "About the cover"
-    - "Acknowledgments"
-    - "Notices"
+  represent a "full work".
+  
+An example of a `stub` might be a paper that gets an extra DOI by accident; the
+primary DOI should be a full release, and the accidental DOI can be a `stub`
+release under the same work. `stub` releases shouldn't be considered full
+releases when counting or aggregating (though if technically difficult this may
+not always be implemented). Other things that can be categorized as stubs
+(which seem to often end up mis-categorized as full articles in bibliographic
+databases):
+
+- commercial advertisements
+- "trap" or "honey pot" works, which are fakes included in databases to
+  detect re-publishing without attribution
+- "This page is intentionally blank"
+- "About the author", "About the editors", "About the cover"
+- "Acknowledgments"
+- "Notices"
 
 All other CSL types are also allowed, though they are mostly out of scope:
 
 - `article` (generic; should usually be some other type)
+- `article-newspaper`
 - `bill`
 - `broadcast`
 - `entry-dictionary`
@@ -438,6 +445,20 @@ Can often be interpreted as `published`, but be careful!
 - `illustrator`
 - `editor`
 
+All other CSL role types are also allowed, though are mostly out of scope for
+Fatcat:
+
+- `collection-editor`
+- `composer`
+- `container-author`
+- `director`
+- `editorial-director`
+- `editortranslator`
+- `interviewer`
+- `original-author`
+- `recipient`
+- `reviewed-author`
+
 If blank, indicates that type of contribution is not known; this can often be
 interpreted as authorship.
 
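Review note on the `entity_fields.md` hunk above: the run-length-encoded coverage arrays it describes can be expanded with a few lines of code. This is only a minimal sketch. The names `expand_spans` and `is_preserved` are hypothetical and not part of Fatcat; it assumes every element is either a bare integer year or a two-element `[start, end]` list.

```python
from typing import Iterable, List, Union

# An element is either a single year, or a [start, end] pair (inclusive).
Span = Union[int, List[int]]


def expand_spans(spans: Iterable[Span]) -> List[int]:
    """Expand a coverage array into a sorted list of individual years."""
    years = set()
    for element in spans:
        if isinstance(element, int):
            # A bare integer means that single year is preserved.
            years.add(element)
        elif len(element) == 2:
            # A length-two array covers everything between the two
            # numbers, inclusive of both ends.
            start, end = element
            years.update(range(start, end + 1))
        else:
            raise ValueError(f"unexpected coverage element: {element!r}")
    return sorted(years)


def is_preserved(year: int, spans: Iterable[Span]) -> bool:
    """Return True if the given year falls inside the coverage array."""
    return year in expand_spans(spans)
```

For example, `expand_spans([1999, [2001, 2003]])` yields the years 1999, 2001, 2002 and 2003. Because of the "over-count" convention, a year reported here may still be only "in-progress" rather than actually preserved.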
diff --git a/guide/src/goals.md b/guide/src/goals.md
index e7ef1512..9bb64b62 100644
--- a/guide/src/goals.md
+++ b/guide/src/goals.md
@@ -1,14 +1,14 @@
 
 ## Project Goals and Ecosystem Niche
 
-The Internet Archive has two primary use cases for fatcat:
+The Internet Archive has two primary use cases for Fatcat:
 
 - Tracking the "completeness" of our holdings against all known published
   works.  In particular, allow us to monitor progress, identify gaps, and
   prioritize further collection work.
 - Be a public-facing catalog and access mechanism for our open access holdings.
 
-In the larger ecosystem, fatcat could also provide:
+In the larger ecosystem, Fatcat could also provide:
 
 - A work-level (as opposed to title-level) archival dashboard: what fraction of
   all published works are preserved in archives? [KBART](), [CLOCKSS](),
@@ -22,8 +22,8 @@ In the larger ecosystem, fatcat could also provide:
   reproducibility (metadata corpus itself is open access, and file-level hashes
   control for content drift)
 - Foundational infrastructure for distributed digital preservation
-- On-ramp for non-traditional digital works ("grey literature") into the
-  scholarly web
+- On-ramp for non-traditional digital works (web-native and "grey literature")
+  into the scholarly web
 
 [KBART]: https://thekeepers.org/
 [CLOCKSS]: https://clockss.org
@@ -35,22 +35,22 @@ What types of works should be included in the catalog?
 
 The goal is to capture the "scholarly web": the graph of written works that
 cite other works. Any work that is both cited more than once and cites more
-than one other work in the catalog is very likely to be in scope. "Leaf nodes"
-and small islands of intra-cited works may or may not be in scope.
-
-Fatcat does not include any fulltext content itself, even for cleanly licensed
-(open access) works, but does have "strong" (verified) links to fulltext
-content, and includes file-level metadata (like hashes and fingerprints)
-to help discovery and identify content from any source. File-level URLs with
-context ("repository", "author-homepage", "web-archive") should make fatcat
-more useful for both humans and machines to quickly access fulltext content of
-a given mimetype than existing redirect or landing page systems. So another
-factor in deciding scope is whether a work has "digital fixity" and can be
-contained in a single immutable file.
+than one other work in the catalog is likely to be in scope. "Leaf nodes" and
+small islands of intra-cited works may or may not be in scope.
+
+Fatcat does not include any fulltext content itself, even for clearly licensed
+open access works, but does have verified hyperlinks to fulltext content, and
+includes file-level metadata (hashes and fingerprints) to help identify content
+from any source. File-level URLs with context ("repository", "publisher",
+"webarchive") should make Fatcat more useful for both humans and machines to
+quickly access fulltext content of a given mimetype than existing redirect or
+landing page systems. So another factor in deciding scope is whether a work has
+"digital fixity" and can be contained in immutable files or can be captured by
+web archives.
 
 ## References and Previous Work
 
-The closest overall analog of fatcat is [MusicBrainz][mb], a collaboratively
+The closest overall analog of Fatcat is [MusicBrainz][mb], a collaboratively
 edited music database. [Open Library][ol] is a very similar existing service,
 which exclusively contains book metadata.
 
@@ -60,23 +60,23 @@ open bibliographic database at this time (early 2018), including the
 Wikidata is a general purpose semantic database of entities, facts, and
 relationships; bibliographic metadata has become a large fraction of all
 content in recent years. The focus there seems to be linking knowledge
-(statements) to specific sources unambiguously. Potential advantages fatcat has
+(statements) to specific sources unambiguously. Potential advantages Fatcat has
 are a focus on a specific scope (not a general-purpose database of entities)
 and a goal of completeness (capturing as many works and relationships as
 rapidly as possible). With so much overlap, the two efforts might merge in the
 future.
 
-The technical design of fatcat is loosely inspired by the git
+The technical design of Fatcat is loosely inspired by the git
 branch/tag/commit/tree architecture, and specifically inspired by Oliver
 Charles' "New Edit System" [blog posts][nes-blog] from 2012.
 
-There are a whole bunch of proprietary, for-profit bibliographic databases,
+There are a number of proprietary, for-profit bibliographic databases,
 including Web of Science, Google Scholar, Microsoft Academic Graph, aminer,
 Scopus, and Dimensions. There are excellent field-limited databases like dblp,
-MEDLINE, and Semantic Scholar. There are some large general-purpose databases
-that are not directly user-editable, including the OpenCitation corpus, CORE,
-BASE, and CrossRef. We do not know of any large (more than 60 million works),
-open (bulk-downloadable with permissive or no license), field agnostic,
+MEDLINE, and Semantic Scholar. Large, general-purpose databases also exist that
+are not directly user-editable, including the OpenCitation corpus, CORE, BASE,
+and CrossRef. We do not know of any large (more than 60 million works), open
+(bulk-downloadable with permissive or no license), field agnostic,
 user-editable corpus of scholarly publication bibliographic metadata.
 
 [nes-blog]: https://ocharles.org.uk/blog/posts/2012-07-10-nes-does-it-better-1.html
diff --git a/guide/src/http_api.md b/guide/src/http_api.md
index 5769533d..e1b7f557 100644
--- a/guide/src/http_api.md
+++ b/guide/src/http_api.md
@@ -1,6 +1,6 @@
 # REST API
 
-The fatcat HTTP API is mostly a classic REST CRUD (Create, Read, Update,
+The Fatcat HTTP API is mostly a classic REST "CRUD" (Create, Read, Update,
 Delete) API, with a few twists.
 
 A declarative specification of all API endpoints, JSON data models, and
@@ -9,9 +9,8 @@ used to generate both server-side type-safe endpoint routes and client-side
 libraries. Auto-generated reference documentation is, for now, available at
 <https://api.qa.fatcat.wiki>.
 
-All API traffic is over HTTPS; there is no insecure HTTP endpoint, even for
-read-only operations. To start, all endpoints accept and return only JSON
-serialized content.
+All API traffic is over HTTPS; there is no HTTP endpoint, even for read-only
+operations. All endpoints accept and return only JSON serialized content.
 
 ## Entity Endpoints/Actions
 
@@ -21,16 +20,13 @@ Actions could, in theory, be directed at any of:
     revision
     edit
 
-A design decision to be made is how much to abstract away the distinction
-between these three types (particularly the identifier/revision distinction).
-
 Top-level entity actions (resulting in edits):
 
     create (new rev)
-    redirect
-    split
     update (new rev)
     delete
+    redirect
+    split (remove redirect)
 
 On existing entity edits (within a group):
 
@@ -45,17 +41,23 @@ An edit group as a whole can be:
 
 Other per-entity endpoints:
 
-    match (by field/context)
     lookup (by external persistent identifier)
+    match (by field/context; unimplemented)
 
 ## Editgroups
 
-All mutating entity operations (create, update, delete) accept an
-`editgroup_id` query parameter. If the parameter isn't set, the editor's
-"currently active" editgroup will be used, or a new editgroup will be created
-from scratch. It's generally preferable to manually create an editgroup and use
-the `id` in edit requests; the allows appropriate metadata to be set. The
-"currently active" editgroup behavior may be removed in the future.
+All mutating entity operations (create, update, delete) accept a required
+`editgroup_id` query parameter. Editgroups (with contextual metadata) should be
+created before starting edits.
+
+Related edits (to multiple entities) should be collected under a single
+editgroup, up to a reasonable size. More than 50 edits per entity type, or more
+than 100 edits total in an editgroup become unwieldy.
+
+After creating and modifying the editgroup, it may be "submitted", which flags
+it for review by bot and human editors. The editgroup may be "accepted"
+(merged), or if changes are necessary the edits can be updated and
+re-submitted.
 
 ## Sub-Entity Expansion
 
@@ -77,9 +79,8 @@ editor may have additional privileges which allow them to, eg, directly accept
 editgroups (as opposed to submitting edits for review).
 
 All mutating API calls (POST, PUT, DELETE HTTP verbs) require token-based
-authentication using an HTTP Bearer token. If you can't generate such a token
-from the web interface (because that feature hasn't been implemented), look for
-a public demo token for experimentation, or ask an administrator for a token.
+authentication using an HTTP Bearer token. New tokens can be generated in the
+web interface.
 
 ## Autoaccept Flag
 
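Review note on the Editgroups text in the `http_api.md` hunks above: the size guidance (more than 50 edits per entity type, or more than 100 edits in total, becomes unwieldy) could be checked client-side before submitting. This is a rough sketch only. The function name, the constants, and the idea of representing each edit by its entity type string are assumptions for illustration, not part of the Fatcat API.

```python
from collections import Counter
from typing import Iterable

# Thresholds taken from the guide text; the constant names are hypothetical.
MAX_EDITS_PER_TYPE = 50
MAX_EDITS_TOTAL = 100


def editgroup_is_unwieldy(edit_entity_types: Iterable[str]) -> bool:
    """Return True if an editgroup exceeds the suggested size limits.

    `edit_entity_types` holds one entity type name per edit, for example
    "release", "file", or "container".
    """
    counts = Counter(edit_entity_types)
    total = sum(counts.values())
    too_many_total = total > MAX_EDITS_TOTAL
    too_many_of_one_type = any(n > MAX_EDITS_PER_TYPE for n in counts.values())
    return too_many_total or too_many_of_one_type
```

A bot could call this before adding further edits and open a fresh editgroup once it returns `True`, which keeps each group small enough for human review.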
diff --git a/guide/src/implementation.md b/guide/src/implementation.md
index 33a53c21..8d1830b6 100644
--- a/guide/src/implementation.md
+++ b/guide/src/implementation.md
@@ -15,14 +15,14 @@ A cronjob will create periodic database dumps, both in "full" form (all tables
 and all edit history, removing only authentication credentials) and "flattened"
 form (with only the most recent version of each entity).
 
-A goal is to be linked-data/RDF/JSON-LD/semantic-web "compatible", but not
-necessarily "first". It should be possible to export the database in a
+One design goal is to be linked-data/RDF/JSON-LD/semantic-web "compatible", but
+not necessarily "first". It should be possible to export the database in a
 relatively clean RDF form, and to fetch data in a variety of formats, but
-internally fatcat will not be backed by a triple-store, and will not be bound
-to a rigid third-party ontology or schema.
+internally Fatcat is not backed by a triple-store, and is not tied to any
+specific third-party ontology or schema.
 
 Microservice daemons should be able to proxy between the primary API and
-standard protocols like ResourceSync and OAI-PMH, and third party bots could
+standard protocols like ResourceSync and OAI-PMH, and third party bots can
 ingest or synchronize the database in those formats.
 
 ### Fatcat Identifiers
diff --git a/guide/src/policies.md b/guide/src/policies.md
index e61984be..3816f876 100644
--- a/guide/src/policies.md
+++ b/guide/src/policies.md
@@ -69,11 +69,11 @@ and CC-0 (public grant) licensing for declarative interface specifications
 ## Privacy Policy
 
 *It is important to note that this section is currently aspirational: the
-servers hosting early deployments of fatcat are largely in a default
+servers hosting early deployments of Fatcat are largely in a default
 configuration and have not been audited to ensure that these guidelines are
 being followed.*
 
-It is a goal for fatcat to conduct as little surveillance of reader and editor
+It is a goal for Fatcat to conduct as little surveillance of reader and editor
 behavior and activities as possible. In practical terms, this means minimizing
 the overall amount of logging and collection of identifying information. This
 is in contrast to *submitted edit content*, which is captured, preserved, and
diff --git a/guide/src/roadmap.md b/guide/src/roadmap.md
index 745380f9..c4cc6a98 100644
--- a/guide/src/roadmap.md
+++ b/guide/src/roadmap.md
@@ -1,20 +1,11 @@
 # Roadmap
 
-Major unimplemented features (as of September 2018) include:
+Core unimplemented features (as of February 2019) include:
 
-- backend "soundness" work to ensure corrupt data model states aren't reachable
-  via the API
-- authentication and account creation
 - rate-limiting and spam/abuse mitigation
-- "automated update" bots to consume metadata feeds (as opposed to one-time
-  bulk imports)
 - actual entity creation, editing, deleting through the web interface
-- updating the search index in near-real-time following editgroup merges. In
-  particular, the cache invalidation problem is tricky for some relationships
-  (eg, updating all releases if a container is updated)
 
-Once a reasonable degree of schema and API stability is attained, contributions
-would be helpful to implement:
+Contributions would be helpful to implement:
 
 - import (bulk and/or continuous updates) for more metadata sources
 - better handling of work/release distinction in, eg, search results and
@@ -23,23 +14,19 @@ would be helpful to implement:
 - matching improvements, eg, for references (citations), contributions
   (authorship), work grouping, and file/release matching
 - internationalization of the web interface (translation to multiple languages)
-- review of design for accessibility
-- better handling of non-PDF file formats
+- accessibility review of user interface
 
 Longer term projects could include:
 
 - full-text search over release files
 - bi-directional synchronization with other user-editable catalogs, such as
   Wikidata
-- better representation of multi-file objects such as websites and datasets
 - alternate/enhanced backend to store full edit history without overloading
   traditional relational database
 
 ## Known Issues
 
-Too many right now, but this section will be populated soon.
-
-- changelog index may have gaps due to postgresql sequence and transaction
+- changelog index may have gaps due to PostgreSQL sequence and transaction
   roll-back behavior
 
 ## Unresolved Questions
@@ -48,22 +35,19 @@ How to handle translations of, eg, titles and author names? To be clear, not
 translations of works (which are just separate releases), these are more like
 aliases or "originally known as".
 
-Are bi-directional links a schema anti-pattern? Eg, should "work" point to a
-"primary release" (which itself points back to the work)?
-
-Should `identifier` and `citation` be their own entities, referencing other
-entities by UUID instead of by revision? Not sure if this would increase or
-decrease database resource utilization.
+Should external identifiers be made generic? Eg, instead of having `arxiv_id` as
+a column, have a table of arbitrary identifiers, with either an `extid_type` or
+just use a prefix like `arxiv:someid`.
 
 Should contributor/author affiliation and contact information be retained? It
 could be very useful for disambiguation, but we don't want to build a huge
-database for spammers or "innovative" start-up marketing.
+database for "marketing" and other spam.
 
 Can general-purpose SQL databases like Postgres or MySQL scale well enough to
 hold several tables with billions of entity revisions? Right from the start
 there are hundreds of millions of works and releases, many of which having
 dozens of citations, many authors, and many identifiers, and then we'll have
-potentially dozens of edits for each of these, which multiply out to `1e8 * 2e1
+potentially dozens of edits for each of these. This multiplies out to `1e8 * 2e1
 * 2e1 = 4e10`, or 40 billion rows in the citation table. If each row was 32
 bytes on average (uncompressed, not including index size), that would be 1.3
 TByte on its own, larger than common SSD disks. I do think a transactional SQL
@@ -74,7 +58,7 @@ primary database, as user interfaces could rely on secondary read-only search
 engines for more complex queries and views.
 
 There is a tension between focus and scope creep. If a central database like
-fatcat doesn't support enough fields and metadata, then it will not be possible
+Fatcat doesn't support enough fields and metadata, then it will not be possible
 to completely import other corpuses, and this becomes "yet another" partial
 bibliographic database. On the other hand, accepting arbitrary data leads to
 other problems: sparseness increases (we have more "partial" data), potential
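Review note on the scaling estimate in the `roadmap.md` hunk above: the back-of-the-envelope arithmetic can be reproduced directly. The variable names are illustrative, and the sketch assumes decimal terabytes (10^12 bytes), which matches the 1.3 TByte figure in the text.

```python
# Inputs as given in the guide text (orders of magnitude, not measurements).
works = 10**8             # 1e8 works and releases
citations_per_work = 20   # 2e1, "dozens of citations"
edits_per_entity = 20     # 2e1, "dozens of edits"
bytes_per_row = 32        # average row size, uncompressed, without indexes

# Rows in the citation table: 1e8 * 2e1 * 2e1 = 4e10.
citation_rows = works * citations_per_work * edits_per_entity

# Total size in bytes and in decimal terabytes.
total_bytes = citation_rows * bytes_per_row
total_tbyte = total_bytes / 10**12

print(f"{citation_rows:.0e} rows, {total_tbyte:.2f} TByte")
```

This gives 40 billion rows and 1.28 TByte, which the text rounds to 1.3 TByte.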
diff --git a/guide/src/scope.md b/guide/src/scope.md
index d5e74156..9815c44e 100644
--- a/guide/src/scope.md
+++ b/guide/src/scope.md
@@ -53,11 +53,11 @@ pre-prints to final publication is in scope.
 I'm much less interested in altmetrics, funding, and grant relationships than
 most existing databases in this space.
 
-fatcat would not include any fulltext content itself, even for cleanly licensed
+Fatcat would not include any fulltext content itself, even for cleanly licensed
 (open access) works, but would have "strong" (verified) links to fulltext
 content, and would include file-level metadata (like hashes and fingerprints)
 to help discovery and identify content from any source. File-level URLs with
-context ("repository", "author-homepage", "web-archive") should make fatcat
+context ("repository", "author-homepage", "web-archive") should make Fatcat
 more useful for both humans and machines to quickly access fulltext content of
 a given mimetype than existing redirect or landing page systems. So another
 factor in deciding scope is whether a work has "digital fixity" and can be
diff --git a/guide/src/style_guide.md b/guide/src/style_guide.md
index 7f819c8d..d670691a 100644
--- a/guide/src/style_guide.md
+++ b/guide/src/style_guide.md
@@ -19,12 +19,12 @@ treated as an entirely separate `release`.
 documentation (such as DOI `10.5555/12345678`) are allowed (and the entity
 should be tagged as a fake or example). Non-registered "identifier-like
 strings", which are semantically valid but not registered, should not exist in
-fatcat metadata in an identifier column. Invalid identifier strings can be
+Fatcat metadata in an identifier column. Invalid identifier strings can be
 stored in "extra" metadata. Crossref has [blogged]() about this distinction.
 
 [blogged]: https://www.crossref.org/blog/doi-like-strings-and-fake-dois/
 
-#### DOI
+#### DOIs
 
 All DOIs stored in an entity column should be registered (aka, should be
 resolvable from `doi.org`). Invalid identifiers may be cleaned up or removed by
@@ -38,9 +38,9 @@ formatted strings.
 
 [number of examples]: https://www.crossref.org/blog/dois-unambiguously-and-persistently-identify-published-trustworthy-citable-online-scholarly-literature-right/
 
-In the fatcat ontology, DOIs and release entities are one-to-one.
+In the Fatcat ontology, DOIs and release entities are one-to-one.
 
-It is the intention to automatically (via bot) create a fatcat release for
+It is the intention to automatically (via bot) create a Fatcat release for
 every Crossref-registered DOI from a whitelist of media types
 ("journal-article" etc, but not all), and it would be desirable to auto-create
 entities for in-scope publications from all registrars. It is not the intention
diff --git a/guide/src/sw_contribute.md b/guide/src/sw_contribute.md
index 999b2149..d408ef4b 100644
--- a/guide/src/sw_contribute.md
+++ b/guide/src/sw_contribute.md
@@ -2,13 +2,13 @@
 
 For now, issues and patches can be filed at <https://github.com/internetarchive/fatcat>.
 
-To start, the back-end (fatcatd, in rust), web interface (fatcat-web, in
-python), bots, and this guide are all versioned in the same git repository.
+The back-end (`fatcatd`, in Rust), web interface (`fatcat-web`, in Python),
+bots, and this guide are all versioned in the same git repository.
 
-See the `rust/README` and `rust/HACKING` documents for some common tasks and
-gotchas when working with the rust backend.
+See the `rust/README.md` and `rust/HACKING.md` documents for some common tasks
+and gotchas when working with the Rust backend.
 
 When considering making a non-trivial contribution, it can save review time and
 duplicated work to post an issue with your intentions and plan. New code and
-features will need to include unit tests before being merged, though we can
-help with writing them.
+features must include unit tests before being merged, though we can help with
+writing them.
diff --git a/guide/src/welcome.md b/guide/src/welcome.md
index 0bdf36fa..b0d8b1cc 100644
--- a/guide/src/welcome.md
+++ b/guide/src/welcome.md
@@ -2,7 +2,7 @@
 
 This guide you are reading contains:
 
-- a **[high-level introduction](./overview.md)** to the fatcat catalog and
+- a **[high-level introduction](./overview.md)** to the Fatcat catalog and
   software
 - a bibliographic **[style guide](./style_guide.md)** for editors, also useful
   for understanding metadata found in the catalog
@@ -20,7 +20,7 @@ articles, pre-prints, and conference proceedings. Records are collaboratively
 editable, versioned, available in bulk form, and include URL-agnostic
 file-level metadata.
 
-Both the fatcat software and the metadata stored in the service are free (in
+Both the Fatcat software and the metadata stored in the service are free (in
 both the libre and gratis sense) for others to share, reuse, fork, or extend.
 See [Policies](./policies.md) for licensing details, and
 [Sources](./sources.md) for attribution of the foundational metadata corpuses
diff --git a/guide/src/workflow.md b/guide/src/workflow.md
index 94842e54..ff1552cf 100644
--- a/guide/src/workflow.md
+++ b/guide/src/workflow.md
@@ -3,8 +3,8 @@
 ## Basic Editing Workflow and Bots
 
 Both human editors and bots should have edits go through the same API, with
-humans using either the default web interface, integration, or client
-software.
+humans using either the default web interface, client software, or third-party
+integrations.
 
 The normal workflow is to create edits (or updates, merges, deletions) on
 individual entities. Individual changes are bundled into an "edit group" of
@@ -12,13 +12,13 @@ related edits (eg, correcting authorship info for multiple works related to a
 single author). When ready, the editor "submits" the edit group for
 review. During the review period, human editors vote and bots can perform
 automated checks. During this period the editor can make tweaks if necessary.
-After some fixed time period (72 hours?) with no changes and no blocking
-issues, the edit group would be auto-accepted if no merge conflicts have
-be created by other edits to the same entities. This process balances editing
-labor (reviews are easy, but optional) against quality (cool-down period makes
-it easier to detect and prevent spam or out-of-control bots). More
-sophisticated roles and permissions could allow some certain humans and bots to
-push through edits more rapidly (eg, importing new works from a publisher API).
+After some fixed time period (one week?) with no changes and no blocking
+issues, the edit group would be accepted if no merge conflicts have been created
+by other edits to the same entities. This process balances editing labor
+(reviews are easy, but optional) against quality (cool-down period makes it
+easier to detect and prevent spam or out-of-control bots). More sophisticated
+roles and permissions could allow certain humans and bots to push through
+edits more rapidly (eg, importing new works from a publisher API).
 
 Bots need to be tuned to have appropriate edit group sizes (eg, daily batches,
 instead of millions of works in a single edit) to make human QA review and
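Review note on the acceptance rule in the `workflow.md` hunk above: the conditions (a fixed quiet period, no blocking issues, no merge conflicts) can be written as a single predicate. This is a sketch under stated assumptions. The function and parameter names are hypothetical, and the one-week period is only the tentative value the text itself marks with a question mark.

```python
from datetime import datetime, timedelta

# Tentative review period from the guide ("one week?"); not a settled value.
REVIEW_PERIOD = timedelta(days=7)


def can_accept(
    last_changed: datetime,
    now: datetime,
    blocking_issues: int,
    has_merge_conflicts: bool,
    review_period: timedelta = REVIEW_PERIOD,
) -> bool:
    """Return True if a submitted edit group is eligible to be accepted."""
    # The group must have gone unchanged for the whole review period.
    quiet_long_enough = (now - last_changed) >= review_period
    return (
        quiet_long_enough
        and blocking_issues == 0
        and not has_merge_conflicts
    )
```

Any tweak by the editor during review would reset `last_changed`, so the cool-down restarts from that point.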