Diffstat (limited to 'guide/src')
| Mode | File | Changed lines |
|---|---|---|
| -rw-r--r-- | guide/src/alignments.md | 2 |
| -rw-r--r-- | guide/src/bulk_exports.md | 2 |
| -rw-r--r-- | guide/src/data_model.md | 26 |
| -rw-r--r-- | guide/src/entity_fields.md | 16 |
| -rw-r--r-- | guide/src/entity_types.md | 2 |
| -rw-r--r-- | guide/src/goals.md | 2 |
| -rw-r--r-- | guide/src/implementation.md | 4 |
| -rw-r--r-- | guide/src/overview.md | 2 |
| -rw-r--r-- | guide/src/policies.md | 12 |
| -rw-r--r-- | guide/src/roadmap.md | 2 |
| -rw-r--r-- | guide/src/sources.md | 2 |
| -rw-r--r-- | guide/src/style_guide.md | 20 |
| -rw-r--r-- | guide/src/workflow.md | 6 |
13 files changed, 49 insertions, 49 deletions
diff --git a/guide/src/alignments.md b/guide/src/alignments.md
index 291dd6e5..783122ea 100644
--- a/guide/src/alignments.md
+++ b/guide/src/alignments.md
@@ -2,7 +2,7 @@
 A table (CSV) of "alignments" between fatcat entity types and fields with other
 file formats and standards is available under the `./notes/` directory of the
-source repo.
+source git repository..
 TODO: in particular, highlight alignments with:
diff --git a/guide/src/bulk_exports.md b/guide/src/bulk_exports.md
index 21cb8226..3a9badcb 100644
--- a/guide/src/bulk_exports.md
+++ b/guide/src/bulk_exports.md
@@ -11,7 +11,7 @@ interested in:
   in small tables ("partial transform") and export JSON for each table; would
   be extra work to maintain, so not pursuing for now.
 - full history, full public schema exports, in a form that might be used to
-  mirror or enitrely fork the project. Propose supplying the full "changelog"
+  mirror or entirely fork the project. Propose supplying the full "changelog"
   in API schema format, in a single file to capture all entity history, without
   "hydrating" any inter-entity references. Rely on separate dumps of
   non-entity, non-versioned tables (editors, abstracts, etc). Note that a
diff --git a/guide/src/data_model.md b/guide/src/data_model.md
index f3b9b35a..2d6f7287 100644
--- a/guide/src/data_model.md
+++ b/guide/src/data_model.md
@@ -14,13 +14,13 @@ artifacts) over physical items, the primary bibliographic entity types are:
   `work`.
 - `release`: a specific "release" or "publicly published" (in a formal or
   informal sense) version of a work. Contains traditional bibliographic
-  metadata (title, date of publiction, media type, language, etc). Has
+  metadata (title, date of publication, media type, language, etc). Has
   relationships to other entities:
     - "variant of" a single `work`
     - "contributed to by" multiple `creators`
     - "references to" (cites) multiple `releases`
     - "published as part of" a single `container`
-- `file`: a single concrete, fixed ditigal artifact; a manifestation of one or
+- `file`: a single concrete, fixed digital artifact; a manifestation of one or
   more `releases`. Machine-verifiable metadata includes file hashes, size, and
   detected file format. Verified URLs link to locations on the open web where
   this file can be found or has been archived. Has relationships:
@@ -49,9 +49,9 @@ the same regardless of type.
 A specific version of any entity in the catalog is called a "revision".
 Revisions are generally immutable (do not change and are not editable), and are
-not usually refered to directly by users. Instead, persistent identifiers can
-be created, which "point to" a specific revsiion at a time. This distinction
-means that entities refered to by an identifier can change over time (as
+not usually referred to directly by users. Instead, persistent identifiers can
+be created, which "point to" a specific revision at a time. This distinction
+means that entities referred to by an identifier can change over time (as
 metadata is corrected and expanded). Revision objects do not "point" back to
 specific identifiers, so they are not the same as a simple "version number" for
 an identifier.
@@ -63,7 +63,7 @@ be fetched and inspected on a per-identifier basis, and any changes can easily
 be reverted (even merges/redirects and "deletion").
 "Staged" or "proposed" changes are captured as edit objects without updating
-the identifers themselves.
+the identifiers themselves.
 ### Fatcat Identifiers
@@ -83,7 +83,7 @@ In comparison, 96-bit identifiers would have 20 characters and look like:
     work_rzga5b9cd7efgh04iljk
     https://fatcat.wiki/work/rzga5b9cd7efgh04iljk
-A 64-bit namespace would probably be large enought, and would work with
+A 64-bit namespace would probably be large enough, and would work with
 database Integer columns:
     work_rzga5b9cd7efg
@@ -102,7 +102,7 @@ Revisions are stored in their complete form, not as a patch or difference; if
 comparing to distributed version control systems (for managing changes to
 source code), this follows the git model, not the mercurial model.
-The entity revisions are immutable once accepted; the editting process involves
+The entity revisions are immutable once accepted; the editing process involves
 the creation of new entity revisions and, if the edit is approved, pointing the
 identifier to the new revision. Entities cross-reference between themselves by
 *identifier* not *revision number*. Identifier pointers also support
@@ -122,7 +122,7 @@ SQL tables look something like this (with separate tables for entity type a la
     entity_revision
         revision_id
-        <all entity-tyle-specific fields>
+        <all entity-style-specific fields>
         extra: json blob for schema evolution
     entity_edit
@@ -140,7 +140,7 @@ SQL tables look something like this (with separate tables for entity type a la
         extra: json blob for progeny metadata
 An individual entity can be in the following "states", from which the given
-actions (transistion) can be made:
+actions (transition) can be made:
 - `wip` (not live; not redirect; has rev)
     - activate (to `active`)
@@ -166,7 +166,7 @@ history of an object.
 ## Controlled Vocabularies
-Some individual fields have additional contraints, either in the form of
+Some individual fields have additional constraints, either in the form of
 pattern validation ("values must be upper case, contain only certain
 characters"), or membership in a fixed set of values. These may include:
@@ -175,7 +175,7 @@ characters"), or membership in a fixed set of values. These may include:
 - work "types" (article vs. book chapter vs. proceeding, etc)
 - contributor types (author, translator, illustrator, etc)
 - human languages
-- identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifers
+- identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifiers
   themselves)
 Other fixed-set "vocabularies" become too large to easily maintain or express
@@ -183,7 +183,7 @@ in code. These could be added to the backend databases, or be enforced by bots
 (instead of the core system itself). These mostly include externally-registered identifiers or types, such as:
 - file mimetypes
-- identifiers themselves (DOI, ORCID, etc), by checking for registeration
+- identifiers themselves (DOI, ORCID, etc), by checking for registration
   against canonical APIs and databases
 ## Global Edit Changelog
diff --git a/guide/src/entity_fields.md b/guide/src/entity_fields.md
index 4f14577f..f8fcf082 100644
--- a/guide/src/entity_fields.md
+++ b/guide/src/entity_fields.md
@@ -5,8 +5,8 @@ All entities have:
 - `extra`: free-form JSON metadata
 The "extra" field is an "escape hatch" to include extra fields not in the
-regular schema. It is intented to enable gradual evolution of the schema, as
-well as accomodating niche or field-specific content. That being said,
+regular schema. It is intended to enable gradual evolution of the schema, as
+well as accommodating niche or field-specific content. That being said,
 reasonable limits should be adhered to.
 ## Containers
@@ -88,7 +88,7 @@ guide.
   publicly available
 - `doi` (string): full DOI number, lower-case. Example: "10.1234/abcde.789".
   See the "External Identifiers" section of style guide.
-- `isbn13` (string): external identifer for books. ISBN-9 and other formats
+- `isbn13` (string): external identifier for books. ISBN-9 and other formats
   should be converted to canonical ISBN-13. See the "External Identifiers"
   section of style guide.
 - `core_id` (string): external identifier for the [CORE] open access
@@ -144,7 +144,7 @@ guide.
       affiliation, "this is the corresponding author", etc.
 - `refs`: an array of references (aka, citations) to other releases. References
   can only be linked to a specific target release (not a work), though it may
-  be ambugious which release of a work is being referenced if the citation is
+  be ambiguous which release of a work is being referenced if the citation is
   not specific enough. Reference fields include:
     - `index` (integer, optional): reference lists and bibliographies almost
       always have an implicit order. Zero-indexed. Note that this is distinct
@@ -154,10 +154,10 @@ guide.
     - `extra` (JSON, optional): additional citation format metadata can be
       stored here, particularly if the citation schema does not align. Common
       fields might be "volume", "authors", "issue", "publisher", "url", and
-      external identifers ("doi", "isbn13").
+      external identifiers ("doi", "isbn13").
     - `key` (string): works often reference works with a short slug or index
       number, which can be captured here. For example, "[BROWN2017]". Keys
-      generally supercede the `index` field, though both can/should be
+      generally supersede the `index` field, though both can/should be
       supplied.
     - `year` (integer): year of publication of the cited release.
     - `container_title` (string): if applicable, the name of the container of
@@ -215,7 +215,7 @@ vocabulary (TODO: should it follow [CSL types](http://docs.citationstyles.org/en
   can be a `stub` release under the same work. `stub` releases shouldn't be
   considered full releases when counting or aggregating (though if technically
   difficult this may not always be implemented). Other things that can be
-  categorized as stubs (which seem to often end up miscategorized as full
+  categorized as stubs (which seem to often end up mis-categorized as full
   articles in bibliographic databases):
     - an abstract, which is only an abstract of a larger work
     - commercial advertisements
@@ -223,7 +223,7 @@ vocabulary (TODO: should it follow [CSL types](http://docs.citationstyles.org/en
       detect re-publishing without attribution
     - "This page is intentionally blank"
     - "About the author", "About the editors", "About the cover"
-    - "Acknowledgements"
+    - "Acknowledgments"
     - "Notices"
 Other types from Crossref (such as `component`, `reference-entry`) are valid,
diff --git a/guide/src/entity_types.md b/guide/src/entity_types.md
index 1a74f79e..489a62e8 100644
--- a/guide/src/entity_types.md
+++ b/guide/src/entity_types.md
@@ -4,4 +4,4 @@ TODO: entity-type-specific scope and quality guidance
 ## Work/Release/File Distinctions
-TODO: clarify distinctions and relationship between theese three entity types
+TODO: clarify distinctions and relationship between these three entity types
diff --git a/guide/src/goals.md b/guide/src/goals.md
index 048d9cb1..e7ef1512 100644
--- a/guide/src/goals.md
+++ b/guide/src/goals.md
@@ -12,7 +12,7 @@ In the larger ecosystem, fatcat could also provide:
 - A work-level (as opposed to title-level) archival dashboard: what fraction of
   all published works are preserved in archives? [KBART](), [CLOCKSS](),
-  [Portico](), and other preservations don't provide granular metadata
+  [Portico](), and other preservation networks don't provide granular metadata
 - A collaborative, independent, non-commercial, fully-open, field-agnostic,
   "completeness"-oriented catalog of scholarly metadata
 - Unified (centralized) foundation for discovery and access across repositories
diff --git a/guide/src/implementation.md b/guide/src/implementation.md
index df8d66b9..66ae7f6b 100644
--- a/guide/src/implementation.md
+++ b/guide/src/implementation.md
@@ -11,7 +11,7 @@ operated by anybody. A separate web interface project talks to the API backend
 and can be developed more rapidly with less concern about data loss or
 corruption.
-A cronjob will creae periodic database dumps, both in "full" form (all tables
+A cronjob will create periodic database dumps, both in "full" form (all tables
 and all edit history, removing only authentication credentials) and "flattened"
 form (with only the most recent version of each entity).
@@ -23,4 +23,4 @@ to a rigid third-party ontology or schema.
 Microservice daemons should be able to proxy between the primary API and
 standard protocols like ResourceSync and OAI-PMH, and third party bots could
-ingest or synchronize the databse in those formats.
+ingest or synchronize the database in those formats.
diff --git a/guide/src/overview.md b/guide/src/overview.md
index 58107429..68171905 100644
--- a/guide/src/overview.md
+++ b/guide/src/overview.md
@@ -6,7 +6,7 @@ This section gives an introduction to:
 - the goals of the project, and now it relates to the rest of the Open Access
   and archival ecosystem
 - how catalog data is represented as entities and revisions with full edit
-  history, and how entities are refered to and cross-referenced with
+  history, and how entities are referred to and cross-referenced with
   identifiers
 - how humans and bots propose changes to the catalog, and how these changes are
   reviewed
diff --git a/guide/src/policies.md b/guide/src/policies.md
index 18d84a36..03e5e526 100644
--- a/guide/src/policies.md
+++ b/guide/src/policies.md
@@ -42,14 +42,14 @@ history, including all of their contributions.
 ## Immutable History
 All editors agree to the licensing terms, and understand that their full public
-history of contributions is made irrevokably public. Edits and contributions
+history of contributions is made irrevocably public. Edits and contributions
 may be *reverted*, but the history (and content) of their edits are retained.
 Edit history is not removed from the corpus on the request of an editor or when
 an editor closes their account.
 In an emergency situation, such as non-bibliographic content getting encoded in
 the corpus by bypassing normal filters (eg, base64 encoding hate crime content
-or exploitive photos, as has happened to some blockchain projects), the
+or exploitative photos, as has happened to some blockchain projects), the
 ecosystem may decide to collectively, in a coordinated manner, expunge specific
 records from their history.
@@ -73,8 +73,8 @@ servers hosting early deployments of fatcat are largely in a default
 configuration and have not been audited to ensure that these guidelines are
 being followed.*
-It is a goal for fatcat to conduct as little surveillence of reader and editor
-bahavior and activities as possible. In pratical terms, this means minimizing
+It is a goal for fatcat to conduct as little surveillance of reader and editor
+behavior and activities as possible. In practical terms, this means minimizing
 the overall amount of logging and collection of identifying information. This
 is in contrast to *submitted edit content*, which is captured, preserved, and
 republished as widely as possible.
@@ -94,8 +94,8 @@ Exceptions will likely be made:
 Some uncertain areas of privacy include:
-- should third-party authenticion identities be linked to editor ids? what
-  about the specific case of ORCiDs if used for login?
+- should third-party authentication identities be linked to editor ids? what
+  about the specific case of ORCID if used for login?
 - what about discussion and comments on edits? should conversations be included
   in full history dumps? should editors be allowed to update or remove
   comments?
diff --git a/guide/src/roadmap.md b/guide/src/roadmap.md
index 1a2def31..745380f9 100644
--- a/guide/src/roadmap.md
+++ b/guide/src/roadmap.md
@@ -32,7 +32,7 @@ Longer term projects could include:
 - bi-directional synchronization with other user-editable catalogs, such as
   Wikidata
 - better representation of multi-file objects such as websites and datasets
-- altenate/enhanced backend to store full edit history without overloading
+- alternate/enhanced backend to store full edit history without overloading
   traditional relational database
 ## Known Issues
diff --git a/guide/src/sources.md b/guide/src/sources.md
index b8853d8a..5b3d9d3e 100644
--- a/guide/src/sources.md
+++ b/guide/src/sources.md
@@ -24,6 +24,6 @@ institution-specific catalogs.
 Progeny information (where the metadata comes from, or who "makes specific
 claims") is stored in edit metadata in the data model. Value-level attribution
-cna be achived by looking at the full edit history for an entity as a series of
+can be achieved by looking at the full edit history for an entity as a series of
 patches.
diff --git a/guide/src/style_guide.md b/guide/src/style_guide.md
index 35d13e97..7f819c8d 100644
--- a/guide/src/style_guide.md
+++ b/guide/src/style_guide.md
@@ -7,7 +7,7 @@ entity, or even a "native"/"international" representation as seems common in
 other bibliographic systems. This most notably applies to release titles, but
 also to container and publisher names, and likely other fields.
-For now, editors must use their own judgement over whether to use the title of
+For now, editors must use their own judgment over whether to use the title of
 the release listed in the work itself
 This is not to be confused with *translations* of entire works, which should be
@@ -30,9 +30,9 @@ All DOIs stored in an entity column should be registered (aka, should be
 resolvable from `doi.org`). Invalid identifiers may be cleaned up or removed by
 bots.
-DOIs should *always* be stored and transfered in lower-case form. Note that
+DOIs should *always* be stored and transferred in lower-case form. Note that
 there are almost no other constraints on DOIs (and handles in general): they
-may have muliple forward slashes, whitespace, of arbitrary length, etc.
+may have multiple forward slashes, whitespace, of arbitrary length, etc.
 Crossref has a [number of examples]() of such "valid" but frustratingly
 formatted strings.
@@ -60,28 +60,28 @@ background reading, see:
 Particular difficult issues in the context of a bibliographic database include
 the non-universal concept of "family" vs. "given" names and their relationship
-to first and last names; the inclusion of honarary titles and other suffixes
-and prefixes to a name; the distinction between "prefered", "legal", and
+to first and last names; the inclusion of honorary titles and other suffixes
+and prefixes to a name; the distinction between "preferred", "legal", and
 "bibliographic" names, or other situations where a person may not wish to be
-known under the name they are commonly refered to under; language and character
+known under the name they are commonly referred to under; language and character
 set issues; and pseudonyms, anonymous publications, and fake personas (perhaps
 representing a group, like Bourbaki).
 The general guidance for Fatcat is to:
-- not be a "source of truth" for representing a persona or human being; ORCiD
+- not be a "source of truth" for representing a persona or human being; ORCID
   and Wikidata are better suited to this task
 - represent author personas, not necessarily 1-to-1 with human beings
 - prioritize the concerns of a reader or researcher over that of the author
 - enable basic interoperability with external databases, file formats, schemas,
-  and style gudies
+  and style guides
 - when possible, respect the wishes of individuals
 The data model for the `creator` entity has three name fields:
 - `surname` and `given_name`: needed for "aligning" with external databases,
   and to export metadata to many standard formats
-- `display_name`: the "prefered" representation for display of the entire name,
+- `display_name`: the "preferred" representation for display of the entire name,
   in the context of international attribution of authorship of a written work
 Names to not necessarily need to expressed in a Latin character set, but also
@@ -101,7 +101,7 @@ of a reasonable size for review and acceptance. For example, merging two
 `creators` and updating related `releases` could all go in a single editgroup.
 Large refactors, conversions, and imports, which may touch thousands of
 entities, should be grouped into reasonable size editgroups; extremely large
-editgroups may cause technical issues, and make review unmanagable. 50 edits is
+editgroups may cause technical issues, and make review unmanageable. 50 edits is
 a decent batch size, and 100 is a good upper limit (and may be enforced by the
 server).
diff --git a/guide/src/workflow.md b/guide/src/workflow.md
index fd53f6a9..996fb24c 100644
--- a/guide/src/workflow.md
+++ b/guide/src/workflow.md
@@ -3,7 +3,7 @@
 ## Basic Editing Workflow and Bots
 Both human editors and bots should have edits go through the same API, with
-humans using either the default web interface, integrations, or client
+humans using either the default web interface, integration, or client
 software.
 The normal workflow is to create edits (or updates, merges, deletions) on
@@ -22,7 +22,7 @@ push through edits more rapidly (eg, importing new works from a publisher API).
 Bots need to be tuned to have appropriate edit group sizes (eg, daily batches,
 instead of millions of works in a single edit) to make human QA review and
-reverts managable.
+reverts manageable.
 Data progeny and source references are captured in the edit metadata, instead
 of being encoded in the entity data model itself. In the case of importing
@@ -33,5 +33,5 @@ Human editors can leave edit messages to clarify their sources.
 A [style guide](./style_guide.md) and discussion forum are intended to be be
 hosted as separate stand-alone services for editors to propose projects and
 debate process or scope changes. These services should have unified accounts
-and logins (oauth?) for consistent account IDs across all services.
+and logins (OAuth?) for consistent account IDs across all services.
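The style guide text touched by this commit gives two concrete rules. DOIs are always stored and transferred in lower-case form, and large imports are split into editgroups of about 50 edits, with 100 as an upper limit that the server may enforce. The following is a minimal Python sketch of both rules. `normalize_doi`, `batch_edits`, and the two constants are hypothetical names chosen for illustration. They are not part of the fatcat codebase or its API.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constants mirroring the style guide text: ~50 edits per
# editgroup is a decent size, and 100 is a good upper limit.
DEFAULT_EDITGROUP_SIZE = 50
MAX_EDITGROUP_SIZE = 100


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI before storing or transferring it (hypothetical helper).

    Nothing else is checked: DOIs may contain multiple slashes, whitespace,
    and arbitrary lengths, so only the case is changed.
    """
    return doi.lower()


def batch_edits(
    edits: Sequence[T], size: int = DEFAULT_EDITGROUP_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` edits (hypothetical helper).

    Each chunk is meant to become one reviewable editgroup.
    """
    if not 1 <= size <= MAX_EDITGROUP_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_EDITGROUP_SIZE}")
    for start in range(0, len(edits), size):
        yield list(edits[start:start + size])


if __name__ == "__main__":
    print(normalize_doi("10.1234/ABCDE.789"))  # 10.1234/abcde.789
    print([len(group) for group in batch_edits(range(120))])  # [50, 50, 20]
```

Because `batch_edits` is a generator, an out-of-range `size` raises `ValueError` only when the first chunk is requested, not at call time.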
