Diffstat (limited to 'guide')
-rw-r--r-- | guide/src/alignments.md | 2
-rw-r--r-- | guide/src/bulk_exports.md | 2
-rw-r--r-- | guide/src/data_model.md | 26
-rw-r--r-- | guide/src/entity_fields.md | 16
-rw-r--r-- | guide/src/entity_types.md | 2
-rw-r--r-- | guide/src/goals.md | 2
-rw-r--r-- | guide/src/implementation.md | 4
-rw-r--r-- | guide/src/overview.md | 2
-rw-r--r-- | guide/src/policies.md | 12
-rw-r--r-- | guide/src/roadmap.md | 2
-rw-r--r-- | guide/src/sources.md | 2
-rw-r--r-- | guide/src/style_guide.md | 20
-rw-r--r-- | guide/src/workflow.md | 6
13 files changed, 49 insertions, 49 deletions
diff --git a/guide/src/alignments.md b/guide/src/alignments.md index 291dd6e5..783122ea 100644 --- a/guide/src/alignments.md +++ b/guide/src/alignments.md @@ -2,7 +2,7 @@ A table (CSV) of "alignments" between fatcat entity types and fields with other file formats and standards is available under the `./notes/` directory of the -source repo. +source git repository.. TODO: in particular, highlight alignments with: diff --git a/guide/src/bulk_exports.md b/guide/src/bulk_exports.md index 21cb8226..3a9badcb 100644 --- a/guide/src/bulk_exports.md +++ b/guide/src/bulk_exports.md @@ -11,7 +11,7 @@ interested in: in small tables ("partial transform") and export JSON for each table; would be extra work to maintain, so not pursuing for now. - full history, full public schema exports, in a form that might be used to - mirror or enitrely fork the project. Propose supplying the full "changelog" + mirror or entirely fork the project. Propose supplying the full "changelog" in API schema format, in a single file to capture all entity history, without "hydrating" any inter-entity references. Rely on separate dumps of non-entity, non-versioned tables (editors, abstracts, etc). Note that a diff --git a/guide/src/data_model.md b/guide/src/data_model.md index f3b9b35a..2d6f7287 100644 --- a/guide/src/data_model.md +++ b/guide/src/data_model.md @@ -14,13 +14,13 @@ artifacts) over physical items, the primary bibliographic entity types are: `work`. - `release`: a specific "release" or "publicly published" (in a formal or informal sense) version of a work. Contains traditional bibliographic - metadata (title, date of publiction, media type, language, etc). Has + metadata (title, date of publication, media type, language, etc). 
Has relationships to other entities: - "variant of" a single `work` - "contributed to by" multiple `creators` - "references to" (cites) multiple `releases` - "published as part of" a single `container` -- `file`: a single concrete, fixed ditigal artifact; a manifestation of one or +- `file`: a single concrete, fixed digital artifact; a manifestation of one or more `releases`. Machine-verifiable metadata includes file hashes, size, and detected file format. Verified URLs link to locations on the open web where this file can be found or has been archived. Has relationships: @@ -49,9 +49,9 @@ the same regardless of type. A specific version of any entity in the catalog is called a "revision". Revisions are generally immutable (do not change and are not editable), and are -not usually refered to directly by users. Instead, persistent identifiers can -be created, which "point to" a specific revsiion at a time. This distinction -means that entities refered to by an identifier can change over time (as +not usually referred to directly by users. Instead, persistent identifiers can +be created, which "point to" a specific revision at a time. This distinction +means that entities referred to by an identifier can change over time (as metadata is corrected and expanded). Revision objects do not "point" back to specific identifiers, so they are not the same as a simple "version number" for an identifier. @@ -63,7 +63,7 @@ be fetched and inspected on a per-identifier basis, and any changes can easily be reverted (even merges/redirects and "deletion"). "Staged" or "proposed" changes are captured as edit objects without updating -the identifers themselves. +the identifiers themselves. 
### Fatcat Identifiers @@ -83,7 +83,7 @@ In comparison, 96-bit identifiers would have 20 characters and look like: work_rzga5b9cd7efgh04iljk https://fatcat.wiki/work/rzga5b9cd7efgh04iljk -A 64-bit namespace would probably be large enought, and would work with +A 64-bit namespace would probably be large enough, and would work with database Integer columns: work_rzga5b9cd7efg @@ -102,7 +102,7 @@ Revisions are stored in their complete form, not as a patch or difference; if comparing to distributed version control systems (for managing changes to source code), this follows the git model, not the mercurial model. -The entity revisions are immutable once accepted; the editting process involves +The entity revisions are immutable once accepted; the editing process involves the creation of new entity revisions and, if the edit is approved, pointing the identifier to the new revision. Entities cross-reference between themselves by *identifier* not *revision number*. Identifier pointers also support @@ -122,7 +122,7 @@ SQL tables look something like this (with separate tables for entity type a la entity_revision revision_id - <all entity-tyle-specific fields> + <all entity-style-specific fields> extra: json blob for schema evolution entity_edit @@ -140,7 +140,7 @@ SQL tables look something like this (with separate tables for entity type a la extra: json blob for progeny metadata An individual entity can be in the following "states", from which the given -actions (transistion) can be made: +actions (transition) can be made: - `wip` (not live; not redirect; has rev) - activate (to `active`) @@ -166,7 +166,7 @@ history of an object. ## Controlled Vocabularies -Some individual fields have additional contraints, either in the form of +Some individual fields have additional constraints, either in the form of pattern validation ("values must be upper case, contain only certain characters"), or membership in a fixed set of values. 
These may include: @@ -175,7 +175,7 @@ characters"), or membership in a fixed set of values. These may include: - work "types" (article vs. book chapter vs. proceeding, etc) - contributor types (author, translator, illustrator, etc) - human languages -- identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifers +- identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifiers themselves) Other fixed-set "vocabularies" become too large to easily maintain or express @@ -183,7 +183,7 @@ in code. These could be added to the backend databases, or be enforced by bots (instead of the core system itself). These mostly include externally-registered identifiers or types, such as: - file mimetypes -- identifiers themselves (DOI, ORCID, etc), by checking for registeration +- identifiers themselves (DOI, ORCID, etc), by checking for registration against canonical APIs and databases ## Global Edit Changelog diff --git a/guide/src/entity_fields.md b/guide/src/entity_fields.md index 4f14577f..f8fcf082 100644 --- a/guide/src/entity_fields.md +++ b/guide/src/entity_fields.md @@ -5,8 +5,8 @@ All entities have: - `extra`: free-form JSON metadata The "extra" field is an "escape hatch" to include extra fields not in the -regular schema. It is intented to enable gradual evolution of the schema, as -well as accomodating niche or field-specific content. That being said, +regular schema. It is intended to enable gradual evolution of the schema, as +well as accommodating niche or field-specific content. That being said, reasonable limits should be adhered to. ## Containers @@ -88,7 +88,7 @@ guide. publicly available - `doi` (string): full DOI number, lower-case. Example: "10.1234/abcde.789". See the "External Identifiers" section of style guide. -- `isbn13` (string): external identifer for books. ISBN-9 and other formats +- `isbn13` (string): external identifier for books. ISBN-9 and other formats should be converted to canonical ISBN-13. 
See the "External Identifiers" section of style guide. - `core_id` (string): external identifier for the [CORE] open access @@ -144,7 +144,7 @@ guide. affiliation, "this is the corresponding author", etc. - `refs`: an array of references (aka, citations) to other releases. References can only be linked to a specific target release (not a work), though it may - be ambugious which release of a work is being referenced if the citation is + be ambiguous which release of a work is being referenced if the citation is not specific enough. Reference fields include: - `index` (integer, optional): reference lists and bibliographies almost always have an implicit order. Zero-indexed. Note that this is distinct @@ -154,10 +154,10 @@ guide. - `extra` (JSON, optional): additional citation format metadata can be stored here, particularly if the citation schema does not align. Common fields might be "volume", "authors", "issue", "publisher", "url", and - external identifers ("doi", "isbn13"). + external identifiers ("doi", "isbn13"). - `key` (string): works often reference works with a short slug or index number, which can be captured here. For example, "[BROWN2017]". Keys - generally supercede the `index` field, though both can/should be + generally supersede the `index` field, though both can/should be supplied. - `year` (integer): year of publication of the cited release. - `container_title` (string): if applicable, the name of the container of @@ -215,7 +215,7 @@ vocabulary (TODO: should it follow [CSL types](http://docs.citationstyles.org/en can be a `stub` release under the same work. `stub` releases shouldn't be considered full releases when counting or aggregating (though if technically difficult this may not always be implemented). 
Other things that can be - categorized as stubs (which seem to often end up miscategorized as full + categorized as stubs (which seem to often end up mis-categorized as full articles in bibliographic databases): - an abstract, which is only an abstract of a larger work - commercial advertisements @@ -223,7 +223,7 @@ vocabulary (TODO: should it follow [CSL types](http://docs.citationstyles.org/en detect re-publishing without attribution - "This page is intentionally blank" - "About the author", "About the editors", "About the cover" - - "Acknowledgements" + - "Acknowledgments" - "Notices" Other types from Crossref (such as `component`, `reference-entry`) are valid, diff --git a/guide/src/entity_types.md b/guide/src/entity_types.md index 1a74f79e..489a62e8 100644 --- a/guide/src/entity_types.md +++ b/guide/src/entity_types.md @@ -4,4 +4,4 @@ TODO: entity-type-specific scope and quality guidance ## Work/Release/File Distinctions -TODO: clarify distinctions and relationship between theese three entity types +TODO: clarify distinctions and relationship between these three entity types diff --git a/guide/src/goals.md b/guide/src/goals.md index 048d9cb1..e7ef1512 100644 --- a/guide/src/goals.md +++ b/guide/src/goals.md @@ -12,7 +12,7 @@ In the larger ecosystem, fatcat could also provide: - A work-level (as opposed to title-level) archival dashboard: what fraction of all published works are preserved in archives? 
[KBART](), [CLOCKSS](), - [Portico](), and other preservations don't provide granular metadata + [Portico](), and other preservation networks don't provide granular metadata - A collaborative, independent, non-commercial, fully-open, field-agnostic, "completeness"-oriented catalog of scholarly metadata - Unified (centralized) foundation for discovery and access across repositories diff --git a/guide/src/implementation.md b/guide/src/implementation.md index df8d66b9..66ae7f6b 100644 --- a/guide/src/implementation.md +++ b/guide/src/implementation.md @@ -11,7 +11,7 @@ operated by anybody. A separate web interface project talks to the API backend and can be developed more rapidly with less concern about data loss or corruption. -A cronjob will creae periodic database dumps, both in "full" form (all tables +A cronjob will create periodic database dumps, both in "full" form (all tables and all edit history, removing only authentication credentials) and "flattened" form (with only the most recent version of each entity). @@ -23,4 +23,4 @@ to a rigid third-party ontology or schema. Microservice daemons should be able to proxy between the primary API and standard protocols like ResourceSync and OAI-PMH, and third party bots could -ingest or synchronize the databse in those formats. +ingest or synchronize the database in those formats. 
diff --git a/guide/src/overview.md b/guide/src/overview.md index 58107429..68171905 100644 --- a/guide/src/overview.md +++ b/guide/src/overview.md @@ -6,7 +6,7 @@ This section gives an introduction to: - the goals of the project, and now it relates to the rest of the Open Access and archival ecosystem - how catalog data is represented as entities and revisions with full edit - history, and how entities are refered to and cross-referenced with + history, and how entities are referred to and cross-referenced with identifiers - how humans and bots propose changes to the catalog, and how these changes are reviewed diff --git a/guide/src/policies.md b/guide/src/policies.md index 18d84a36..03e5e526 100644 --- a/guide/src/policies.md +++ b/guide/src/policies.md @@ -42,14 +42,14 @@ history, including all of their contributions. ## Immutable History All editors agree to the licensing terms, and understand that their full public -history of contributions is made irrevokably public. Edits and contributions +history of contributions is made irrevocably public. Edits and contributions may be *reverted*, but the history (and content) of their edits are retained. Edit history is not removed from the corpus on the request of an editor or when an editor closes their account. In an emergency situation, such as non-bibliographic content getting encoded in the corpus by bypassing normal filters (eg, base64 encoding hate crime content -or exploitive photos, as has happened to some blockchain projects), the +or exploitative photos, as has happened to some blockchain projects), the ecosystem may decide to collectively, in a coordinated manner, expunge specific records from their history. @@ -73,8 +73,8 @@ servers hosting early deployments of fatcat are largely in a default configuration and have not been audited to ensure that these guidelines are being followed.* -It is a goal for fatcat to conduct as little surveillence of reader and editor -bahavior and activities as possible. 
In pratical terms, this means minimizing +It is a goal for fatcat to conduct as little surveillance of reader and editor +behavior and activities as possible. In practical terms, this means minimizing the overall amount of logging and collection of identifying information. This is in contrast to *submitted edit content*, which is captured, preserved, and republished as widely as possible. @@ -94,8 +94,8 @@ Exceptions will likely be made: Some uncertain areas of privacy include: -- should third-party authenticion identities be linked to editor ids? what - about the specific case of ORCiDs if used for login? +- should third-party authentication identities be linked to editor ids? what + about the specific case of ORCID if used for login? - what about discussion and comments on edits? should conversations be included in full history dumps? should editors be allowed to update or remove comments? diff --git a/guide/src/roadmap.md b/guide/src/roadmap.md index 1a2def31..745380f9 100644 --- a/guide/src/roadmap.md +++ b/guide/src/roadmap.md @@ -32,7 +32,7 @@ Longer term projects could include: - bi-directional synchronization with other user-editable catalogs, such as Wikidata - better representation of multi-file objects such as websites and datasets -- altenate/enhanced backend to store full edit history without overloading +- alternate/enhanced backend to store full edit history without overloading traditional relational database ## Known Issues diff --git a/guide/src/sources.md b/guide/src/sources.md index b8853d8a..5b3d9d3e 100644 --- a/guide/src/sources.md +++ b/guide/src/sources.md @@ -24,6 +24,6 @@ institution-specific catalogs. Progeny information (where the metadata comes from, or who "makes specific claims") is stored in edit metadata in the data model. Value-level attribution -cna be achived by looking at the full edit history for an entity as a series of +can be achieved by looking at the full edit history for an entity as a series of patches. 
diff --git a/guide/src/style_guide.md b/guide/src/style_guide.md index 35d13e97..7f819c8d 100644 --- a/guide/src/style_guide.md +++ b/guide/src/style_guide.md @@ -7,7 +7,7 @@ entity, or even a "native"/"international" representation as seems common in other bibliographic systems. This most notably applies to release titles, but also to container and publisher names, and likely other fields. -For now, editors must use their own judgement over whether to use the title of +For now, editors must use their own judgment over whether to use the title of the release listed in the work itself This is not to be confused with *translations* of entire works, which should be @@ -30,9 +30,9 @@ All DOIs stored in an entity column should be registered (aka, should be resolvable from `doi.org`). Invalid identifiers may be cleaned up or removed by bots. -DOIs should *always* be stored and transfered in lower-case form. Note that +DOIs should *always* be stored and transferred in lower-case form. Note that there are almost no other constraints on DOIs (and handles in general): they -may have muliple forward slashes, whitespace, of arbitrary length, etc. +may have multiple forward slashes, whitespace, of arbitrary length, etc. Crossref has a [number of examples]() of such "valid" but frustratingly formatted strings. @@ -60,28 +60,28 @@ background reading, see: Particular difficult issues in the context of a bibliographic database include the non-universal concept of "family" vs. 
"given" names and their relationship -to first and last names; the inclusion of honarary titles and other suffixes -and prefixes to a name; the distinction between "prefered", "legal", and +to first and last names; the inclusion of honorary titles and other suffixes +and prefixes to a name; the distinction between "preferred", "legal", and "bibliographic" names, or other situations where a person may not wish to be -known under the name they are commonly refered to under; language and character +known under the name they are commonly referred to under; language and character set issues; and pseudonyms, anonymous publications, and fake personas (perhaps representing a group, like Bourbaki). The general guidance for Fatcat is to: -- not be a "source of truth" for representing a persona or human being; ORCiD +- not be a "source of truth" for representing a persona or human being; ORCID and Wikidata are better suited to this task - represent author personas, not necessarily 1-to-1 with human beings - prioritize the concerns of a reader or researcher over that of the author - enable basic interoperability with external databases, file formats, schemas, - and style gudies + and style guides - when possible, respect the wishes of individuals The data model for the `creator` entity has three name fields: - `surname` and `given_name`: needed for "aligning" with external databases, and to export metadata to many standard formats -- `display_name`: the "prefered" representation for display of the entire name, +- `display_name`: the "preferred" representation for display of the entire name, in the context of international attribution of authorship of a written work Names to not necessarily need to expressed in a Latin character set, but also @@ -101,7 +101,7 @@ of a reasonable size for review and acceptance. For example, merging two `creators` and updating related `releases` could all go in a single editgroup. 
Large refactors, conversions, and imports, which may touch thousands of entities, should be grouped into reasonable size editgroups; extremely large -editgroups may cause technical issues, and make review unmanagable. 50 edits is +editgroups may cause technical issues, and make review unmanageable. 50 edits is a decent batch size, and 100 is a good upper limit (and may be enforced by the server). diff --git a/guide/src/workflow.md b/guide/src/workflow.md index fd53f6a9..996fb24c 100644 --- a/guide/src/workflow.md +++ b/guide/src/workflow.md @@ -3,7 +3,7 @@ ## Basic Editing Workflow and Bots Both human editors and bots should have edits go through the same API, with -humans using either the default web interface, integrations, or client +humans using either the default web interface, integration, or client software. The normal workflow is to create edits (or updates, merges, deletions) on @@ -22,7 +22,7 @@ push through edits more rapidly (eg, importing new works from a publisher API). Bots need to be tuned to have appropriate edit group sizes (eg, daily batches, instead of millions of works in a single edit) to make human QA review and -reverts managable. +reverts manageable. Data progeny and source references are captured in the edit metadata, instead of being encoded in the entity data model itself. In the case of importing @@ -33,5 +33,5 @@ Human editors can leave edit messages to clarify their sources. A [style guide](./style_guide.md) and discussion forum are intended to be be hosted as separate stand-alone services for editors to propose projects and debate process or scope changes. These services should have unified accounts -and logins (oauth?) for consistent account IDs across all services. +and logins (OAuth?) for consistent account IDs across all services.
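
Note on the editgroup sizing guidance in the `style_guide.md` hunk: the text suggests about 50 edits per editgroup, with 100 as an upper limit that the server may enforce. A minimal sketch of how a bot might split a large import into batches of that size follows. The function name `batch_edits` and both constants are hypothetical and are not part of the fatcat API; only the numbers 50 and 100 come from the guide text.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constants mirroring the numbers quoted in the style guide.
DEFAULT_BATCH_SIZE = 50   # "a decent batch size"
MAX_BATCH_SIZE = 100      # "a good upper limit"


def batch_edits(
    edits: Sequence[T], batch_size: int = DEFAULT_BATCH_SIZE
) -> Iterator[List[T]]:
    """Yield consecutive slices of `edits`, each at most `batch_size` long.

    Each yielded list would correspond to one editgroup in this sketch.
    """
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"batch_size must be between 1 and {MAX_BATCH_SIZE}, got {batch_size}"
        )
    for start in range(0, len(edits), batch_size):
        yield list(edits[start:start + batch_size])


if __name__ == "__main__":
    # 120 placeholder edits are split into three groups.
    pending = [f"edit-{i}" for i in range(120)]
    for group in batch_edits(pending):
        print(len(group))
```

Rejecting a `batch_size` above the limit on the client side is a design choice in this sketch; the guide only says the server "may" enforce the limit.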