Diffstat (limited to 'python')
-rw-r--r--  python/README_import.md  2
-rw-r--r--  python/fatcat_tools/harvest/pubmed.py  2
-rw-r--r--  python/fatcat_tools/importers/common.py  4
-rw-r--r--  python/fatcat_web/entity_helpers.py  2
-rw-r--r--  python/fatcat_web/search.py  4
-rw-r--r--  python/fatcat_web/templates/container_create.html  2
-rw-r--r--  python/fatcat_web/templates/container_edit.html  2
-rw-r--r--  python/fatcat_web/templates/entity_create_toml.html  2
-rw-r--r--  python/fatcat_web/templates/entity_delete.html  2
-rw-r--r--  python/fatcat_web/templates/entity_edit_toml.html  2
-rw-r--r--  python/fatcat_web/templates/file_create.html  2
-rw-r--r--  python/fatcat_web/templates/file_edit.html  2
-rw-r--r--  python/fatcat_web/templates/home.html  2
-rw-r--r--  python/fatcat_web/templates/openlibrary_view_fuzzy_refs.html  2
-rw-r--r--  python/fatcat_web/templates/release_create.html  2
-rw-r--r--  python/fatcat_web/templates/release_edit.html  4
-rw-r--r--  python/fatcat_web/templates/release_lookup.html  2
-rw-r--r--  python/fatcat_web/templates/rfc.html  10
-rw-r--r--  python/fatcat_web/templates/wikipedia_view_fuzzy_refs.html  2
19 files changed, 26 insertions, 26 deletions
diff --git a/python/README_import.md b/python/README_import.md
index 74e75e14..1d54f9d7 100644
--- a/python/README_import.md
+++ b/python/README_import.md
@@ -140,7 +140,7 @@ Takes a few hours.
## dblp
See `extra/dblp/README.md` for notes about first importing container metadata
-and getting a TSV mapping flie to help with import. This is needed because
+and getting a TSV mapping file to help with import. This is needed because
there is not (yet) a lookup mechanism for `dblp_prefix` as an identifier of
container entities.
diff --git a/python/fatcat_tools/harvest/pubmed.py b/python/fatcat_tools/harvest/pubmed.py
index 560427fb..78b1755b 100644
--- a/python/fatcat_tools/harvest/pubmed.py
+++ b/python/fatcat_tools/harvest/pubmed.py
@@ -279,7 +279,7 @@ def ftpretr(
"ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/pubmed20n1016.xml.gz") to a
local temporary file. Returns the name of the local, closed temporary file.
- It is the reponsibility of the caller to cleanup the temporary file.
+ It is the responsibility of the caller to cleanup the temporary file.
Implements a basic retry mechanism, e.g. that became an issue in 08/2021,
when we encountered EOFError while talking to the FTP server. Retry delay in seconds.
diff --git a/python/fatcat_tools/importers/common.py b/python/fatcat_tools/importers/common.py
index e2157ee5..cd51a24c 100644
--- a/python/fatcat_tools/importers/common.py
+++ b/python/fatcat_tools/importers/common.py
@@ -432,7 +432,7 @@ class EntityImporter:
- WEAK
- AMBIGUOUS
- Eg, if there is any EXACT match that is always returned; an AMBIGIOUS
+ Eg, if there is any EXACT match that is always returned; an AMBIGUOUS
result is only returned if all the candidate matches were ambiguous.
"""
@@ -725,7 +725,7 @@ class KafkaBs4XmlPusher(RecordPusher):
while True:
# Note: this is batch-oriented, because underlying importer is
# often batch-oriented, but this doesn't confirm that entire batch
- # has been pushed to fatcat before commiting offset. Eg, consider
+ # has been pushed to fatcat before committing offset. Eg, consider
# case where there there is one update and thousands of creates;
# update would be lingering in importer, and if importer crashed
# never created.
diff --git a/python/fatcat_web/entity_helpers.py b/python/fatcat_web/entity_helpers.py
index 2e3b83c5..285513a8 100644
--- a/python/fatcat_web/entity_helpers.py
+++ b/python/fatcat_web/entity_helpers.py
@@ -92,7 +92,7 @@ def enrich_release_entity(entity: ReleaseEntity) -> ReleaseEntity:
# November 1.
if ref.extra and ref.extra.get("unstructured"):
ref.extra["unstructured"] = strip_extlink_xml(ref.extra["unstructured"])
- # for backwards compatability, copy extra['subtitle'] to subtitle
+ # for backwards compatibility, copy extra['subtitle'] to subtitle
if not entity.subtitle and entity.extra and entity.extra.get("subtitle"):
if isinstance(entity.extra["subtitle"], str):
entity.subtitle = entity.extra["subtitle"]
diff --git a/python/fatcat_web/search.py b/python/fatcat_web/search.py
index fdfc4d80..b9994f28 100644
--- a/python/fatcat_web/search.py
+++ b/python/fatcat_web/search.py
@@ -161,8 +161,8 @@ def agg_to_dict(agg: Any) -> Dict[str, Any]:
"""
Takes a simple term aggregation result (with buckets) and returns a simple
dict with keys as terms and counts as values. Includes an extra value
- '_other', and by convention aggregations should be writen to have "missing"
- vaules as '_unknown'.
+ '_other', and by convention aggregations should be written to have "missing"
+ values as '_unknown'.
"""
result = dict()
for bucket in agg.buckets:
diff --git a/python/fatcat_web/templates/container_create.html b/python/fatcat_web/templates/container_create.html
index be8c5671..2a705ffd 100644
--- a/python/fatcat_web/templates/container_create.html
+++ b/python/fatcat_web/templates/container_create.html
@@ -18,7 +18,7 @@ book series, or a blog. Not all publications are in a container.
<input class="ui primary submit button" type="submit" value="Create Container!">
<p>
<i>New container entity will be part of the current editgroup, which needs to be
- submited and approved before the entity will formally be included in the
+ submitted and approved before the entity will formally be included in the
catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/container_edit.html b/python/fatcat_web/templates/container_edit.html
index 1885197c..1c6f32e4 100644
--- a/python/fatcat_web/templates/container_edit.html
+++ b/python/fatcat_web/templates/container_edit.html
@@ -70,7 +70,7 @@
<br><br>
<input class="ui primary submit button" type="submit" value="Update Container!">
<p>
- <i>Edit will be part of the current editgroup, which needs to be submited and
+ <i>Edit will be part of the current editgroup, which needs to be submitted and
approved before the change is included in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/entity_create_toml.html b/python/fatcat_web/templates/entity_create_toml.html
index ec5bc4a2..2fd9a2bb 100644
--- a/python/fatcat_web/templates/entity_create_toml.html
+++ b/python/fatcat_web/templates/entity_create_toml.html
@@ -12,7 +12,7 @@
<input class="ui primary submit button" type="submit" value="Create {{ entity_type }}!">
<p>
<i>New {{ entity_type }} entity will be part of the current editgroup, which
- needs to be submited and approved before the entity will formally be included
+ needs to be submitted and approved before the entity will formally be included
in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/entity_delete.html b/python/fatcat_web/templates/entity_delete.html
index 85742bb3..98b6b8e6 100644
--- a/python/fatcat_web/templates/entity_delete.html
+++ b/python/fatcat_web/templates/entity_delete.html
@@ -31,7 +31,7 @@
<br><br>
<input class="ui primary submit button" type="submit" value="Update Release!">
<p>
- <i>Deletion will be part of the current editgroup, which needs to be submited and
+ <i>Deletion will be part of the current editgroup, which needs to be submitted and
approved before the change is included in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/entity_edit_toml.html b/python/fatcat_web/templates/entity_edit_toml.html
index b0252c82..6e99c402 100644
--- a/python/fatcat_web/templates/entity_edit_toml.html
+++ b/python/fatcat_web/templates/entity_edit_toml.html
@@ -33,7 +33,7 @@
<br><br>
<input class="ui primary submit button" type="submit" value="Update Release!">
<p>
- <i>Edit will be part of the current editgroup, which needs to be submited and
+ <i>Edit will be part of the current editgroup, which needs to be submitted and
approved before the change is included in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/file_create.html b/python/fatcat_web/templates/file_create.html
index affcfb6e..29612d0e 100644
--- a/python/fatcat_web/templates/file_create.html
+++ b/python/fatcat_web/templates/file_create.html
@@ -14,7 +14,7 @@
<input class="ui primary submit button" type="submit" value="Create File!">
<p>
<i>New file entity will be part of the current editgroup, which needs to be
- submited and approved before the entity will formally be included in the
+ submitted and approved before the entity will formally be included in the
catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/file_edit.html b/python/fatcat_web/templates/file_edit.html
index de16e59e..eeb25a9d 100644
--- a/python/fatcat_web/templates/file_edit.html
+++ b/python/fatcat_web/templates/file_edit.html
@@ -100,7 +100,7 @@
<br><br>
<input class="ui primary submit button" type="submit" value="Update File!">
<p>
- <i>Edit will be part of the current editgroup, which needs to be submited and
+ <i>Edit will be part of the current editgroup, which needs to be submitted and
approved before the change is included in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/home.html b/python/fatcat_web/templates/home.html
index acb943d9..5c8c33ba 100644
--- a/python/fatcat_web/templates/home.html
+++ b/python/fatcat_web/templates/home.html
@@ -240,7 +240,7 @@
<br><a href="/file/lookup">Other Hashes</a>
</form>
<tr><td><b>File Set</b>
- <br>datasets, suplementary materials
+ <br>datasets, supplementary materials
<td><a href="/fileset/create">Create</a>
{% if config.FATCAT_DOMAIN == 'fatcat.wiki' %}
<td><a href="/fileset/ho376wmdanckpp66iwfs7g22ne">Dataset</a>
diff --git a/python/fatcat_web/templates/openlibrary_view_fuzzy_refs.html b/python/fatcat_web/templates/openlibrary_view_fuzzy_refs.html
index 21bf76f2..e9444b75 100644
--- a/python/fatcat_web/templates/openlibrary_view_fuzzy_refs.html
+++ b/python/fatcat_web/templates/openlibrary_view_fuzzy_refs.html
@@ -16,7 +16,7 @@
<p>This page lists references to this book from other works (eg, journal articles).
{% elif direction == "out" %}
<h3>References</h3>
- <i>Refernces from this book to other entities.</i>
+ <i>References from this book to other entities.</i>
{% endif %}
{{ refs_macros.refs_table(hits, direction) }}
diff --git a/python/fatcat_web/templates/release_create.html b/python/fatcat_web/templates/release_create.html
index 4f5dabd7..09191111 100644
--- a/python/fatcat_web/templates/release_create.html
+++ b/python/fatcat_web/templates/release_create.html
@@ -14,7 +14,7 @@
<input class="ui primary submit button" type="submit" value="Create Release!">
<p>
<i>New release entity will be part of the current editgroup, which needs to be
- submited and approved before the entity will formally be included in the
+ submitted and approved before the entity will formally be included in the
catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/release_edit.html b/python/fatcat_web/templates/release_edit.html
index 0ac94be9..3f5c10f6 100644
--- a/python/fatcat_web/templates/release_edit.html
+++ b/python/fatcat_web/templates/release_edit.html
@@ -105,7 +105,7 @@
<br>
<br>
- <h3 class="ui dividing header">Identifers</h3>
+ <h3 class="ui dividing header">Identifiers</h3>
<br>
{{ edit_macros.form_field_inline(form.doi) }}
{{ edit_macros.form_field_inline(form.wikidata_qid) }}
@@ -148,7 +148,7 @@
<br><br>
<input class="ui primary submit button" type="submit" value="Update Release!">
<p>
- <i>Edit will be part of the current editgroup, which needs to be submited and
+ <i>Edit will be part of the current editgroup, which needs to be submitted and
approved before the change is included in the catalog.</i>
</form>
</div>
diff --git a/python/fatcat_web/templates/release_lookup.html b/python/fatcat_web/templates/release_lookup.html
index a0ef3bb3..20821a10 100644
--- a/python/fatcat_web/templates/release_lookup.html
+++ b/python/fatcat_web/templates/release_lookup.html
@@ -49,7 +49,7 @@ you don't know the version, you can append "v1" to get the first version.
<h2>DOI</h2>
<p><a href="https://en.wikipedia.org/wiki/Digital_object_identifier">
-Digital object identifer</a>: "it's not an identifier for a digital object,
+Digital object identifier</a>: "it's not an identifier for a digital object,
it's a digital identifier for an object". Except they are pretty much all
digital objects. Fatcat doesn't include all DOIs (eg, for granular components
or TV shows), but it should for all complete research publications. DOIs are
diff --git a/python/fatcat_web/templates/rfc.html b/python/fatcat_web/templates/rfc.html
index c7e7149f..fba6eff3 100644
--- a/python/fatcat_web/templates/rfc.html
+++ b/python/fatcat_web/templates/rfc.html
@@ -25,7 +25,7 @@
<p>As little &quot;application logic&quot; as possible should be embedded in this back-end; as much as possible would be pushed to bots which could be authored and operated by anybody. A separate web interface project talks to the API backend and can be developed more rapidly with less concern about data loss or corruption.</p>
<p>A cronjob will creae periodic database dumps, both in &quot;full&quot; form (all tables and all edit history, removing only authentication credentials) and &quot;flattened&quot; form (with only the most recent version of each entity).</p>
<p>A goal is to be linked-data/RDF/JSON-LD/semantic-web &quot;compatible&quot;, but not necessarily &quot;first&quot;. It should be possible to export the database in a relatively clean RDF form, and to fetch data in a variety of formats, but internally fatcat will not be backed by a triple-store, and will not be bound to a rigid third-party ontology or schema.</p>
-<p>Microservice daemons should be able to proxy between the primary API and standard protocols like ResourceSync and OAI-PMH, and third party bots could ingest or synchronize the databse in those formats.</p>
+<p>Microservice daemons should be able to proxy between the primary API and standard protocols like ResourceSync and OAI-PMH, and third party bots could ingest or synchronize the database in those formats.</p>
<h2 id="licensing">Licensing</h2>
<p>The core fatcat database should only contain verifiable factual statements (which isn't to say that all statements are &quot;true&quot;), not creative or derived content.</p>
<p>The goal is to have a very permissively licensed database: CC-0 (no rights reserved) if possible. Under US law, it should be possible to scrape and pull in factual data from other corpuses without adopting their licenses. The goal here isn't to avoid attribution (provenance information will be included, and a large sources and acknowledgments statement should be maintained and shipped with bulk exports), but trying to manage the intersection of all upstream source licenses seems untenable, and creates burdens for downstream users and developers.</p>
@@ -33,7 +33,7 @@
<h2 id="basic-editing-workflow-and-bots">Basic Editing Workflow and Bots</h2>
<p>Both human editors and bots should have edits go through the same API, with humans using either the default web interface, integrations, or client software.</p>
<p>The normal workflow is to create edits (or updates, merges, deletions) on individual entities. Individual changes are bundled into an &quot;edit group&quot; of related edits (eg, correcting authorship info for multiple works related to a single author). When ready, the editor would &quot;submit&quot; the edit group for review. During the review period, human editors vote and bots can perform automated checks. During this period the editor can make tweaks if necessary. After some fixed time period (72 hours?) with no changes and no blocking issues, the edit group would be auto-accepted if no merge conflicts have be created by other edits to the same entities. This process balances editing labor (reviews are easy, but optional) against quality (cool-down period makes it easier to detect and prevent spam or out-of-control bots). More sophisticated roles and permissions could allow some certain humans and bots to push through edits more rapidly (eg, importing new works from a publisher API).</p>
-<p>Bots need to be tuned to have appropriate edit group sizes (eg, daily batches, instead of millions of works in a single edit) to make human QA review and reverts managable.</p>
+<p>Bots need to be tuned to have appropriate edit group sizes (eg, daily batches, instead of millions of works in a single edit) to make human QA review and reverts manageable.</p>
<p>Data provenance and source references are captured in the edit metadata, instead of being encoded in the entity data model itself. In the case of importing external databases, the expectation is that special-purpose bot accounts are be used, and tag timestamps and external identifiers in the edit metadata. Human editors would leave edit messages to clarify their sources.</p>
<p>A style guide (wiki) and discussion forum would be hosted as separate stand-alone services for editors to propose projects and debate process or scope changes. These services should have unified accounts and logins (oauth?) to have consistent account IDs across all mediums.</p>
<h2 id="global-edit-changelog">Global Edit Changelog</h2>
@@ -47,13 +47,13 @@ https://fatcat.wiki/work/rzga5b9cd7efgh04iljk8f3jvz</code></pre>
<p>In comparison, 96-bit identifiers would have 20 characters and look like:</p>
<pre><code>work_rzga5b9cd7efgh04iljk
https://fatcat.wiki/work/rzga5b9cd7efgh04iljk</code></pre>
-<p>A 64-bit namespace would probably be large enought, and would work with database Integer columns:</p>
+<p>A 64-bit namespace would probably be large enough, and would work with database Integer columns:</p>
<pre><code>work_rzga5b9cd7efg
https://fatcat.wiki/work/rzga5b9cd7efg</code></pre>
<p>The idea would be to only have fatcat identifiers be used to interlink between databases, <em>not</em> to supplant DOIs, ISBNs, handle, ARKs, and other &quot;registered&quot; persistent identifiers.</p>
<h2 id="entities-and-internal-schema">Entities and Internal Schema</h2>
<p>Internally, identifiers would be lightweight pointers to &quot;revisions&quot; of an entity. Revisions are stored in their complete form, not as a patch or difference; if comparing to distributed version control systems, this is the git model, not the mercurial model.</p>
-<p>The entity revisions are immutable once accepted; the editting process involves the creation of new entity revisions and, if the edit is approved, pointing the identifier to the new revision. Entities cross-reference between themselves by <em>identifier</em> not <em>revision number</em>. Identifier pointers also support (versioned) deletion and redirects (for merging entities).</p>
+<p>The entity revisions are immutable once accepted; the editing process involves the creation of new entity revisions and, if the edit is approved, pointing the identifier to the new revision. Entities cross-reference between themselves by <em>identifier</em> not <em>revision number</em>. Identifier pointers also support (versioned) deletion and redirects (for merging entities).</p>
<p>Edit objects represent a change to a single entity; edits get batched together into edit groups (like &quot;commits&quot; and &quot;pull requests&quot; in git parlance).</p>
<p>SQL tables would probably look something like the (but specific to each entity type, with tables like <code>work_revision</code> not <code>entity_revision</code>):</p>
<pre><code>entity_ident
@@ -158,7 +158,7 @@ container (aka &quot;venue&quot;, &quot;serial&quot;, &quot;title&quot;)
<h2 id="controlled-vocabularies">Controlled Vocabularies</h2>
<p>Some special namespace tables and enums would probably be helpful; these could live in the database (not requiring a database migration to update), but should have more controlled editing workflow... perhaps versioned in the codebase:</p>
<ul>
-<li>identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifers themselves)</li>
+<li>identifier namespaces (DOI, ISBN, ISSN, ORCID, etc; but not the identifiers themselves)</li>
<li>subject categorization</li>
<li>license and open access status</li>
<li>work &quot;types&quot; (article vs. book chapter vs. proceeding, etc)</li>
diff --git a/python/fatcat_web/templates/wikipedia_view_fuzzy_refs.html b/python/fatcat_web/templates/wikipedia_view_fuzzy_refs.html
index 3e1453c1..2d2627b1 100644
--- a/python/fatcat_web/templates/wikipedia_view_fuzzy_refs.html
+++ b/python/fatcat_web/templates/wikipedia_view_fuzzy_refs.html
@@ -14,7 +14,7 @@
<p>This page lists references to a wikipedia article, from other works (eg, journal articles).
{% elif direction == "out" %}
<h3>References</h3>
- <i>Refernces from wikipedia article to other entities.</i>
+ <i>References from wikipedia article to other entities.</i>
{% endif %}
{{ refs_macros.refs_table(hits, direction) }}