author Bryan Newbold <bnewbold@robocracy.org> 2021-11-24 15:48:01 -0800
committer Bryan Newbold <bnewbold@robocracy.org> 2021-11-24 15:48:01 -0800
commit d6b1d3de6224b590a82b175f78b761df1a6df4a2
tree cdc5904d0136432fbfa0500fe136897eea650c34 /proposals
parent cc0393de91301a29bd469e38519125a530b4472d
codespell fixes to proposals
Diffstat (limited to 'proposals')
-rw-r--r-- proposals/20190510_release_ext_ids.md | 2
-rw-r--r-- proposals/202008_bulk_citation_graph.md | 2
-rw-r--r-- proposals/2020_client_cli.md | 4
-rw-r--r-- proposals/2020_fuzzy_matching.md | 6
-rw-r--r-- proposals/2020_metadata_cleanups.md | 2
-rw-r--r-- proposals/2021-01-29_citation_api.md | 2
-rw-r--r-- proposals/README.md | 2
7 files changed, 10 insertions, 10 deletions
diff --git a/proposals/20190510_release_ext_ids.md b/proposals/20190510_release_ext_ids.md
index 8953448c..b0a484ad 100644
--- a/proposals/20190510_release_ext_ids.md
+++ b/proposals/20190510_release_ext_ids.md
@@ -23,7 +23,7 @@ sure this is worth it though.
## New API
-All identifers as text
+All identifiers as text
release_entity
ext_ids (required)
diff --git a/proposals/202008_bulk_citation_graph.md b/proposals/202008_bulk_citation_graph.md
index f8868e45..65db0d94 100644
--- a/proposals/202008_bulk_citation_graph.md
+++ b/proposals/202008_bulk_citation_graph.md
@@ -43,7 +43,7 @@ The high-level prosposal is:
types
- sort the "source" references into an index and run a merge-sort on bucket
keys against the "target" index to generate candidate match buckets
-- run python fuzzy match code against the candidate buckets, outputing a status
+- run python fuzzy match code against the candidate buckets, outputting a status
for each reference input and a list of all strong matches
- resort successful matches and index by both source and target identifiers as
output citation graph
diff --git a/proposals/2020_client_cli.md b/proposals/2020_client_cli.md
index 2a0c8fa1..01d190a8 100644
--- a/proposals/2020_client_cli.md
+++ b/proposals/2020_client_cli.md
@@ -69,7 +69,7 @@ Argument conventions:
':' Lookup specifier for entity (eg, external identifier like `doi:10.123/abc`)
'=' Assign field to value in create or update contexts. Non-string
- values often can be infered by field type
+ values often can be inferred by field type
':=' Assign field to non-string value in create or update contexts
@@ -92,7 +92,7 @@ Small details (mostly TODO):
'@' Form field
Output goes to stdout (pretty-printed), unless specified to `--download / -d`),
-in which case output file is infered, or `--output` sets it explicitly.
+in which case output file is inferred, or `--output` sets it explicitly.
### Internet Archive `ia` Tool
diff --git a/proposals/2020_fuzzy_matching.md b/proposals/2020_fuzzy_matching.md
index 30c321e3..e84c2bd2 100644
--- a/proposals/2020_fuzzy_matching.md
+++ b/proposals/2020_fuzzy_matching.md
@@ -244,7 +244,7 @@ use-cases:
Optionally, we could also architect/design this tool to replace biblio-glutton
for ingest-time "reference consolidation", by exposing a biblio-glutton
compatible API. If this isn't possible or hard it could become a later tool
-instead. Eg, shouldn't sacrafice batch performance for this. In particular, for
+instead. Eg, shouldn't sacrifice batch performance for this. In particular, for
ingest-time reference matching we'd want the backing corpus to be updated
continuously, which might be tricky or in conflict with batch-mode design.
@@ -289,7 +289,7 @@ reading the Scala and Python source
## Longtail OA Import Filtering
-Not direcly related to matching, but filtering mixed-quality metadata.
+Not directly related to matching, but filtering mixed-quality metadata.
As part of Longtail OA preservation work, we ran a crawl of small OA journal
websites, and then ran GROBID over the resulting PDFs to extract metadata. We
@@ -383,7 +383,7 @@ indices. It is also possible to iterate over both indices by bucket and doing
further processing between all the papers, then combined the matches/groups
from both iterations. The reason for using two indices is to be robust against
mangled metadata where there is added junk or missing words at either the
-begining or end of the title.
+beginning or end of the title.
To verify candidate pairs, the Jaccard similarity is calculated between the
full original title strings. This flexibly allows for character typos (human or
diff --git a/proposals/2020_metadata_cleanups.md b/proposals/2020_metadata_cleanups.md
index cf6b08e5..b95f6579 100644
--- a/proposals/2020_metadata_cleanups.md
+++ b/proposals/2020_metadata_cleanups.md
@@ -88,7 +88,7 @@ At some point, had many "NULL" publishers.
"Type" coverage should be improved.
-"Publisher type" (infered in various ways in chocula tool) could be included in
+"Publisher type" (inferred in various ways in chocula tool) could be included in
`extra` and end up in search faceting.
Overall OA status should probably be more sophisticated: gold, green, etc.
diff --git a/proposals/2021-01-29_citation_api.md b/proposals/2021-01-29_citation_api.md
index 3805dcac..6379da09 100644
--- a/proposals/2021-01-29_citation_api.md
+++ b/proposals/2021-01-29_citation_api.md
@@ -212,7 +212,7 @@ would make "outbound" queries a trivial key lookup, instead of a query by
rows would be returned, with unwanted metadata.
Another alternative design would be storing more metadata about source and
-target in each row. This would remove the ned to do separate
+target in each row. This would remove the need to do separate
"hydration"/"enrich" fetches. This would probably blow up in the index size
though, and would require more aggressive re-indexing (in a live-updated
scenario). Eg, when a new fulltext file is updated (access option), would need
diff --git a/proposals/README.md b/proposals/README.md
index 5e6747b1..31184fe3 100644
--- a/proposals/README.md
+++ b/proposals/README.md
@@ -6,6 +6,6 @@ is large enough to require planning and documentation.
Each should be tagged with a date first drafted, and labeled with a status:
- brainstorm: just putting ideas down; might not even happen
-- planned: commited to happening, but not started yet
+- planned: committed to happening, but not started yet
- work-in-progress: currently being worked on
- implemented: completed, merged to master/production/live