Diffstat (limited to 'notes')
-rw-r--r-- notes/UNSORTED.txt | 4
-rw-r--r-- notes/bulk_edits/2019-10-08_file_cleanups.md | 2
-rw-r--r-- notes/bulk_edits/2020-03-19_arxiv_pubmed.md | 2
-rw-r--r-- notes/bulk_edits/2020-09-02_file_meta.md | 2
-rw-r--r-- notes/bulk_edits/2020-12-23_dblp.md | 2
-rw-r--r-- notes/bulk_edits/2020_datacite.md | 2
-rw-r--r-- notes/cleanups/wayback_timestamps.md | 4
-rw-r--r-- notes/data_model.md | 4
-rw-r--r-- notes/performance/postgres_performance.txt | 2
9 files changed, 12 insertions, 12 deletions
diff --git a/notes/UNSORTED.txt b/notes/UNSORTED.txt
index 3960f5eb..850b54d0 100644
--- a/notes/UNSORTED.txt
+++ b/notes/UNSORTED.txt
@@ -3,7 +3,7 @@ Not allowed to PUT edits to the same entity in the same editgroup. If you want
to update an edit, need to delete the old one first.
The state depends only on the current entity state, not any redirect. This
-means that if the target of a redirect is delted, the redirecting entity is
+means that if the target of a redirect is deleted, the redirecting entity is
still "redirect", not "deleted".
Redirects-to-redirects are not allowed; this is enforced when the editgroup is
@@ -31,7 +31,7 @@ redirects after some delay period.
=> it would not be too hard to update get_release_files to check for such
redirects; could be handled by request flag?
-`prev_rev` is naively set to the most-recent previous state. If the curent
+`prev_rev` is naively set to the most-recent previous state. If the current
state was deleted or a redirect, it is set to null.
This parameter is not checked/enforced at edit accept time (but could be, and
diff --git a/notes/bulk_edits/2019-10-08_file_cleanups.md b/notes/bulk_edits/2019-10-08_file_cleanups.md
index b61b37f0..2eebb363 100644
--- a/notes/bulk_edits/2019-10-08_file_cleanups.md
+++ b/notes/bulk_edits/2019-10-08_file_cleanups.md
@@ -5,7 +5,7 @@ web.archive.org). These URLs were created accidentally during fatcat
boostrapping; there are about 300k such file enties to fix.
Will also update archive.org link reltype to 'archive' (instead of
-'repository'), which is the new prefered style.
+'repository'), which is the new preferred style.
Generated the set of files to update like:
diff --git a/notes/bulk_edits/2020-03-19_arxiv_pubmed.md b/notes/bulk_edits/2020-03-19_arxiv_pubmed.md
index b2fd29d5..56e88880 100644
--- a/notes/bulk_edits/2020-03-19_arxiv_pubmed.md
+++ b/notes/bulk_edits/2020-03-19_arxiv_pubmed.md
@@ -1,7 +1,7 @@
On 2020-03-20, automated daily harvesting and importing of arxiv and pubmed
metadata started. In the case of pubmed, updates are enabled, so that recently
-created DOI releases get updated with PMID and extra metdata.
+created DOI releases get updated with PMID and extra metadata.
We also want to do last backfills of metadata since the last import up through
the first day updated by the continuous harvester.
diff --git a/notes/bulk_edits/2020-09-02_file_meta.md b/notes/bulk_edits/2020-09-02_file_meta.md
index 35c4d87f..b0606f2d 100644
--- a/notes/bulk_edits/2020-09-02_file_meta.md
+++ b/notes/bulk_edits/2020-09-02_file_meta.md
@@ -25,7 +25,7 @@ Partial wayback URL timestamps, for cases where we have the full timestamped URL
https://qa.fatcat.wiki/file/k73il3k5hzemtnkqa5qyorg6ci
https://qa.fatcat.wiki/file/7hstlrabfjb6vgyph7ntqtpkne
-Live-web URLs identical except for http/https flip or other trival things (much less frequent case):
+Live-web URLs identical except for http/https flip or other trivial things (much less frequent case):
http://eo1.gsfc.nasa.gov/new/validationReport/Technology/JoeCD/asner_etal_PNAS_20041.pdf
https://eo1.gsfc.nasa.gov/new/validationReport/Technology/JoeCD/asner_etal_PNAS_20041.pdf
diff --git a/notes/bulk_edits/2020-12-23_dblp.md b/notes/bulk_edits/2020-12-23_dblp.md
index c3ad0587..a33411cb 100644
--- a/notes/bulk_edits/2020-12-23_dblp.md
+++ b/notes/bulk_edits/2020-12-23_dblp.md
@@ -52,4 +52,4 @@ Run import:
=> Counter({'total': 7953365, 'has-doi': 4277307, 'skip': 3097418, 'skip-key-type': 2640968, 'skip-update': 2480449, 'exists': 943800, 'update': 889700, 'insert': 338842, 'skip-arxiv-corr': 312872, 'exists-fuzzy': 203103, 'skip-dblp-container-missing': 143578, 'skip-arxiv': 53, 'skip-title': 1})
Starting database size (roughly): Size: 684.08G
-Ending databse size: Size: 690.22G
+Ending database size: Size: 690.22G
diff --git a/notes/bulk_edits/2020_datacite.md b/notes/bulk_edits/2020_datacite.md
index 005841ae..05d09517 100644
--- a/notes/bulk_edits/2020_datacite.md
+++ b/notes/bulk_edits/2020_datacite.md
@@ -54,7 +54,7 @@ Compare with `--lang-detect`:
user 3m5.620s
sys 0m13.344s
-Not noticable?
+Not noticeable?
Whole run:
diff --git a/notes/cleanups/wayback_timestamps.md b/notes/cleanups/wayback_timestamps.md
index e3ea942d..9db77058 100644
--- a/notes/cleanups/wayback_timestamps.md
+++ b/notes/cleanups/wayback_timestamps.md
@@ -1,6 +1,6 @@
-At some point, using the arabesque importer (from targetted crawling), we
-accidentially imported a bunch of files with wayback URLs that have 12-digit
+At some point, using the arabesque importer (from targeted crawling), we
+accidentally imported a bunch of files with wayback URLs that have 12-digit
timestamps, instead of the full canonical 14-digit timestamps.
diff --git a/notes/data_model.md b/notes/data_model.md
index 2d2825ae..f13e33cc 100644
--- a/notes/data_model.md
+++ b/notes/data_model.md
@@ -87,12 +87,12 @@ Each entity type has tables:
core representation of a version of the entity
_ident
- persistant, external identifier
+ persistent, external identifier
allows merging, unmerging, stable cross-entity references
_edit
represents change metadata for a single change to one ident
- needed because an edit alwasy changes ident, but might not change rev
+ needed because an edit always changes ident, but might not change rev
Could someday also have:
diff --git a/notes/performance/postgres_performance.txt b/notes/performance/postgres_performance.txt
index cd2a5162..ff8fcb3b 100644
--- a/notes/performance/postgres_performance.txt
+++ b/notes/performance/postgres_performance.txt
@@ -189,7 +189,7 @@ max_wal_size wasn't getting set correctly.
The statements taking the most time are the complex inserts (multi-table
inserts); they take a fraction of a second though (mean less than a
-milisecond).
+millisecond).
Manifest import runs really slow if release import is concurrent; much faster
to wait until release import is done first (like a factor of 10x or more).