author: Bryan Newbold <bnewbold@archive.org> 2021-11-24 16:05:45 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2021-11-24 16:05:45 -0800
commit: 20cec591d641cf5c6bea7ec7dbf734bc4d8efc1b
tree: f00a373d76b7cbed0fe6df695f8332dc3ac941d3
parent: dfd13be5a7ac87b8b6c186986624f97da02b8923
codespell typos in README and original RFC
-rw-r--r-- README.md 2
-rw-r--r-- sandcrawler-rfc.md 2
2 files changed, 2 insertions, 2 deletions
diff --git a/README.md b/README.md
index afe1ff6..a0eaa98 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ pipeline. This code is of mixed quality and is mostly experimental. The goal
for most of this is to submit metadata to [fatcat](https://fatcat.wiki), which
is the more stable, maintained, and public-facing service.
-Code in this repository is potentially public! Not intented to accept public
+Code in this repository is potentially public! Not intended to accept public
contributions for the most part. Much of this will not work outside the IA
cluster environment.
diff --git a/sandcrawler-rfc.md b/sandcrawler-rfc.md
index fea6a7c..ecf7ab8 100644
--- a/sandcrawler-rfc.md
+++ b/sandcrawler-rfc.md
@@ -73,7 +73,7 @@ process HTML and look for PDF outlinks, but wouldn't crawl recursively.
HBase is used for de-dupe, with records (pointers) stored in WARCs.
A second config would take seeds as entire journal websites, and would crawl
-continously.
+continuously.
Other components of the system "push" tasks to the crawlers by copying schedule
files into the crawl action directories.