author | Bryan Newbold <bnewbold@archive.org> | 2021-11-24 16:05:45 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-11-24 16:05:45 -0800 |
commit | 20cec591d641cf5c6bea7ec7dbf734bc4d8efc1b (patch) | |
tree | f00a373d76b7cbed0fe6df695f8332dc3ac941d3 /sandcrawler-rfc.md | |
parent | dfd13be5a7ac87b8b6c186986624f97da02b8923 (diff) | |
codespell typos in README and original RFC
Diffstat (limited to 'sandcrawler-rfc.md')
-rw-r--r-- | sandcrawler-rfc.md | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/sandcrawler-rfc.md b/sandcrawler-rfc.md
index fea6a7c..ecf7ab8 100644
--- a/sandcrawler-rfc.md
+++ b/sandcrawler-rfc.md
@@ -73,7 +73,7 @@ process HTML and look for PDF outlinks, but wouldn't crawl recursively.
 HBase is used for de-dupe, with records (pointers) stored in WARCs.
 
 A second config would take seeds as entire journal websites, and would crawl
-continously.
+continuously.
 
 Other components of the system "push" tasks to the crawlers by copying schedule
 files into the crawl action directories.