From 20cec591d641cf5c6bea7ec7dbf734bc4d8efc1b Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 24 Nov 2021 16:05:45 -0800
Subject: codespell typos in README and original RFC

---
 README.md          | 2 +-
 sandcrawler-rfc.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index afe1ff6..a0eaa98 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ pipeline.
 This code is of mixed quality and is mostly experimental. The goal for most of
 this is to submit metadata to [fatcat](https://fatcat.wiki), which is the more
 stable, maintained, and public-facing service.
-Code in this repository is potentially public! Not intented to accept public
+Code in this repository is potentially public! Not intended to accept public
 contributions for the most part. Much of this will not work outside the IA
 cluster environment.
 
diff --git a/sandcrawler-rfc.md b/sandcrawler-rfc.md
index fea6a7c..ecf7ab8 100644
--- a/sandcrawler-rfc.md
+++ b/sandcrawler-rfc.md
@@ -73,7 +73,7 @@ process HTML and look for PDF outlinks, but wouldn't crawl recursively.
 HBase is used for de-dupe, with records (pointers) stored in WARCs.
 
 A second config would take seeds as entire journal websites, and would crawl
-continously.
+continuously.
 
 Other components of the system "push" tasks to the crawlers by copying schedule
 files into the crawl action directories.
-- 
cgit v1.2.3