author | Bryan Newbold <bnewbold@archive.org> | 2021-11-24 16:01:47 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-11-24 16:01:51 -0800 |
commit | d93d542adf9d26633b0f3cfa361277ca677c46f3 | |
tree | c133d3030746afe25300a2e12a7645407a89b623 /proposals/2020_pdf_meta_thumbnails.md | |
parent | b4ca684c83d77a9fc6e7844ea8c45dfcb72aacb4 | |
codespell fixes in proposals
Diffstat (limited to 'proposals/2020_pdf_meta_thumbnails.md')
-rw-r--r-- | proposals/2020_pdf_meta_thumbnails.md | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/proposals/2020_pdf_meta_thumbnails.md b/proposals/2020_pdf_meta_thumbnails.md
index 793d6b5..f231a7f 100644
--- a/proposals/2020_pdf_meta_thumbnails.md
+++ b/proposals/2020_pdf_meta_thumbnails.md
@@ -133,7 +133,7 @@ Deployment will involve:
 Plan for processing/catchup is:
 
 - test with COVID-19 PDF corpus
-- run extraction on all current fatcat files avaiable via IA
+- run extraction on all current fatcat files available via IA
 - integrate with ingest pipeline for all new files
 - run a batch catchup job over all GROBID-parsed files with no pdf meta
   extracted, on basis of SQL table query