Diffstat (limited to 'proposals/2020_pdf_meta_thumbnails.md')
-rw-r--r-- | proposals/2020_pdf_meta_thumbnails.md | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/proposals/2020_pdf_meta_thumbnails.md b/proposals/2020_pdf_meta_thumbnails.md
index 793d6b5..141ece8 100644
--- a/proposals/2020_pdf_meta_thumbnails.md
+++ b/proposals/2020_pdf_meta_thumbnails.md
@@ -1,5 +1,5 @@
-status: work-in-progress
+status: deployed
 New PDF derivatives: thumbnails, metadata, raw text
 ===================================================
@@ -133,7 +133,7 @@ Deployment will involve:
 Plan for processing/catchup is:
 - test with COVID-19 PDF corpus
-- run extraction on all current fatcat files avaiable via IA
+- run extraction on all current fatcat files available via IA
 - integrate with ingest pipeline for all new files
 - run a batch catchup job over all GROBID-parsed files with no pdf meta extracted, on basis of SQL table query