path: root/pig/filter-cdx-paper-pdfs.pig
author: Bryan Newbold <bnewbold@archive.org>, 2022-09-14 12:00:59 -0700
committer: Bryan Newbold <bnewbold@archive.org>, 2022-09-14 12:01:01 -0700
commit: 2c19b7180e83c70411516c63b8dced5429b450f4
tree: 8f7e4c9237f9bd9c84af9d226aa0a48eb49b7c3d /pig/filter-cdx-paper-pdfs.pig
parent: a283b054dc98620046dff28cbb16663564b8320b
catch poppler 'ValueError' when parsing PDFs
Seeing a spike in bad PDFs in the past week or so, while processing old failed ingests. Should really switch from poppler to MuPDF.
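For illustration only, a minimal sketch of the pattern the message describes: wrap the PDF load in a `try`/`except ValueError` so one bad file yields an error record instead of crashing the worker. This is not the code from this commit. `ParseResult`, `parse_pdf_safely`, the `"parse-error"` status string, and `fake_loader` are all hypothetical names, and the loader is passed in as a plain callable so the sketch runs without poppler installed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParseResult:
    # Hypothetical result record; field names are assumptions for this sketch.
    status: str
    error_msg: Optional[str] = None
    page_count: Optional[int] = None


def parse_pdf_safely(blob: bytes, load_pdf: Callable[[bytes], int]) -> ParseResult:
    """Run load_pdf on blob; turn a ValueError into an error result."""
    try:
        page_count = load_pdf(blob)
    except ValueError as e:
        # A malformed PDF should not take down the whole batch:
        # record the failure (message truncated) and move on.
        return ParseResult(status="parse-error", error_msg=str(e)[:1000])
    return ParseResult(status="success", page_count=page_count)


def fake_loader(blob: bytes) -> int:
    # Stand-in for a real PDF library call (assumption, not a poppler API).
    if not blob.startswith(b"%PDF-"):
        raise ValueError("missing PDF header")
    return 1


if __name__ == "__main__":
    print(parse_pdf_safely(b"not a pdf", fake_loader))
    print(parse_pdf_safely(b"%PDF-1.4 ...", fake_loader))
```

In a real worker the callable would be whatever function loads the document with the PDF library in use. Catching only `ValueError` keeps other, unexpected exceptions visible.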
Diffstat (limited to 'pig/filter-cdx-paper-pdfs.pig')
0 files changed, 0 insertions, 0 deletions