author | Bryan Newbold <bnewbold@archive.org> | 2022-09-14 12:00:59 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2022-09-14 12:01:01 -0700 |
commit | 2c19b7180e83c70411516c63b8dced5429b450f4 (patch) | |
tree | 8f7e4c9237f9bd9c84af9d226aa0a48eb49b7c3d /python_hadoop/backfill_hbase_from_cdx.py | |
parent | a283b054dc98620046dff28cbb16663564b8320b (diff) | |
catch poppler 'ValueError' when parsing PDFs
Seeing a spike in bad PDFs in the past week or so, while processing old
failed ingests. Should really switch from poppler to MuPDF.
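
The diff for this commit is not shown in this path-filtered view, so the following is only a minimal sketch of the pattern the message describes: wrapping the PDF parse step so that a `ValueError` from the parser is recorded as a failed result instead of crashing the worker. The names `ExtractResult`, `safe_extract`, and `parse_pdf`, and the `parse-error` status string, are hypothetical and are not taken from the sandcrawler code. The parser is passed in as a callable so the sketch does not assume any particular poppler binding API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractResult:
    # Hypothetical result record; field names are assumptions for illustration.
    status: str
    text: Optional[str] = None
    error_msg: Optional[str] = None


def safe_extract(blob: bytes, parse_pdf: Callable[[bytes], str]) -> ExtractResult:
    """Run parse_pdf on blob, converting a parser ValueError into a status."""
    try:
        text = parse_pdf(blob)
    except ValueError as e:
        # Malformed PDF: report the failure rather than letting it propagate.
        # The message is truncated so one bad file cannot bloat the record.
        return ExtractResult(status="parse-error", error_msg=str(e)[:1000])
    return ExtractResult(status="success", text=text)
```

Only `ValueError` is caught here, so unrelated bugs in the parsing path would still surface as exceptions rather than being silently counted as bad PDFs.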
Diffstat (limited to 'python_hadoop/backfill_hbase_from_cdx.py')
0 files changed, 0 insertions, 0 deletions