author | Bryan Newbold <bnewbold@archive.org> | 2020-10-21 12:20:52 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-10-21 12:20:54 -0700 |
commit | 200bf734bd459dd3c7a147b3dfe127dbf0ed7f70 (patch) | |
tree | 4f010e66a059271ac3b9c496d15a3bc90bd763c4 /python/sandcrawler/pdfextract.py | |
parent | 33249f2679851afb64142c428be45d16f35f5539 (diff) | |
differentiate wayback-error from wayback-content-error
The motivation here is to distinguish errors due to current content in
wayback (e.g., in WARCs) from operational errors (e.g., the wayback machine
is down, or network failures/disruption).
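As a rough illustration of the distinction described above, the sketch below maps two exception types to the two status strings named in the commit subject. The class bodies, the `fetch_status` and `should_retry` helpers, and the retry policy are assumptions made for illustration; they are not taken from the sandcrawler code.

```python
class WaybackError(Exception):
    """Operational failure (assumed): wayback is down, network disruption."""


class WaybackContentError(Exception):
    """Problem with the content currently in wayback (assumed), e.g. in a WARC."""


def fetch_status(fetch, url):
    """Call the hypothetical fetch(url) and map the outcome to a status string."""
    try:
        fetch(url)
    except WaybackContentError:
        # Content problem: retrying is unlikely to help until the content changes.
        return "wayback-content-error"
    except WaybackError:
        # Operational problem: the same request may succeed later.
        return "wayback-error"
    return "success"


def should_retry(status):
    """Assumed policy: only operational errors are worth retrying."""
    return status == "wayback-error"
```

With separate statuses, a downstream job could re-queue only the rows marked `wayback-error` and leave `wayback-content-error` rows alone.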
Diffstat (limited to 'python/sandcrawler/pdfextract.py')
-rw-r--r-- | python/sandcrawler/pdfextract.py | 1 |
1 files changed, 0 insertions, 1 deletions
diff --git a/python/sandcrawler/pdfextract.py b/python/sandcrawler/pdfextract.py
index d8a90c1..70d2f93 100644
--- a/python/sandcrawler/pdfextract.py
+++ b/python/sandcrawler/pdfextract.py
@@ -11,7 +11,6 @@ from PIL import Image
 from .workers import SandcrawlerWorker, SandcrawlerFetchWorker
 from .misc import gen_file_metadata
-from .ia import WaybackClient, WaybackError, PetaboxError
 # This is a hack to work around timeouts when processing certain PDFs with