path: root/python/sandcrawler/pdfextract.py
Commit message | Author | Date | Files | Lines
* bad pdf hash | Bryan Newbold | 2022-12-16 | 1 | -0/+1
* catch poppler 'ValueError' when parsing PDFs | Bryan Newbold | 2022-09-14 | 1 | -1/+2
* bad PDF sha1 | Bryan Newbold | 2022-09-12 | 1 | -0/+4
* bad PDF sha1 | Bryan Newbold | 2022-09-11 | 1 | -0/+2
* another bad PDF sha1 | Bryan Newbold | 2022-09-09 | 1 | -0/+1
* yet more bad PDF hashes | Bryan Newbold | 2022-09-08 | 1 | -0/+4
* yet another bad PDF sha1 | Bryan Newbold | 2022-07-27 | 1 | -0/+1
* yet another bad SHA1 PDF hash | Bryan Newbold | 2022-07-24 | 1 | -0/+1
* yet another bad PDF | Bryan Newbold | 2022-07-13 | 1 | -0/+1
* another bad PDF sha1 | Bryan Newbold | 2022-02-23 | 1 | -0/+1
* yet another bad PDF sha1 | Bryan Newbold | 2022-02-08 | 1 | -0/+1
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -69/+78
* fix type annotations for petabox body fetch helper | Bryan Newbold | 2021-10-26 | 1 | -1/+2
* more progress on type annotations | Bryan Newbold | 2021-10-26 | 1 | -12/+18
* flake8 clean (with current settings) | Bryan Newbold | 2021-10-26 | 1 | -2/+0
* start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 1 | -1/+1
* make fmt | Bryan Newbold | 2021-10-26 | 1 | -14/+15
* python: isort all imports | Bryan Newbold | 2021-10-26 | 1 | -6/+5
* yet another bad PDF sha1 | Bryan Newbold | 2021-09-30 | 1 | -0/+1
* yet more PDF sha1 to skip | Bryan Newbold | 2021-09-03 | 1 | -0/+5
* more bad PDF hashes | Bryan Newbold | 2021-07-26 | 1 | -0/+2
* another bad PDF sha1 | Bryan Newbold | 2021-07-13 | 1 | -0/+1
* pdf: yet more bad SHA1 (commiting lines from prod) | Bryan Newbold | 2021-01-05 | 1 | -0/+20
* many bad PDF sha1 from prod | Bryan Newbold | 2020-11-06 | 1 | -0/+36
* differential wayback-error from wayback-content-error | Bryan Newbold | 2020-10-21 | 1 | -1/+0
* and another sha1 | Bryan Newbold | 2020-10-13 | 1 | -0/+1
* another day, another bad PDF sha1 | Bryan Newbold | 2020-10-13 | 1 | -0/+1
* another bad PDF sha1 | Bryan Newbold | 2020-10-11 | 1 | -0/+1
* yet more bad sha1 PDFs to skip | Bryan Newbold | 2020-10-10 | 1 | -0/+20
* more bad PDF sha1 | Bryan Newbold | 2020-09-17 | 1 | -0/+2
* yet another broken PDF (sha1) | Bryan Newbold | 2020-09-16 | 1 | -0/+1
* more bad SHA1 PDF | Bryan Newbold | 2020-09-02 | 1 | -0/+2
* another bad PDF sha1 | Bryan Newbold | 2020-09-01 | 1 | -0/+1
* another bad PDF sha1 | Bryan Newbold | 2020-08-24 | 1 | -0/+1
* another bad PDF sha1 | Bryan Newbold | 2020-08-17 | 1 | -0/+1
* another bad PDF sha1 | Bryan Newbold | 2020-08-15 | 1 | -0/+1
* more bad sha1 | Bryan Newbold | 2020-08-14 | 1 | -0/+1
* yet more bad PDF sha1 | Bryan Newbold | 2020-08-14 | 1 | -0/+2
* more bad SHA1 | Bryan Newbold | 2020-08-13 | 1 | -0/+2
* yet another PDF sha1 | Bryan Newbold | 2020-08-12 | 1 | -0/+1
* another bad sha1; maybe the last for this batch? | Bryan Newbold | 2020-08-12 | 1 | -0/+1
* more bad sha1 | Bryan Newbold | 2020-08-11 | 1 | -0/+2
* more SHA1 | Bryan Newbold | 2020-08-11 | 1 | -0/+2
* more bad sha1 | Bryan Newbold | 2020-08-10 | 1 | -0/+2
* another bad PDF sha1 | Bryan Newbold | 2020-08-10 | 1 | -0/+1
* another PDF hash to skip | Bryan Newbold | 2020-08-08 | 1 | -0/+1
* another sha1 | Bryan Newbold | 2020-08-07 | 1 | -0/+1
* another sha1 | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* and more bad sha1 | Bryan Newbold | 2020-08-06 | 1 | -0/+3
* more pdfextract skip sha1hex | Bryan Newbold | 2020-08-06 | 1 | -9/+12