path: root/python/sandcrawler/pdfextract.py
Commit message | Author | Age | Files | Lines
...
* pdfextract support in ingest worker | Bryan Newbold | 2020-06-25 | 1 | -0/+24
* poppler: correct RGBA buffer endian-ness | Bryan Newbold | 2020-06-25 | 1 | -1/+1
* pdfextract_tool fixes from prod usage | Bryan Newbold | 2020-06-25 | 1 | -1/+1
* pdfextract: fix pdf_extra key names | Bryan Newbold | 2020-06-25 | 1 | -2/+2
* ensure pdf_meta isn't passed an empty dict() | Bryan Newbold | 2020-06-25 | 1 | -1/+4
* fixes and tweaks from testing locally | Bryan Newbold | 2020-06-17 | 1 | -3/+64
* make process_pdf() more robust to parse errors | Bryan Newbold | 2020-06-17 | 1 | -5/+29
* note about text layout with pdf extraction | Bryan Newbold | 2020-06-17 | 1 | -0/+8
* rename pdf tools to pdfextract | Bryan Newbold | 2020-06-17 | 1 | -0/+167