path: root/python/sandcrawler/pdftrio.py
Commit message                                           Author         Age         Files  Lines
* differential wayback-error from wayback-content-error  Bryan Newbold  2020-10-21  1      -1/+0
    The motivation here is to distinguish errors due to current content in wayback (eg, in WARCs) from operational errors (eg, wayback machine is down, or network failures/disruption).
* workers: refactor to pass key to process()             Bryan Newbold  2020-06-17  1      -2/+2
* refactor worker fetch code into wrapper class          Bryan Newbold  2020-06-16  1      -80/+14
* pdftrio: tweaks to avoid connection errors             Bryan Newbold  2020-02-24  1      -1/+9
* unpaywall2ingestrequest transform script               Bryan Newbold  2020-02-18  1      -1/+1
* pdftrio: mode controlled by CLI arg                    Bryan Newbold  2020-02-18  1      -4/+5
* pdftrio: fix error nesting in pdftrio key              Bryan Newbold  2020-02-18  1      -12/+20
* pdftrio fixes from testing                             Bryan Newbold  2020-02-13  1      -3/+9
* move pdf_trio results back under key in JSON/Kafka     Bryan Newbold  2020-02-13  1      -6/+22
* pdftrio: small fixes from testing                      Bryan Newbold  2020-02-12  1      -2/+2
* pdftrio basic python code                              Bryan Newbold  2020-02-12  1      -0/+158
    This is basically just a copy/paste of GROBID code, only simpler!