path: root/python/sandcrawler/pdftrio.py
Commit message | Author | Date | Files | Lines
* mypy lint fixes | Bryan Newbold | 2023-01-04 | 1 | -2/+2
* codespell typos in python (comments) | Bryan Newbold | 2021-11-24 | 1 | -1/+1
* pdftrio client: use HTTP session for POSTs | Bryan Newbold | 2021-11-03 | 1 | -1/+1
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -36/+42
* fix type annotations for petabox body fetch helper | Bryan Newbold | 2021-10-26 | 1 | -1/+2
* more progress on type annotations | Bryan Newbold | 2021-10-26 | 1 | -1/+1
* more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 1 | -9/+20
* start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 1 | -1/+0
* make fmt | Bryan Newbold | 2021-10-26 | 1 | -8/+2
* python: isort all imports | Bryan Newbold | 2021-10-26 | 1 | -1/+2
* differential wayback-error from wayback-content-error | Bryan Newbold | 2020-10-21 | 1 | -1/+0
    The motivation here is to distinguish errors due to current content in wayback (eg, in WARCs) from operational errors (eg, wayback machine is down, or network failures/disruption).
* workers: refactor to pass key to process() | Bryan Newbold | 2020-06-17 | 1 | -2/+2
* refactor worker fetch code into wrapper class | Bryan Newbold | 2020-06-16 | 1 | -80/+14
* pdftrio: tweaks to avoid connection errors | Bryan Newbold | 2020-02-24 | 1 | -1/+9
* unpaywall2ingestrequest transform script | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* pdftrio: mode controlled by CLI arg | Bryan Newbold | 2020-02-18 | 1 | -4/+5
* pdftrio: fix error nesting in pdftrio key | Bryan Newbold | 2020-02-18 | 1 | -12/+20
* pdftrio fixes from testing | Bryan Newbold | 2020-02-13 | 1 | -3/+9
* move pdf_trio results back under key in JSON/Kafka | Bryan Newbold | 2020-02-13 | 1 | -6/+22
* pdftrio: small fixes from testing | Bryan Newbold | 2020-02-12 | 1 | -2/+2
* pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -0/+158
    This is basically just a copy/paste of GROBID code, only simpler!