path: root/python/sandcrawler/__init__.py
Commit message | Author | Age | Files | Lines
* pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -1/+2
  This is basically just a copy/paste of the GROBID code, only simpler!
* small fixups to SandcrawlerPostgrestClient | Bryan Newbold | 2020-01-14 | 1 | -0/+1
* more wayback and SPN tests and fixes | Bryan Newbold | 2020-01-09 | 1 | -1/+1
* fix sandcrawler persist workers | Bryan Newbold | 2020-01-02 | 1 | -0/+1
* have SPN client differentiate between SPN and remote errors | Bryan Newbold | 2019-11-13 | 1 | -1/+1
  This is only a partial implementation. The requests client will still make way too many SPN requests trying to figure out whether an error is real (e.g., if the remote returned a 502, we retry many times). We may just want to switch to SPNv2 for everything.
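  The retry policy this commit is aiming for (retry when SPN itself fails, but give up immediately when the remote site is the one returning errors such as a 502) can be sketched as below. This is a minimal illustration, not the sandcrawler client's code: the exception classes `SpnServiceError` and `RemoteFetchError` and the helper `call_with_retries` are hypothetical names, and no real SPN requests are made.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SpnServiceError(Exception):
    """Hypothetical: the SPN service itself failed (overload, internal error)."""


class RemoteFetchError(Exception):
    """Hypothetical: SPN worked, but the remote site returned an error."""

    def __init__(self, status: int):
        super().__init__(f"remote returned HTTP {status}")
        self.status = status


def call_with_retries(attempt: Callable[[], T], max_tries: int = 3, delay: float = 0.0) -> T:
    """Retry only SPN-side failures; surface remote errors on the first try.

    A remote 502 is a property of the target site, so re-submitting the
    capture would most likely repeat the same answer and waste SPN requests.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for i in range(max_tries):
        try:
            return attempt()
        except RemoteFetchError:
            # Remote error: not SPN's fault, so do not retry.
            raise
        except SpnServiceError:
            # SPN error: back off and retry, unless this was the last try.
            if i == max_tries - 1:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable: loop always returns or raises")
```

  With this split, a remote 502 costs exactly one SPN request, while transient SPN overloads still get a bounded number of retries.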
* rename FileIngestWorker | Bryan Newbold | 2019-11-13 | 1 | -0/+1
* lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -1/+4
* re-write parse_cdx_line for sandcrawler lib | Bryan Newbold | 2019-09-25 | 1 | -1/+1
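  The commit title only names `parse_cdx_line`. To illustrate what such a parser involves, here is a minimal sketch assuming the common 11-column CDX layout ("N b a m s k r M S V g": SURT key, timestamp, original URL, mimetype, HTTP status, base32 SHA-1, redirect, meta tags, compressed size, offset, WARC filename). It is not the function from this commit; the returned key names (`sha1hex`, `warc_path`, and so on) are hypothetical.

```python
import base64
import binascii
from typing import Optional


def parse_cdx_line(raw: str) -> Optional[dict]:
    """Hypothetical sketch: parse one 11-column CDX line into a dict.

    Returns None for rows that are malformed or unusable.
    """
    cols = raw.strip().split()
    if len(cols) != 11:
        return None
    surt, dt, url, mimetype, status, sha1b32, _redirect, _meta, csize, offset, warc = cols
    # Timestamps are expected as 14-digit YYYYMMDDhhmmss strings.
    if len(dt) != 14 or not dt.isdigit():
        return None
    if not (csize.isdigit() and offset.isdigit()):
        return None
    try:
        # A SHA-1 digest is 20 bytes, i.e. exactly 32 base32 characters.
        sha1hex = base64.b32decode(sha1b32).hex()
    except binascii.Error:
        return None
    return {
        "surt": surt,
        "datetime": dt,
        "url": url,
        "mimetype": mimetype,
        # A "-" placeholder means no HTTP status is available for the row.
        "http_status": int(status) if status.isdigit() else None,
        "sha1b32": sha1b32,
        "sha1hex": sha1hex,
        "warc_csize": int(csize),
        "warc_offset": int(offset),
        "warc_path": warc,
    }
```

  Returning None rather than raising lets a batch job skip bad rows in a large CDX file without aborting the whole run.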
* start refactoring sandcrawler python common code | Bryan Newbold | 2019-09-23 | 1 | -0/+3