| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -1/+2 |
| | This is basically just a copy/paste of GROBID code, only simpler! | | | | |
* | small fixups to SandcrawlerPostgrestClient | Bryan Newbold | 2020-01-14 | 1 | -0/+1 |
* | more wayback and SPN tests and fixes | Bryan Newbold | 2020-01-09 | 1 | -1/+1 |
* | fix sandcrawler persist workers | Bryan Newbold | 2020-01-02 | 1 | -0/+1 |
* | have SPN client differentiate between SPN and remote errors | Bryan Newbold | 2019-11-13 | 1 | -1/+1 |
| | This is only a partial implementation. The requests client will still make way too many SPN requests trying to figure out if this is a real error or not (eg, if remote was a 502, we'll retry many times). We may just want to switch to SPNv2 for everything. | | | | |
* | rename FileIngestWorker | Bryan Newbold | 2019-11-13 | 1 | -0/+1 |
* | lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -1/+4 |
* | re-write parse_cdx_line for sandcrawler lib | Bryan Newbold | 2019-09-25 | 1 | -1/+1 |
* | start refactoring sandcrawler python common code | Bryan Newbold | 2019-09-23 | 1 | -0/+3 |