Commit message | Author | Date | Files | Lines (-removed/+added)
---|---|---|---|---
... | | | |
location comes as a string, not list | Bryan Newbold | 2020-01-09 | 1 | -1/+1
fix http/https issue with GlobalWayback library | Bryan Newbold | 2020-01-09 | 1 | -1/+2
wayback fetch via replay; confirm hashes in crawl_resource() | Bryan Newbold | 2020-01-09 | 1 | -5/+40
wrap up basic (locally testable) ingest refactor | Bryan Newbold | 2020-01-09 | 1 | -19/+23
more wayback and SPN tests and fixes | Bryan Newbold | 2020-01-09 | 1 | -38/+152
refactor CdxApiClient, add tests: always use auth token and get full CDX rows; simplify to "fetch" (exact url/dt match) and "lookup_best" methods; all redirect stuff will be moved to a higher level | Bryan Newbold | 2020-01-08 | 1 | -40/+130
refactor SavePaperNowClient and add test: response as a namedtuple; "remote" errors (aka, SPN API was HTTP 200 but returned error) aren't an exception | Bryan Newbold | 2020-01-07 | 1 | -28/+154
remove SPNv1 code paths | Bryan Newbold | 2020-01-07 | 1 | -35/+1
handle SPNv1 redirect loop | Bryan Newbold | 2019-11-14 | 1 | -0/+2
handle SPNv2 polling timeout | Bryan Newbold | 2019-11-14 | 1 | -6/+10
status_forcelist is on session, not request | Bryan Newbold | 2019-11-13 | 1 | -2/+2
handle SPNv1 remote server HTTP status codes better | Bryan Newbold | 2019-11-13 | 1 | -8/+15
handle requests (http) redirect loop from wayback | Bryan Newbold | 2019-11-13 | 1 | -1/+4
clean up redirect-following CDX API path | Bryan Newbold | 2019-11-13 | 1 | -8/+15
have SPN client differentiate between SPN and remote errors. This is only a partial implementation. The requests client will still make way too many SPN requests trying to figure out if this is a real error or not (e.g., if remote was a 502, we'll retry many times). We may just want to switch to SPNv2 for everything. | Bryan Newbold | 2019-11-13 | 1 | -2/+10
more progress on file ingest | Bryan Newbold | 2019-11-13 | 1 | -6/+17
much progress on file ingest path | Bryan Newbold | 2019-10-22 | 1 | -15/+73
lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -0/+135
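Two ideas recur in the January 2020 commits: the CdxApiClient refactor splits lookups into "fetch" (exact url/dt match) and "lookup_best", and crawl_resource() now confirms hashes. The following is a minimal sketch of both over in-memory rows. It assumes the default Wayback CDX field order (urlkey, timestamp, original, mimetype, statuscode, digest, length) and that the digest field is a base32-encoded SHA-1. The `CdxRow` type, the helper names, and the ranking used by `lookup_best` (prefer HTTP 200, then the most recent capture) are hypothetical illustrations, not the code from these commits.

```python
import base64
import hashlib
from collections import namedtuple
from typing import Iterable, Optional

# Hypothetical row type, using the default Wayback CDX field order.
CdxRow = namedtuple(
    "CdxRow",
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
)


def parse_cdx_line(line: str) -> CdxRow:
    """Split one space-separated CDX line into a CdxRow."""
    return CdxRow(*line.split())


def fetch(rows: Iterable[CdxRow], url: str, dt: str) -> Optional[CdxRow]:
    """Exact match on URL and 14-digit timestamp; no redirect following."""
    for row in rows:
        if row.original == url and row.timestamp == dt:
            return row
    return None


def lookup_best(rows: Iterable[CdxRow], url: str) -> Optional[CdxRow]:
    """Hypothetical ranking: prefer HTTP 200 captures, then the most recent one."""
    candidates = [r for r in rows if r.original == url]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.statuscode == "200", r.timestamp))


def sha1_b32(body: bytes) -> str:
    """Base32-encoded SHA-1 digest (32 characters, no padding for 20 bytes)."""
    return base64.b32encode(hashlib.sha1(body).digest()).decode("ascii")


def confirm_hash(row: CdxRow, body: bytes) -> bool:
    """Check that fetched bytes match the digest recorded in the CDX row."""
    return sha1_b32(body) == row.digest
```

Keeping `fetch` strict and leaving redirect handling to the caller matches the commit note that "all redirect stuff will be moved to a higher level".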