| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | persist grobid: add option to skip S3 upload | Bryan Newbold | 2020-03-19 | 1 | -0/+4 |
| | Motivation for this is that current S3 target (minio) is overloaded, with too many files on a single partition (80 million+). Going to look into seaweedfs and other options, but for now stopping minio persist. Data is all stored in kafka anyways. | | | | |
* | fixes to ingest-request persist | Bryan Newbold | 2020-03-05 | 1 | -1/+1 |
* | persist: ingest_request tool (with no ingest_file_result) | Bryan Newbold | 2020-03-05 | 1 | -0/+18 |
* | pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -0/+18 |
| | This is basically just a copy/paste of GROBID code, only simpler! | | | | |
* | improve sentry reporting with 'release' git hash | Bryan Newbold | 2020-01-15 | 1 | -1/+0 |
* | more ftp status 226 support | Bryan Newbold | 2020-01-14 | 1 | -1/+1 |
* | add PersistGrobidDiskWorker | Bryan Newbold | 2020-01-02 | 1 | -0/+27 |
| | To help with making dumps directly from Kafka (eg, for partner delivery) | | | | |
* | flush out minio helper, add to grobid persist | Bryan Newbold | 2020-01-02 | 1 | -2/+20 |
* | start work on persist workers and tool | Bryan Newbold | 2020-01-02 | 1 | -0/+98 |