| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | simple persist worker/tool to backfill grobid_refs | Bryan Newbold | 2021-11-10 | 1 | -0/+22 |
* | crossref persist: batch size depends on whether parsing refs | Bryan Newbold | 2021-11-04 | 1 | -1/+4 |
* | crossref persist: make GROBID ref parsing an option (not default) | Bryan Newbold | 2021-11-04 | 1 | -0/+6 |
* | glue, utils, and worker code for crossref and grobid_refs | Bryan Newbold | 2021-11-04 | 1 | -0/+30 |
* | make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -69/+109 |
* | make fmt | Bryan Newbold | 2021-10-26 | 1 | -63/+62 |
* | python: isort all imports | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
* | refactor 'minio' to 'seaweedfs'; and BLOB env vars | Bryan Newbold | 2020-11-03 | 1 | -9/+9 |
| | This goes along with changes to ansible deployment to use the correct key names and values. | | | | |
* | lint fixes | Bryan Newbold | 2020-06-17 | 1 | -2/+1 |
* | add new pdf workers/persisters | Bryan Newbold | 2020-06-17 | 1 | -0/+30 |
* | persist grobid: add option to skip S3 upload | Bryan Newbold | 2020-03-19 | 1 | -0/+4 |
| | Motivation for this is that current S3 target (minio) is overloaded, with too many files on a single partition (80 million+). Going to look in to seaweedfs and other options, but for now stopping minio persist. Data is all stored in kafka anyways. | | | | |
* | fixes to ingest-request persist | Bryan Newbold | 2020-03-05 | 1 | -1/+1 |
* | persist: ingest_request tool (with no ingest_file_result) | Bryan Newbold | 2020-03-05 | 1 | -0/+18 |
* | pdftrio basic python code | Bryan Newbold | 2020-02-12 | 1 | -0/+18 |
| | This is basically just a copy/paste of GROBID code, only simpler! | | | | |
* | improve sentry reporting with 'release' git hash | Bryan Newbold | 2020-01-15 | 1 | -1/+0 |
* | more ftp status 226 support | Bryan Newbold | 2020-01-14 | 1 | -1/+1 |
* | add PersistGrobidDiskWorker | Bryan Newbold | 2020-01-02 | 1 | -0/+27 |
| | To help with making dumps directly from Kafka (eg, for partner delivery) | | | | |
* | flush out minio helper, add to grobid persist | Bryan Newbold | 2020-01-02 | 1 | -2/+20 |
* | start work on persist workers and tool | Bryan Newbold | 2020-01-02 | 1 | -0/+98 |