path: root/python/sandcrawler/persist.py
Commit message                                                Author         Age         Files  Lines
ingest persist skips 'existing' ingest results                Bryan Newbold  2020-01-14  1      -0/+3
handle grobid2json errors in calling code instead             Bryan Newbold  2020-01-02  1      -1/+7
db: move duplicate row filtering into DB insert helpers       Bryan Newbold  2020-01-02  1      -15/+1
remove unused filter in grobid worker                         Bryan Newbold  2020-01-02  1      -1/+0
fix dict typo                                                 Bryan Newbold  2020-01-02  1      -1/+1
improvements to grobid persist worker                         Bryan Newbold  2020-01-02  1      -13/+16
filter ingest results to not have key conflicts within batch  Bryan Newbold  2020-01-02  1      -1/+16
    This handles a corner case with ON CONFLICT ... DO UPDATE where you can't do multiple such updates in the same batch transaction.
db: fancy insert/update separation using postgres xmax        Bryan Newbold  2020-01-02  1      -9/+15
add PersistGrobidDiskWorker                                   Bryan Newbold  2020-01-02  1      -0/+33
    To help with making dumps directly from Kafka (eg, for partner delivery)
flush out minio helper, add to grobid persist                 Bryan Newbold  2020-01-02  1      -9/+29
implement counts properly for persist workers                 Bryan Newbold  2020-01-02  1      -15/+19
start work on persist workers and tool                        Bryan Newbold  2020-01-02  1      -0/+223
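
The "key conflicts within batch" commit notes that ON CONFLICT ... DO UPDATE cannot touch the same row more than once in a single batch. One common way to avoid this is to collapse the batch so that each conflict key appears only once before the insert. The following is a minimal sketch of that idea, not the code from persist.py: the function name `dedupe_batch` and the field names `ingest_type`, `base_url`, and `status` are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def dedupe_batch(rows: Iterable[Row], key_fields: Sequence[str]) -> List[Row]:
    """Collapse a batch so each conflict key appears at most once.

    The last row seen for a key wins; keys keep the order in which they
    first appeared. (Hypothetical helper, not taken from persist.py.)
    """
    latest: Dict[tuple, Row] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        latest[key] = row  # overwrite any earlier row with the same key
    return list(latest.values())


# Example batch with two rows sharing the same (assumed) conflict key.
batch = [
    {"ingest_type": "pdf", "base_url": "https://example.com/a", "status": "no-capture"},
    {"ingest_type": "pdf", "base_url": "https://example.com/b", "status": "success"},
    {"ingest_type": "pdf", "base_url": "https://example.com/a", "status": "success"},
]
deduped = dedupe_batch(batch, ("ingest_type", "base_url"))
```

Keeping the last row per key is a design choice made here for the sketch; depending on the data, keeping the first row or merging rows may be more appropriate.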