path: root/python/sandcrawler/workers.py
Commit message | Author | Date | Files | Lines (-/+)
* improvements to reliability from prod testing | Bryan Newbold | 2020-02-03 | 1 | -2/+9
* hack-y backoff ingest attempt | Bryan Newbold | 2020-02-03 | 1 | -1/+15
* worker kafka setting tweaks | Bryan Newbold | 2020-01-28 | 1 | -2/+4
* workers: yes, poll is necessary | Bryan Newbold | 2020-01-28 | 1 | -1/+1
* fix kafka worker partition-specific error | Bryan Newbold | 2020-01-28 | 1 | -1/+1
* have JsonLinePusher continue on JSON decode errors (but count) | Bryan Newbold | 2020-01-02 | 1 | -1/+5
* refactor: use print(..., file=sys.stderr) | Bryan Newbold | 2019-12-18 | 1 | -20/+22
* CI: make some jobs manual | Bryan Newbold | 2019-11-15 | 1 | -0/+2
* bump kafka max poll interval for consumers | Bryan Newbold | 2019-11-14 | 1 | -2/+2
* update ingest-file batch size to 1 | Bryan Newbold | 2019-11-14 | 1 | -3/+3
* refactor consume_topic name out of make_kafka_consumer() | Bryan Newbold | 2019-11-13 | 1 | -5/+5
* workers: better generic batch-size arg handling | Bryan Newbold | 2019-10-03 | 1 | -0/+6
* more counts and bugfixes in grobid_tool | Bryan Newbold | 2019-09-26 | 1 | -0/+6
* off-by-one error in batch sizes | Bryan Newbold | 2019-09-26 | 1 | -1/+1
* lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -0/+419