path: root/python/sandcrawler/workers.py
Commit message | Author | Date | Files | Lines
* Revert "reimplement worker timeout with multiprocessing" | Bryan Newbold | 2020-10-22 | 1 | -17/+23
* reimplement worker timeout with multiprocessing | Bryan Newbold | 2020-10-22 | 1 | -23/+17
* differential wayback-error from wayback-content-error | Bryan Newbold | 2020-10-21 | 1 | -3/+3
* customize timeout per worker; 120sec for pdf-extract | Bryan Newbold | 2020-06-29 | 1 | -1/+2
* handle empty fetched blob | Bryan Newbold | 2020-06-27 | 1 | -1/+6
* CDX KeyError as WaybackError from fetch worker | Bryan Newbold | 2020-06-26 | 1 | -1/+1
* don't nest generic fetch errors under pdf_trio | Bryan Newbold | 2020-06-25 | 1 | -12/+6
* fixes and tweaks from testing locally | Bryan Newbold | 2020-06-17 | 1 | -2/+2
* workers: refactor to pass key to process() | Bryan Newbold | 2020-06-17 | 1 | -7/+15
* refactor worker fetch code into wrapper class | Bryan Newbold | 2020-06-16 | 1 | -1/+88
* rename KafkaGrobidSink -> KafkaCompressSink | Bryan Newbold | 2020-06-16 | 1 | -1/+1
* workers: add missing want() dataflow path | Bryan Newbold | 2020-04-30 | 1 | -0/+9
* timeouts: don't push through None error messages | Bryan Newbold | 2020-04-29 | 1 | -2/+2
* worker timeout wrapper, and use for kafka | Bryan Newbold | 2020-04-27 | 1 | -2/+40
* batch/multiprocess for ZipfilePusher | Bryan Newbold | 2020-04-16 | 1 | -3/+18
* workers: add explicit process to base class | Martin Czygan | 2020-03-12 | 1 | -0/+6
* improvements to reliability from prod testing | Bryan Newbold | 2020-02-03 | 1 | -2/+9
* hack-y backoff ingest attempt | Bryan Newbold | 2020-02-03 | 1 | -1/+15
* worker kafka setting tweaks | Bryan Newbold | 2020-01-28 | 1 | -2/+4
* workers: yes, poll is necessary | Bryan Newbold | 2020-01-28 | 1 | -1/+1
* fix kafka worker partition-specific error | Bryan Newbold | 2020-01-28 | 1 | -1/+1
* have JsonLinePusher continue on JSON decode errors (but count) | Bryan Newbold | 2020-01-02 | 1 | -1/+5
* refactor: use print(..., file=sys.stderr) | Bryan Newbold | 2019-12-18 | 1 | -20/+22
* CI: make some jobs manual | Bryan Newbold | 2019-11-15 | 1 | -0/+2
* bump kafka max poll interval for consumers | Bryan Newbold | 2019-11-14 | 1 | -2/+2
* update ingest-file batch size to 1 | Bryan Newbold | 2019-11-14 | 1 | -3/+3
* refactor consume_topic name out of make_kafka_consumer() | Bryan Newbold | 2019-11-13 | 1 | -5/+5
* workers: better generic batch-size arg handling | Bryan Newbold | 2019-10-03 | 1 | -0/+6
* more counts and bugfixes in grobid_tool | Bryan Newbold | 2019-09-26 | 1 | -0/+6
* off-by-one error in batch sizes | Bryan Newbold | 2019-09-26 | 1 | -1/+1
* lots of grobid tool implementation (still WIP) | Bryan Newbold | 2019-09-26 | 1 | -0/+419