path: root/python/kafka_grobid.py
Commit message | Author | Date | Files | Lines
* remove deprecated kafka_grobid.py worker | Bryan Newbold | 2020-05-26 | 1 | -331/+0
  All use of pykafka was refactored to use the confluent library some time ago. And all kafka workers have been using the newer sandcrawler style worker for some time.
* refactor: use print(..., file=sys.stderr) | Bryan Newbold | 2019-12-18 | 1 | -2/+2
  Should use logging soon, but this seems more idiomatic in the meanwhile.
* refactor: sort keys in JSON output | Bryan Newbold | 2019-12-18 | 1 | -1/+1
  This makes debugging by tailing Kafka topics a lot more readable
* refactor: improve argparse usage | Bryan Newbold | 2019-12-18 | 1 | -1/+2
  use ArgumentDefaultsHelpFormatter and add help messages to all sub-commands
* note that kafka_grobid.py is deprecated | Bryan Newbold | 2019-11-13 | 1 | -0/+3
* python test fixes | Bryan Newbold | 2019-02-21 | 1 | -0/+1
* backport GWB fetch improvements to extraction/kafka workers | Bryan Newbold | 2019-02-21 | 1 | -4/+8
  *Really* need to refactor out these common methods into a base class.
* ah, right, it's more like extract/3sec, not 30sec | Bryan Newbold | 2018-12-03 | 1 | -4/+4
* tweak grobid worker producer settings | Bryan Newbold | 2018-12-03 | 1 | -2/+2
  Python CPU utilization shot way up; this is an attempt to bring it back down.
* tweak kafka config significantly | Bryan Newbold | 2018-12-03 | 1 | -3/+16
* more sentry tags when extracting | Bryan Newbold | 2018-12-03 | 1 | -1/+6
* improvements to Kafka GROBID worker logging | Bryan Newbold | 2018-12-03 | 1 | -5/+11
* fix error var typo | Bryan Newbold | 2018-11-27 | 1 | -1/+1
* catch more wayback error types | Bryan Newbold | 2018-11-26 | 1 | -1/+11
* better default consumergroup name | Bryan Newbold | 2018-11-21 | 1 | -1/+1
* fix kafka grobid command line topic parsing | Bryan Newbold | 2018-11-21 | 1 | -2/+2
* kafka_grobid fixes and hbase WIP | Bryan Newbold | 2018-11-21 | 1 | -2/+6
* small kafka_grobid tweaks | Bryan Newbold | 2018-11-21 | 1 | -1/+2
* kafka_grobid tweaks for deployment; delay insert decision | Bryan Newbold | 2018-11-21 | 1 | -21/+9
* initial work on kafka_grobid worker | Bryan Newbold | 2018-11-20 | 1 | -0/+295