Repository: sandcrawler
Branches: bnewbold-args, bnewbold-backfill, bnewbold-persist-grobid-errors, bnewbold-refactor-loggging, master, trawler
Path: python
Commit message | Author | Date | Files | Lines (-removed/+added)
crank hbase GROBID worker memory usage down | Bryan Newbold | 2018-12-10 | 1 | -1/+1
increase message size (kafka-grobid-hbase) | Bryan Newbold | 2018-12-10 | 1 | -0/+2
add python-snappy dep | Bryan Newbold | 2018-12-10 | 2 | -84/+96
ah, right, it's more like extract/3sec, not 30sec | Bryan Newbold | 2018-12-03 | 1 | -4/+4
tweak grobid worker producer settings | Bryan Newbold | 2018-12-03 | 1 | -2/+2
tweak kafka config significantly | Bryan Newbold | 2018-12-03 | 2 | -3/+18
more sentry tags when extracting | Bryan Newbold | 2018-12-03 | 1 | -1/+6
improvements to Kafka GROBID worker logging | Bryan Newbold | 2018-12-03 | 2 | -11/+22
work around kafka topic/group mistakes | Bryan Newbold | 2018-12-01 | 1 | -1/+1
fix error var typo | Bryan Newbold | 2018-11-27 | 1 | -1/+1
catch more wayback error types | Bryan Newbold | 2018-11-26 | 1 | -1/+11
fix ungrobid extraction tests | Bryan Newbold | 2018-11-22 | 1 | -2/+4
better default consumergroup name | Bryan Newbold | 2018-11-21 | 1 | -1/+1
many improvements to kafka HBase inserter | Bryan Newbold | 2018-11-21 | 1 | -29/+29
cherry-pick: correct HBase column filtering | Bryan Newbold | 2018-11-21 | 1 | -1/+1
fixes to hbase worker | Bryan Newbold | 2018-11-21 | 1 | -1/+13
fix kafka grobid command line topic parsing | Bryan Newbold | 2018-11-21 | 2 | -3/+9
kafka_grobid_hbase (not 'ed') | Bryan Newbold | 2018-11-21 | 1 | -0/+0
kafka_grobid fixes and hbase WIP | Bryan Newbold | 2018-11-21 | 2 | -2/+179
small kafka_grobid tweaks | Bryan Newbold | 2018-11-21 | 1 | -1/+2
updated Pipfile.lock (VERY SLOW) | Bryan Newbold | 2018-11-21 | 1 | -548/+431
kafka_grobid tweaks for deployment; delay insert decision | Bryan Newbold | 2018-11-21 | 1 | -21/+9
initial work on kafka_grobid worker | Bryan Newbold | 2018-11-20 | 2 | -0/+296
one more lint ignore | Bryan Newbold | 2018-10-30 | 1 | -1/+1
squelch some more lint warnings | Bryan Newbold | 2018-10-30 | 1 | -1/+1
several bugs and lint issues in import_grobid_metadata | Bryan Newbold | 2018-10-30 | 1 | -9/+10
some progress on a crude grobid metadata filter | Bryan Newbold | 2018-09-26 | 2 | -7/+151
longtail grobid metadata parse/filter WIP | Bryan Newbold | 2018-09-22 | 3 | -0/+114
fix sha1/doi_list confusion in filter_scored_matches | Bryan Newbold | 2018-09-22 | 1 | -2/+2
pylint can be insufferable | Bryan Newbold | 2018-09-20 | 1 | -1/+1
gitignore in python dir | Bryan Newbold | 2018-09-18 | 1 | -0/+3
pass more pylint | Bryan Newbold | 2018-09-18 | 3 | -24/+23
fix typo in python/README | Bryan Newbold | 2018-09-17 | 1 | -1/+1
more robust extraction code (against petabox failures) | Bryan Newbold | 2018-09-17 | 2 | -2/+20
filter_scored_matches: fix tests | Bryan Newbold | 2018-09-17 | 1 | -2/+7
match and enrich notes+script | Bryan Newbold | 2018-09-14 | 1 | -0/+45
add manifest sqlite3 -> JSON converter | Bryan Newbold | 2018-09-14 | 1 | -0/+58
filter_scored_matches.py | Bryan Newbold | 2018-09-13 | 1 | -0/+110
blacklist -> denylist | Bryan Newbold | 2018-09-05 | 2 | -8/+8
warning in python/README | Bryan Newbold | 2018-09-04 | 1 | -0/+22
update python TODO | Bryan Newbold | 2018-08-27 | 1 | -4/+1
finally got extraction_ungrobided to run in prod | Bryan Newbold | 2018-08-26 | 1 | -4/+7
WIP: ungrobided doesn't inherit (copypasta) | Bryan Newbold | 2018-08-25 | 2 | -7/+136
ungrobided: example real output | Bryan Newbold | 2018-08-25 | 1 | -0/+20
ungrobided: fix python call typo | Bryan Newbold | 2018-08-25 | 1 | -1/+1
disambiguration parse_line method | Bryan Newbold | 2018-08-25 | 1 | -3/+3
ungrobided: add real results to tests | Bryan Newbold | 2018-08-25 | 1 | -1/+51
python extraction_ungrobided job | Bryan Newbold | 2018-08-24 | 3 | -0/+288
update READMEs | Bryan Newbold | 2018-08-24 | 1 | -22/+30
rename ./mapreduce to ./python | Bryan Newbold | 2018-08-24 | 21 | -0/+4508