sandcrawler
Branches: bnewbold-args, bnewbold-backfill, bnewbold-persist-grobid-errors, bnewbold-refactor-loggging, master, trawler
Date        Author         Files  Lines        Commit message
2019-02-19  Bryan Newbold      2  -0 / +162    add GWB-to-S3 delivery pipeline script
2019-02-01  Bryan Newbold      3  -6 / +6      give sort way more RAM by default
2019-01-03  Bryan Newbold      1  -1 / +1      update (internal) journal-infra link
2019-01-03  Bryan Newbold      1  -0 / +12     match_filter_enrich notes
2019-01-03  Bryan Newbold      3  -103 / +0    remove old/redundant python CDX directory
2018-12-19  Bryan Newbold      1  -1 / +1      use ipv4 localhost with kafkacat
2018-12-19  Bryan Newbold      1  -0 / +31     notes on file-level metadata dump
2018-12-18  Bryan Newbold      1  -2 / +3      longer match-crossref timeout
2018-12-10  Bryan Newbold      3  -1 / +59     update notes
2018-12-10  Bryan Newbold      1  -1 / +1      crank hbase GROBID worker memory usage down
2018-12-10  Bryan Newbold      1  -0 / +2      increase message size (kafka-grobid-hbase)
2018-12-10  Bryan Newbold      2  -84 / +96    add python-snappy dep
2018-12-03  Bryan Newbold      1  -4 / +4      ah, right, it's more like extract/3sec, not 30sec
2018-12-03  Bryan Newbold      1  -2 / +2      tweak grobid worker producer settings
2018-12-03  Bryan Newbold      2  -3 / +18     tweak kafka config significantly
2018-12-03  Bryan Newbold      1  -1 / +6      more sentry tags when extracting
2018-12-03  Bryan Newbold      2  -11 / +22    improvements to Kafka GROBID worker logging
2018-12-01  Bryan Newbold      1  -1 / +1      work around kafka topic/group mistakes
2018-11-27  Bryan Newbold      1  -1 / +1      fix error var typo
2018-11-26  Bryan Newbold      1  -1 / +11     catch more wayback error types
2018-11-25  Bryan Newbold      1  -1 / +1      only pylint fail on errors
2018-11-22  Bryan Newbold      1  -2 / +4      fix ungrobid extraction tests
2018-11-21  Bryan Newbold      1  -0 / +12     more kafka/grobid notes
2018-11-21  Bryan Newbold      1  -1 / +1      better default consumergroup name
2018-11-21  Bryan Newbold      1  -29 / +29    many improvements to kafka HBase inserter
2018-11-21  Bryan Newbold      1  -1 / +1      cherry-pick: correct HBase column filtering
2018-11-21  Bryan Newbold      1  -1 / +13     fixes to hbase worker
2018-11-21  Bryan Newbold      2  -3 / +9      fix kafka grobid command line topic parsing
2018-11-21  Bryan Newbold      1  -0 / +0      kafka_grobid_hbase (not 'ed')
2018-11-21  Bryan Newbold      2  -2 / +179    kafka_grobid fixes and hbase WIP
2018-11-21  Bryan Newbold      1  -1 / +2      small kafka_grobid tweaks
2018-11-21  Bryan Newbold      1  -548 / +431  updated Pipfile.lock (VERY SLOW)
2018-11-21  Bryan Newbold      1  -21 / +9     kafka_grobid tweaks for deployment; delay insert decision
2018-11-21  Bryan Newbold      2  -3 / +7      rename grobided to grobid-output
2018-11-20  Bryan Newbold      3  -0 / +314    initial work on kafka_grobid worker
2018-11-20  Bryan Newbold      2  -0 / +110    kafka notes
2018-10-30  Bryan Newbold      1  -2 / +2      fix typos in DumpGrobidXmlJob
2018-10-30  Bryan Newbold      1  -1 / +1      one more lint ignore
2018-10-30  Bryan Newbold      1  -1 / +1      squelch some more lint warnings
2018-10-30  Bryan Newbold      1  -9 / +10     several bugs and lint issues in import_grobid_metadata
2018-10-30  Bryan Newbold      1  -0 / +24     please support DumpGrobidXmlJob
2018-10-30  Bryan Newbold      1  -0 / +41     quick and dirty GROBID XML dumper
2018-09-26  Bryan Newbold      2  -7 / +151    some progress on a crude grobid metadata filter
2018-09-22  Bryan Newbold      3  -0 / +114    longtail grobid metadata parse/filter WIP
2018-09-22  Bryan Newbold      1  -0 / +24     please support for DumpGrobidMetaInsertableJob
2018-09-22  Bryan Newbold      1  -0 / +38     new DumpGrobidMetaInsertableJob
2018-09-22  Bryan Newbold      1  -1 / +1      match_filter_enrich: fix typo
2018-09-22  Bryan Newbold      1  -2 / +2      fix sha1/doi_list confusion in filter_scored_matches
2018-09-20  Bryan Newbold      1  -1 / +1      pylint can be insufferable
2018-09-18  Bryan Newbold      1  -0 / +3      gitignore in python dir