sandcrawler
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| more sentry tags when extracting | Bryan Newbold | 2018-12-03 | 1 | -1/+6 |
| improvements to Kafka GROBID worker logging | Bryan Newbold | 2018-12-03 | 2 | -11/+22 |
| work around kafka topic/group mistakes | Bryan Newbold | 2018-12-01 | 1 | -1/+1 |
| fix error var typo | Bryan Newbold | 2018-11-27 | 1 | -1/+1 |
| catch more wayback error types | Bryan Newbold | 2018-11-26 | 1 | -1/+11 |
| only pylint fail on errors | Bryan Newbold | 2018-11-25 | 1 | -1/+1 |
| fix ungrobid extraction tests | Bryan Newbold | 2018-11-22 | 1 | -2/+4 |
| more kafka/grobid notes | Bryan Newbold | 2018-11-21 | 1 | -0/+12 |
| better default consumergroup name | Bryan Newbold | 2018-11-21 | 1 | -1/+1 |
| many improvements to kafka HBase inserter | Bryan Newbold | 2018-11-21 | 1 | -29/+29 |
| cherry-pick: correct HBase column filtering | Bryan Newbold | 2018-11-21 | 1 | -1/+1 |
| fixes to hbase worker | Bryan Newbold | 2018-11-21 | 1 | -1/+13 |
| fix kafka grobid command line topic parsing | Bryan Newbold | 2018-11-21 | 2 | -3/+9 |
| kafka_grobid_hbase (not 'ed') | Bryan Newbold | 2018-11-21 | 1 | -0/+0 |
| kafka_grobid fixes and hbase WIP | Bryan Newbold | 2018-11-21 | 2 | -2/+179 |
| small kafka_grobid tweaks | Bryan Newbold | 2018-11-21 | 1 | -1/+2 |
| updated Pipfile.lock (VERY SLOW) | Bryan Newbold | 2018-11-21 | 1 | -548/+431 |
| kafka_grobid tweaks for deployment; delay insert decision | Bryan Newbold | 2018-11-21 | 1 | -21/+9 |
| rename grobided to grobid-output | Bryan Newbold | 2018-11-21 | 2 | -3/+7 |
| initial work on kafka_grobid worker | Bryan Newbold | 2018-11-20 | 3 | -0/+314 |
| kafka notes | Bryan Newbold | 2018-11-20 | 2 | -0/+110 |
| fix typos in DumpGrobidXmlJob | Bryan Newbold | 2018-10-30 | 1 | -2/+2 |
| one more lint ignore | Bryan Newbold | 2018-10-30 | 1 | -1/+1 |
| squelch some more lint warnings | Bryan Newbold | 2018-10-30 | 1 | -1/+1 |
| several bugs and lint issues in import_grobid_metadata | Bryan Newbold | 2018-10-30 | 1 | -9/+10 |
| please support DumpGrobidXmlJob | Bryan Newbold | 2018-10-30 | 1 | -0/+24 |
| quick and dirty GROBID XML dumper | Bryan Newbold | 2018-10-30 | 1 | -0/+41 |
| some progress on a crude grobid metadata filter | Bryan Newbold | 2018-09-26 | 2 | -7/+151 |
| longtail grobid metadata parse/filter WIP | Bryan Newbold | 2018-09-22 | 3 | -0/+114 |
| please support for DumpGrobidMetaInsertableJob | Bryan Newbold | 2018-09-22 | 1 | -0/+24 |
| new DumpGrobidMetaInsertableJob | Bryan Newbold | 2018-09-22 | 1 | -0/+38 |
| match_filter_enrich: fix typo | Bryan Newbold | 2018-09-22 | 1 | -1/+1 |
| fix sha1/doi_list confusion in filter_scored_matches | Bryan Newbold | 2018-09-22 | 1 | -2/+2 |
| pylint can be insufferable | Bryan Newbold | 2018-09-20 | 1 | -1/+1 |
| gitignore in python dir | Bryan Newbold | 2018-09-18 | 1 | -0/+3 |
| pass more pylint | Bryan Newbold | 2018-09-18 | 3 | -24/+23 |
| fix typo in python/README | Bryan Newbold | 2018-09-17 | 1 | -1/+1 |
| more robust extraction code (against petabox failures) | Bryan Newbold | 2018-09-17 | 2 | -2/+20 |
| filter_scored_matches: fix tests | Bryan Newbold | 2018-09-17 | 1 | -2/+7 |
| match and enrich notes+script | Bryan Newbold | 2018-09-14 | 2 | -0/+64 |
| add manifest sqlite3 -> JSON converter | Bryan Newbold | 2018-09-14 | 1 | -0/+58 |
| filter_scored_matches.py | Bryan Newbold | 2018-09-13 | 1 | -0/+110 |
| dumpfilemeta support in please | Bryan Newbold | 2018-09-13 | 1 | -0/+24 |
| new simple file metadata dump script | Bryan Newbold | 2018-09-13 | 1 | -0/+36 |
| TODO updates | Bryan Newbold | 2018-09-12 | 1 | -0/+11 |
| insertable flag for match-crossref | Bryan Newbold | 2018-09-12 | 1 | -1/+9 |
| hack scorejob variant with extra context joined in | Bryan Newbold | 2018-09-12 | 2 | -0/+348 |
| blacklist -> denylist | Bryan Newbold | 2018-09-05 | 5 | -12/+12 |
| Merge branch 'ellen-none-refactor' into 'master' | bnewbold | 2018-09-05 | 11 | -154/+179 |
| Merge branch 'ellen-none-refactor' of git.archive.org:webgroup/sandcrawler in... | Ellen Spertus | 2018-09-04 | 0 | -0/+0 |