sandcrawler
Branch                          Commit message                                           Author         Age
master                          pytest: skip warning in gwb                              Bryan Newbold  23 months
bnewbold-refactor-loggging      WIP: refactor logging calls in ingest pipelines          Bryan Newbold  2 years
trawler                         notes on re-GROBID-ing (and re-extracting) some files    Bryan Newbold  3 years
bnewbold-persist-grobid-errors  grobid persist: if status_code is not set, default to 0  Bryan Newbold  5 years
bnewbold-args                   make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
bnewbold-backfill               make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
Age         Commit message                                           Author         Files  Lines
2020-01-28  grobid persist: if status_code is not set, default to 0  Bryan Newbold  3      -7 / +3
            (branch: bnewbold-persist-grobid-errors)
2020-01-28  sql stats: typo fix                                      Bryan Newbold  1      -1 / +1
2020-01-28  sql howto: database dumps                                Bryan Newbold  1      -0 / +7
2020-01-28  workers: yes, poll is necessary                          Bryan Newbold  1      -1 / +1
2020-01-28  grobid worker: always set a key in response              Bryan Newbold  1      -4 / +25
2020-01-28  fix kafka worker partition-specific error                Bryan Newbold  1      -1 / +1
2020-01-28  fix WaybackError exception formating                     Bryan Newbold  1      -1 / +1
2020-01-28  fix elif syntax error                                    Bryan Newbold  1      -1 / +1
2020-01-28  block springer page-one domain                           Bryan Newbold  1      -0 / +3
2020-01-28  clarify petabox fetch behavior                           Bryan Newbold  1      -3 / +6
[...]
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler