Branch | Commit message | Author | Age
master | pytest: skip warning in gwb | Bryan Newbold | 16 months
bnewbold-refactor-loggging | WIP: refactor logging calls in ingest pipelines | Bryan Newbold | 22 months
trawler | notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 2 years
bnewbold-persist-grobid-errors | grobid persist: if status_code is not set, default to 0 | Bryan Newbold | 4 years
bnewbold-args | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
bnewbold-backfill | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
 
 
Age | Commit message | Author | Files | Lines
2020-01-28 | grobid persist: if status_code is not set, default to 0 [bnewbold-persist-grobid-errors] | Bryan Newbold | 3 | -7/+3
2020-01-28 | sql stats: typo fix | Bryan Newbold | 1 | -1/+1
2020-01-28 | sql howto: database dumps | Bryan Newbold | 1 | -0/+7
2020-01-28 | workers: yes, poll is necessary | Bryan Newbold | 1 | -1/+1
2020-01-28 | grobid worker: always set a key in response | Bryan Newbold | 1 | -4/+25
2020-01-28 | fix kafka worker partition-specific error | Bryan Newbold | 1 | -1/+1
2020-01-28 | fix WaybackError exception formating | Bryan Newbold | 1 | -1/+1
2020-01-28 | fix elif syntax error | Bryan Newbold | 1 | -1/+1
2020-01-28 | block springer page-one domain | Bryan Newbold | 1 | -0/+3
2020-01-28 | clarify petabox fetch behavior | Bryan Newbold | 1 | -3/+6
[...]
 
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler