Branch                          Commit message                                           Author         Age
master                          pytest: skip warning in gwb                              Bryan Newbold  15 months
bnewbold-refactor-loggging      WIP: refactor logging calls in ingest pipelines          Bryan Newbold  21 months
trawler                         notes on re-GROBID-ing (and re-extracting) some files    Bryan Newbold  2 years
bnewbold-persist-grobid-errors  grobid persist: if status_code is not set, default to 0  Bryan Newbold  4 years
bnewbold-args                   make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
bnewbold-backfill               make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
 
 
Age         Commit message                                               Author         Files  Lines
2023-01-04  pytest: skip warning in gwb (HEAD, master)                   Bryan Newbold  1      -0/+1
2023-01-04  mypy lint fixes                                              Bryan Newbold  4      -5/+5
2023-01-02  proposals: update status; include some brainstorm-only docs  Bryan Newbold  10     -25/+62
2023-01-02  python-specific README file                                  Bryan Newbold  3      -7/+48
2022-12-23  bump python deps                                             Bryan Newbold  2      -685/+700
2022-12-23  move a bunch of top-level files/directories to ./extra/      Bryan Newbold  13     -0/+0
2022-12-23  move top-level RFC to proposals dir                          Bryan Newbold  1      -0/+0
2022-12-23  update README for Dec 2022                                   Bryan Newbold  1      -24/+36
2022-12-23  old notes on possible places to ingest from                  Bryan Newbold  1      -0/+15
2022-12-23  old notes on domains to ingest from                          Bryan Newbold  1      -0/+294
[...]
 
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler