Branch | Commit message | Author | Age
master | pytest: skip warning in gwb | Bryan Newbold | 23 months
bnewbold-refactor-loggging | WIP: refactor logging calls in ingest pipelines | Bryan Newbold | 2 years
trawler | notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 3 years
bnewbold-persist-grobid-errors | grobid persist: if status_code is not set, default to 0 | Bryan Newbold | 5 years
bnewbold-args | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
bnewbold-backfill | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
 
 
Age | Commit message | Author | Files | Lines
2022-07-12 | WIP: refactor logging calls in ingest pipelines [bnewbold-refactor-loggging] | Bryan Newbold | 6 | -114/+89
2022-07-07 | ingest: targeted 2022-04 notes | Bryan Newbold | 1 | -1/+3
2022-07-07 | stats: may 2022 ingest-by-domain stats | Bryan Newbold | 1 | -0/+410
2022-07-07 | ingest: IEEE domain is blocking us | Bryan Newbold | 1 | -1/+2
2022-05-16 | ingest: catch more ConnectionErrors (SPN, replay fetch, GROBID) | Bryan Newbold | 2 | -4/+19
2022-05-11 | ingest: skip arxiv.org DOIs, we already direct-ingest | Bryan Newbold | 1 | -0/+1
2022-05-05 | python make fmt | Bryan Newbold | 1 | -3/+1
2022-05-05 | ingest spn2: fix tests | Bryan Newbold | 4 | -6/+108
2022-05-05 | ingest: more loginwall patterns | Bryan Newbold | 1 | -0/+3
2022-05-03 | ingest_tool: fix arg parsing | Bryan Newbold | 1 | -8/+8
[...]
 
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler