Branch | Commit message | Author | Age
master | pytest: skip warning in gwb | Bryan Newbold | 23 months
bnewbold-refactor-loggging | WIP: refactor logging calls in ingest pipelines | Bryan Newbold | 2 years
trawler | notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 3 years
bnewbold-persist-grobid-errors | grobid persist: if status_code is not set, default to 0 | Bryan Newbold | 5 years
bnewbold-args | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
bnewbold-backfill | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 6 years
 
 
Age | Commit message | Author | Files | Lines
2021-12-09 | notes on re-GROBID-ing (and re-extracting) some files [trawler] | Bryan Newbold | 1 | -0/+289
2021-12-07 | grobid: set a maximum file size (256 MByte) | Bryan Newbold | 1 | -0/+8
2021-12-07 | worker: add kafka_group_suffix option | Bryan Newbold | 1 | -3/+19
2021-12-07 | ingest tool: allow configuration of GROBID endpoint | Bryan Newbold | 1 | -0/+7
2021-12-07 | 2021-12-02 database table size stats | Bryan Newbold | 1 | -0/+22
2021-12-07 | sandcrawler SQL dump and upload updates | Bryan Newbold | 1 | -4/+12
2021-12-07 | update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 1 | -1/+3
2021-12-01 | update fatcat_file SQL table schema, and add backfill notes | Bryan Newbold | 1 | -0/+13
2021-12-01 | commit old patch crawl notes | Bryan Newbold | 1 | -0/+488
2021-12-01 | Revert "pipenv: update deps" | Bryan Newbold | 2 | -574/+382
[...]
 
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler