Branch | Commit message | Author | Age |
master | pytest: skip warning in gwb | Bryan Newbold | 2 years |
bnewbold-refactor-loggging | WIP: refactor logging calls in ingest pipelines | Bryan Newbold | 3 years |
trawler | notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 3 years |
bnewbold-persist-grobid-errors | grobid persist: if status_code is not set, default to 0 | Bryan Newbold | 5 years |
bnewbold-args | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 7 years |
bnewbold-backfill | make hbase_table and zookeeper_hosts CLI args | Bryan Newbold | 7 years |
|
Date | Commit message | Author | Files | Lines |
2022-07-12 | WIP: refactor logging calls in ingest pipelines (bnewbold-refactor-loggging) | Bryan Newbold | 6 | -114/+89 |
2022-07-07 | ingest: targeted 2022-04 notes | Bryan Newbold | 1 | -1/+3 |
2022-07-07 | stats: may 2022 ingest-by-domain stats | Bryan Newbold | 1 | -0/+410 |
2022-07-07 | ingest: IEEE domain is blocking us | Bryan Newbold | 1 | -1/+2 |
2022-05-16 | ingest: catch more ConnectionErrors (SPN, replay fetch, GROBID) | Bryan Newbold | 2 | -4/+19 |
2022-05-11 | ingest: skip arxiv.org DOIs, we already direct-ingest | Bryan Newbold | 1 | -0/+1 |
2022-05-05 | python make fmt | Bryan Newbold | 1 | -3/+1 |
2022-05-05 | ingest spn2: fix tests | Bryan Newbold | 4 | -6/+108 |
2022-05-05 | ingest: more loginwall patterns | Bryan Newbold | 1 | -0/+3 |
2022-05-03 | ingest_tool: fix arg parsing | Bryan Newbold | 1 | -8/+8 |
[...] |
|
Clone |
git@git.bnewbold.net:sandcrawler |
https://git.bnewbold.net/sandcrawler |
git://git.bnewbold.net/sandcrawler |