index : sandcrawler
[no description]
Branch                          Commit message                                           Author         Age
master                          pytest: skip warning in gwb                              Bryan Newbold  23 months
bnewbold-refactor-loggging      WIP: refactor logging calls in ingest pipelines          Bryan Newbold  2 years
trawler                         notes on re-GROBID-ing (and re-extracting) some files    Bryan Newbold  3 years
bnewbold-persist-grobid-errors  grobid persist: if status_code is not set, default to 0  Bryan Newbold  5 years
bnewbold-args                   make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
bnewbold-backfill               make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
Age         Commit message                                                                Author         Files  Lines
2022-07-12  WIP: refactor logging calls in ingest pipelines [bnewbold-refactor-loggging]  Bryan Newbold  6      -114/+89
2022-07-07  ingest: targeted 2022-04 notes                                                Bryan Newbold  1      -1/+3
2022-07-07  stats: may 2022 ingest-by-domain stats                                        Bryan Newbold  1      -0/+410
2022-07-07  ingest: IEEE domain is blocking us                                            Bryan Newbold  1      -1/+2
2022-05-16  ingest: catch more ConnectionErrors (SPN, replay fetch, GROBID)               Bryan Newbold  2      -4/+19
2022-05-11  ingest: skip arxiv.org DOIs, we already direct-ingest                         Bryan Newbold  1      -0/+1
2022-05-05  python make fmt                                                               Bryan Newbold  1      -3/+1
2022-05-05  ingest spn2: fix tests                                                        Bryan Newbold  4      -6/+108
2022-05-05  ingest: more loginwall patterns                                               Bryan Newbold  1      -0/+3
2022-05-03  ingest_tool: fix arg parsing                                                  Bryan Newbold  1      -8/+8
[...]
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler