sandcrawler
Branch                          Commit message                                           Author         Age
master                          pytest: skip warning in gwb                              Bryan Newbold  23 months
bnewbold-refactor-loggging      WIP: refactor logging calls in ingest pipelines          Bryan Newbold  2 years
trawler                         notes on re-GROBID-ing (and re-extracting) some files    Bryan Newbold  3 years
bnewbold-persist-grobid-errors  grobid persist: if status_code is not set, default to 0  Bryan Newbold  5 years
bnewbold-args                   make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  7 years
bnewbold-backfill               make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  7 years
Age         Commit message                                               Author         Files  Lines
2023-01-04  pytest: skip warning in gwb (HEAD, master)                   Bryan Newbold  1      -0/+1
2023-01-04  mypy lint fixes                                              Bryan Newbold  4      -5/+5
2023-01-02  proposals: update status; include some brainstorm-only docs  Bryan Newbold  10     -25/+62
2023-01-02  python-specific README file                                  Bryan Newbold  3      -7/+48
2022-12-23  bump python deps                                             Bryan Newbold  2      -685/+700
2022-12-23  move a bunch of top-level files/directories to ./extra/      Bryan Newbold  13     -0/+0
2022-12-23  move top-level RFC to proposals dir                          Bryan Newbold  1      -0/+0
2022-12-23  update README for Dec 2022                                   Bryan Newbold  1      -24/+36
2022-12-23  old notes on possible places to ingest from                  Bryan Newbold  1      -0/+15
2022-12-23  old notes on domains to ingest from                          Bryan Newbold  1      -0/+294
[...]
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler