sandcrawler
Branch                          Commit message                                           Author         Age
master                          pytest: skip warning in gwb                              Bryan Newbold  23 months
bnewbold-refactor-loggging      WIP: refactor logging calls in ingest pipelines          Bryan Newbold  2 years
trawler                         notes on re-GROBID-ing (and re-extracting) some files    Bryan Newbold  3 years
bnewbold-persist-grobid-errors  grobid persist: if status_code is not set, default to 0  Bryan Newbold  5 years
bnewbold-args                   make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
bnewbold-backfill               make hbase_table and zookeeper_hosts CLI args            Bryan Newbold  6 years
Age         Commit message                                                   Author         Files  Lines
2021-12-09  notes on re-GROBID-ing (and re-extracting) some files [trawler]  Bryan Newbold  1      -0 / +289
2021-12-07  grobid: set a maximum file size (256 MByte)                      Bryan Newbold  1      -0 / +8
2021-12-07  worker: add kafka_group_suffix option                            Bryan Newbold  1      -3 / +19
2021-12-07  ingest tool: allow configuration of GROBID endpoint              Bryan Newbold  1      -0 / +7
2021-12-07  2021-12-02 database table size stats                             Bryan Newbold  1      -0 / +22
2021-12-07  sandcrawler SQL dump and upload updates                          Bryan Newbold  1      -4 / +12
2021-12-07  update fatcat_file SQL table schema, and add backfill notes      Bryan Newbold  1      -1 / +3
2021-12-01  update fatcat_file SQL table schema, and add backfill notes      Bryan Newbold  1      -0 / +13
2021-12-01  commit old patch crawl notes                                     Bryan Newbold  1      -0 / +488
2021-12-01  Revert "pipenv: update deps"                                     Bryan Newbold  2      -574 / +382
[...]
Clone
git@git.bnewbold.net:sandcrawler
https://git.bnewbold.net/sandcrawler
git://git.bnewbold.net/sandcrawler