index: sandcrawler
path: root/notes/tasks
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| PDF URL lists update | Bryan Newbold | 2022-05-03 | 2 | -0/+76 |
| .ua crawling follow-up stats | Bryan Newbold | 2022-04-26 | 1 | -2/+2 |
| .ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29 |
| various ingest/task notes | Bryan Newbold | 2022-03-22 | 1 | -4/+4 |
| partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196 |
| enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 |
| document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 |
| notes on re-GROBID-ing (and re-extracting) some files [trawler] | Bryan Newbold | 2021-12-09 | 1 | -0/+289 |
| wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 |
| update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 |
| grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 |
| start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 |
| old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 |
| notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 |
| notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66 |
| follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19 |
| grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101 |
| commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34 |
| commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162 |
| update (and move) ingest notes | Bryan Newbold | 2020-03-03 | 3 | -294/+0 |
| ingest backfill notes | Bryan Newbold | 2020-02-24 | 3 | -0/+150 |
| add notes on recent ingest and backfill tasks | Bryan Newbold | 2020-02-05 | 3 | -0/+221 |