sandcrawler (branches: bnewbold-args, bnewbold-backfill, bnewbold-persist-grobid-errors, bnewbold-refactor-loggging, master, trawler)
Commit log for path: notes/tasks
Commit message                                                   Author         Date        Files  Lines (-/+)
enqueue PLATFORM PDFs for crawl                                  Bryan Newbold  2022-01-07  1      -0/+23
document progress on re-GROBID-ing                               Bryan Newbold  2022-01-05  1      -0/+89
notes on re-GROBID-ing (and re-extracting) some files [trawler]  Bryan Newbold  2021-12-09  1      -0/+289
wrap up crossref refs backfill notes                             Bryan Newbold  2021-11-10  1      -0/+47
update crossref/grobid refs generation notes                     Bryan Newbold  2021-11-04  1      -4/+96
grobid refs backfill progress                                    Bryan Newbold  2021-11-04  1      -1/+43
start notes on crossref refs backfill                            Bryan Newbold  2021-11-04  1      -0/+54
old (2020) notes on pdfextract cleanup                           Bryan Newbold  2021-10-04  1      -0/+74
notes on dumping PDF URL lists for partners                      Bryan Newbold  2021-10-04  1      -0/+66
notes on file_meta task (from august)                            Bryan Newbold  2020-10-01  1      -0/+66
follow-up notes on processing 'holes'                            Bryan Newbold  2020-09-02  1      -0/+19
grobid+pdftext missing catch-up commands                         Bryan Newbold  2020-08-05  1      -0/+101
commit old notes on a one-off CDX table cleanup                  Bryan Newbold  2020-06-25  1      -0/+34
commit old (2020-02) pdftrio commands                            Bryan Newbold  2020-06-25  1      -0/+162
update (and move) ingest notes                                   Bryan Newbold  2020-03-03  3      -294/+0
ingest backfill notes                                            Bryan Newbold  2020-02-24  3      -0/+150
add notes on recent ingest and backfill tasks                    Bryan Newbold  2020-02-05  3      -0/+221