path: root/notes/tasks
  Commit message                                                   Author         Age         Files  Lines
* finished re-GROBID-ing                                           Bryan Newbold  2022-05-03  1      -5/+7
|
* PDF URL lists update                                             Bryan Newbold  2022-05-03  2      -0/+76
|
* .ua crawling follow-up stats                                     Bryan Newbold  2022-04-26  1      -2/+2
|
* .ua ingest notes                                                 Bryan Newbold  2022-04-04  1      -0/+29
|
* various ingest/task notes                                        Bryan Newbold  2022-03-22  1      -4/+4
|
* partial notes on .ua urgent crawling                             Bryan Newbold  2022-03-11  1      -0/+196
|
* enqueue PLATFORM PDFs for crawl                                  Bryan Newbold  2022-01-07  1      -0/+23
|
* document progress on re-GROBID-ing                               Bryan Newbold  2022-01-05  1      -0/+89
|
* notes on re-GROBID-ing (and re-extracting) some files [trawler]  Bryan Newbold  2021-12-09  1      -0/+289
|
* wrap up crossref refs backfill notes                             Bryan Newbold  2021-11-10  1      -0/+47
|
* update crossref/grobid refs generation notes                     Bryan Newbold  2021-11-04  1      -4/+96
|
* grobid refs backfill progress                                    Bryan Newbold  2021-11-04  1      -1/+43
|
* start notes on crossref refs backfill                            Bryan Newbold  2021-11-04  1      -0/+54
|
* old (2020) notes on pdfextract cleanup                           Bryan Newbold  2021-10-04  1      -0/+74
|
* notes on dumping PDF URL lists for partners                      Bryan Newbold  2021-10-04  1      -0/+66
|
* notes on file_meta task (from august)                            Bryan Newbold  2020-10-01  1      -0/+66
|
* follow-up notes on processing 'holes'                            Bryan Newbold  2020-09-02  1      -0/+19
|
* grobid+pdftext missing catch-up commands                         Bryan Newbold  2020-08-05  1      -0/+101
|
* commit old notes on a one-off CDX table cleanup                  Bryan Newbold  2020-06-25  1      -0/+34
|
* commit old (2020-02) pdftrio commands                            Bryan Newbold  2020-06-25  1      -0/+162
|
* update (and move) ingest notes                                   Bryan Newbold  2020-03-03  3      -294/+0
|
* ingest backfill notes                                            Bryan Newbold  2020-02-24  3      -0/+150
|
* add notes on recent ingest and backfill tasks                    Bryan Newbold  2020-02-05  3      -0/+221