Commit message | Author | Date | Files | Lines (-/+) |
---|---|---|---|---|
finished re-GROBID-ing | Bryan Newbold | 2022-05-03 | 1 | -5/+7 |
PDF URL lists update | Bryan Newbold | 2022-05-03 | 2 | -0/+76 |
.ua crawling follow-up stats | Bryan Newbold | 2022-04-26 | 1 | -2/+2 |
.ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29 |
various ingest/task notes | Bryan Newbold | 2022-03-22 | 1 | -4/+4 |
partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196 |
enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 |
document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 |
notes on re-GROBID-ing (and re-extracting) some filestrawler | Bryan Newbold | 2021-12-09 | 1 | -0/+289 |
wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 |
update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 |
grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 |
start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 |
old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 |
notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 |
notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66 |
follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19 |
grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101 |
commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34 |
commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162 |
update (and move) ingest notes | Bryan Newbold | 2020-03-03 | 3 | -294/+0 |
ingest backfill notes | Bryan Newbold | 2020-02-24 | 3 | -0/+150 |
add notes on recent ingest and backfill tasks | Bryan Newbold | 2020-02-05 | 3 | -0/+221 |