Commit message | Author | Date | Files | Lines (-/+) |
---|---|---|---|---|
OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71 |
ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800 |
enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 |
document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 |
notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 2021-12-09 | 1 | -0/+289 |
commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488 |
wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 |
update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 |
grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 |
start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 |
old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 |
notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 |
daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021 |
OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578 |
commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1 |
commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12 |
commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110 |
MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124 |
MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480 |
2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108 |
unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224 |
notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480 |
2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368 |
late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46 |
DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295 |
progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102 |
HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71 |
unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134 |
commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21 |
add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248 |
fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148 |
unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82 |
more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73 |
notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197 |
OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218 |
notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193 |
start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42 |
update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32 |
OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13 |
notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66 |
OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232 |
daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202 |
follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19 |
unpaywall ingest follow-up | Bryan Newbold | 2020-09-02 | 1 | -0/+115 |
grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101 |
MAG ingest follow-up notes | Bryan Newbold | 2020-08-05 | 1 | -0/+194 |
MAG 2020-07 ingest notes | Bryan Newbold | 2020-07-08 | 1 | -0/+159 |
2020-05_pubmed ingest notes (short) | Bryan Newbold | 2020-06-25 | 1 | -0/+10 |
commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34 |
commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162 |