Commit message | Author | Date | Files | Lines |
---|---|---|---|---|
update old OAI-PMH patch crawl notes | Bryan Newbold | 2022-02-28 | 1 | -1/+36 |
more patch crawling | Bryan Newbold | 2022-02-08 | 2 | -9/+209 |
OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71 |
ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800 |
enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 |
document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 |
notes on re-GROBID-ing (and re-extracting) some filestrawler | Bryan Newbold | 2021-12-09 | 1 | -0/+289 |
commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488 |
wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 |
update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 |
grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 |
start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 |
old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 |
notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 |
daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021 |
OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578 |
commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1 |
commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12 |
commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110 |
MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124 |
MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480 |
2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108 |
unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224 |
notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480 |
2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368 |
late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46 |
DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295 |
progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102 |
HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71 |
unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134 |
commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21 |
add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248 |
fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148 |
unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82 |
more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73 |
notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197 |
OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218 |
notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193 |
start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42 |
update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32 |
OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13 |
notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66 |
OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232 |
daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202 |
follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19 |
unpaywall ingest follow-up | Bryan Newbold | 2020-09-02 | 1 | -0/+115 |
grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101 |
MAG ingest follow-up notes | Bryan Newbold | 2020-08-05 | 1 | -0/+194 |
MAG 2020-07 ingest notes | Bryan Newbold | 2020-07-08 | 1 | -0/+159 |
2020-05_pubmed ingest notes (short) | Bryan Newbold | 2020-06-25 | 1 | -0/+10 |