Commit message | Author | Date | Files | Lines (-/+) |
---|---|---|---|---|
finished re-GROBID-ing | Bryan Newbold | 2022-05-03 | 1 | -5/+7 |
PDF URL lists update | Bryan Newbold | 2022-05-03 | 2 | -0/+76 |
more dataset crawl notes | Bryan Newbold | 2022-04-26 | 1 | -0/+53 |
.ua crawling follow-up stats | Bryan Newbold | 2022-04-26 | 1 | -2/+2 |
start notes on unpaywall and targeted/patch crawls | Bryan Newbold | 2022-04-20 | 2 | -0/+277 |
.ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29 |
various ingest/task notes | Bryan Newbold | 2022-03-22 | 4 | -5/+97 |
DOAJ ingest/crawl notes | Bryan Newbold | 2022-03-11 | 1 | -0/+266 |
partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196 |
2022 patch crawl bulk ingest notes | Bryan Newbold | 2022-03-02 | 1 | -0/+106 |
update old OAI-PMH patch crawl notes | Bryan Newbold | 2022-02-28 | 1 | -1/+36 |
more patch crawling | Bryan Newbold | 2022-02-08 | 2 | -9/+209 |
OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71 |
ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800 |
enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 |
document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 |
notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 2021-12-09 | 1 | -0/+289 |
commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488 |
wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 |
update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 |
grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 |
start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 |
old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 |
notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 |
daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021 |
OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578 |
commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1 |
commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12 |
commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110 |
MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124 |
MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480 |
2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108 |
unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224 |
notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480 |
2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368 |
late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46 |
DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295 |
progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102 |
HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71 |
unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134 |
commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21 |
add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248 |
fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148 |
unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82 |
more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73 |
notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197 |
OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218 |
notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193 |
start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42 |
update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32 |