Commit message | Author | Date | Files | Lines (-removed/+added)
---|---|---|---|---
notes: manually request cleanups | Bryan Newbold | 2022-11-21 | 1 | -0/+132
OAI-PMH updates | Bryan Newbold | 2022-10-07 | 3 | -2/+391
summer 2022 ingest notes | Bryan Newbold | 2022-09-06 | 3 | -0/+389
misc ingest fixes | Bryan Newbold | 2022-07-21 | 1 | -0/+831
unpaywall crawl wrap-up notes | Bryan Newbold | 2022-07-14 | 1 | -2/+145
ingest: targeted 2022-04 notes | Bryan Newbold | 2022-07-07 | 1 | -1/+3
finished re-GROBID-ing | Bryan Newbold | 2022-05-03 | 1 | -5/+7
PDF URL lists update | Bryan Newbold | 2022-05-03 | 2 | -0/+76
more dataset crawl notes | Bryan Newbold | 2022-04-26 | 1 | -0/+53
.ua crawling follow-up stats | Bryan Newbold | 2022-04-26 | 1 | -2/+2
start notes on unpaywall and targeted/patch crawls | Bryan Newbold | 2022-04-20 | 2 | -0/+277
.ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29
various ingest/task notes | Bryan Newbold | 2022-03-22 | 4 | -5/+97
DOAJ ingest/crawl notes | Bryan Newbold | 2022-03-11 | 1 | -0/+266
partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196
2022 patch crawl bulk ingest notes | Bryan Newbold | 2022-03-02 | 1 | -0/+106
update old OAI-PMH patch crawl notes | Bryan Newbold | 2022-02-28 | 1 | -1/+36
more patch crawling | Bryan Newbold | 2022-02-08 | 2 | -9/+209
OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71
ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800
enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23
document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89
notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 2021-12-09 | 1 | -0/+289
commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488
wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47
update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96
grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43
start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54
old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74
notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66
daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021
OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578
commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1
commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12
commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110
MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124
MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480
2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108
unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224
notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480
2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368
late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46
DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295
progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102
HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71
unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134
commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21
add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248
fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148
unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82