| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | start notes on unpaywall and targeted/patch crawls | Bryan Newbold | 2022-04-20 | 2 | -0/+277 | 
| * | .ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29 | 
| * | various ingest/task notes | Bryan Newbold | 2022-03-22 | 4 | -5/+97 | 
| * | DOAJ ingest/crawl notes | Bryan Newbold | 2022-03-11 | 1 | -0/+266 | 
| * | partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196 | 
| * | 2022 patch crawl bulk ingest notes | Bryan Newbold | 2022-03-02 | 1 | -0/+106 | 
| * | update old OAI-PMH patch crawl notes | Bryan Newbold | 2022-02-28 | 1 | -1/+36 | 
| * | more patch crawling | Bryan Newbold | 2022-02-08 | 2 | -9/+209 | 
| * | OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71 | 
| * | ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800 | 
| * | enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23 | 
| * | document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89 | 
| * | notes on re-GROBID-ing (and re-extracting) some files | Bryan Newbold | 2021-12-09 | 1 | -0/+289 | 
| * | commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488 | 
| * | wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47 | 
| * | update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96 | 
| * | grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43 | 
| * | start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54 | 
| * | old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74 | 
| * | notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66 | 
| * | daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021 | 
| * | OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578 | 
| * | commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1 | 
| * | commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12 | 
| * | commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110 | 
| * | MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124 | 
| * | MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480 | 
| * | 2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108 | 
| * | unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224 | 
| * | notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480 | 
| * | 2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368 | 
| * | late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46 | 
| * | DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295 | 
| * | progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102 | 
| * | HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71 | 
| * | unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134 | 
| * | commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21 | 
| * | add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248 | 
| * | fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148 | 
| * | unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82 | 
| * | more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73 | 
| * | notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197 | 
| * | OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218 | 
| * | notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193 | 
| * | start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42 | 
| * | update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32 | 
| * | OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13 | 
| * | notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66 | 
| * | OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232 | 
| * | daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202 | 
