path: root/notes
Commit message | Author | Date | Files | Lines (-/+)
* old notes on possible places to ingest from | Bryan Newbold | 2022-12-23 | 1 | -0/+15
* old notes on domains to ingest from | Bryan Newbold | 2022-12-23 | 1 | -0/+294
* notes: old examples | Bryan Newbold | 2022-12-23 | 4 | -0/+307
* old notes on dryad datasets | Bryan Newbold | 2022-12-23 | 1 | -0/+17
* 2022 OAI-PMH crawl notes update | Bryan Newbold | 2022-11-23 | 1 | -0/+48
* notes: manually request cleanups | Bryan Newbold | 2022-11-21 | 1 | -0/+132
* OAI-PMH updates | Bryan Newbold | 2022-10-07 | 3 | -2/+391
* summer 2022 ingest notes | Bryan Newbold | 2022-09-06 | 3 | -0/+389
* misc ingest fixes | Bryan Newbold | 2022-07-21 | 1 | -0/+831
* unpaywall crawl wrap-up notes | Bryan Newbold | 2022-07-14 | 1 | -2/+145
* ingest: targeted 2022-04 notes | Bryan Newbold | 2022-07-07 | 1 | -1/+3
* finished re-GROBID-ing | Bryan Newbold | 2022-05-03 | 1 | -5/+7
* PDF URL lists update | Bryan Newbold | 2022-05-03 | 2 | -0/+76
* more dataset crawl notes | Bryan Newbold | 2022-04-26 | 1 | -0/+53
* .ua crawling follow-up stats | Bryan Newbold | 2022-04-26 | 1 | -2/+2
* start notes on unpaywall and targeted/patch crawls | Bryan Newbold | 2022-04-20 | 2 | -0/+277
* .ua ingest notes | Bryan Newbold | 2022-04-04 | 1 | -0/+29
* various ingest/task notes | Bryan Newbold | 2022-03-22 | 4 | -5/+97
* DOAJ ingest/crawl notes | Bryan Newbold | 2022-03-11 | 1 | -0/+266
* partial notes on .ua urgent crawling | Bryan Newbold | 2022-03-11 | 1 | -0/+196
* 2022 patch crawl bulk ingest notes | Bryan Newbold | 2022-03-02 | 1 | -0/+106
* update old OAI-PMH patch crawl notes | Bryan Newbold | 2022-02-28 | 1 | -1/+36
* more patch crawling | Bryan Newbold | 2022-02-08 | 2 | -9/+209
* OAI-PMH patch crawl more updates | Bryan Newbold | 2022-02-08 | 1 | -2/+71
* ingest notes: various in-progress projects | Bryan Newbold | 2022-01-27 | 4 | -3/+800
* enqueue PLATFORM PDFs for crawl | Bryan Newbold | 2022-01-07 | 1 | -0/+23
* document progress on re-GROBID-ing | Bryan Newbold | 2022-01-05 | 1 | -0/+89
* notes on re-GROBID-ing (and re-extracting) some files [trawler] | Bryan Newbold | 2021-12-09 | 1 | -0/+289
* commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488
* wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47
* update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96
* grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43
* start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54
* old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74
* notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66
* daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021
* OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578
* commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1
* commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12
* commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110
* MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124
* MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480
* 2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108
* unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224
* notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480
* 2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368
* late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46
* DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295
* progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102
* HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71