path: root/notes
Commit message | Author | Age | Files | Lines
* notes on re-GROBID-ing (and re-extracting) some files [trawler] | Bryan Newbold | 2021-12-09 | 1 | -0/+289
* commit old patch crawl notes | Bryan Newbold | 2021-12-01 | 1 | -0/+488
* wrap up crossref refs backfill notes | Bryan Newbold | 2021-11-10 | 1 | -0/+47
* update crossref/grobid refs generation notes | Bryan Newbold | 2021-11-04 | 1 | -4/+96
* grobid refs backfill progress | Bryan Newbold | 2021-11-04 | 1 | -1/+43
* start notes on crossref refs backfill | Bryan Newbold | 2021-11-04 | 1 | -0/+54
* old (2020) notes on pdfextract cleanup | Bryan Newbold | 2021-10-04 | 1 | -0/+74
* notes on dumping PDF URL lists for partners | Bryan Newbold | 2021-10-04 | 1 | -0/+66
* daily OA crawl improvements/notes | Bryan Newbold | 2021-09-08 | 1 | -0/+1021
* OAI-PMH patch and ingest improvement notes | Bryan Newbold | 2021-09-03 | 2 | -204/+1578
* commit old patch crawl notes (dec 2020) | Bryan Newbold | 2021-09-03 | 1 | -0/+1
* commit old arxiv ingest notes | Bryan Newbold | 2021-09-03 | 1 | -0/+12
* commit old patch notes (will rework) | Bryan Newbold | 2021-09-03 | 1 | -0/+110
* MAG post-crawl stats (5m+ new PDFs crawled successfully) | Bryan Newbold | 2021-09-02 | 1 | -0/+124
* MAG and OAI-PMH crawl/processing notes | Bryan Newbold | 2021-08-13 | 2 | -0/+480
* 2021-07 unpaywall crawl wrap-up notes | Bryan Newbold | 2021-07-30 | 1 | -12/+108
* unpaywall 2021-07 crawl partial notes | Bryan Newbold | 2021-07-14 | 1 | -0/+224
* notes on large-domain ingest tweaks | Bryan Newbold | 2021-05-27 | 1 | -0/+480
* 2021-04 unpaywall crawl notes | Bryan Newbold | 2021-05-27 | 1 | -0/+368
* late-2020 OA DOI crawl ingest notes | Bryan Newbold | 2021-01-04 | 1 | -3/+46
* DOAJ crawl ingest stats | Bryan Newbold | 2020-12-31 | 1 | -0/+295
* progress notes on OA DOI ingest (still running) | Bryan Newbold | 2020-12-28 | 1 | -11/+102
* HTML ingest deployment notes | Bryan Newbold | 2020-12-16 | 1 | -1/+71
* unpaywall crawl/ingest update (from Oct 2020) | Bryan Newbold | 2020-12-08 | 1 | -0/+134
* commit sept 2020 scielo ingest notes | Bryan Newbold | 2020-12-08 | 1 | -0/+21
* add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248
* fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148
* unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82
* more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73
* notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197
* OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218
* notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193
* start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42
* update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32
* OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13
* notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66
* OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232
* daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202
* follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19
* unpaywall ingest follow-up | Bryan Newbold | 2020-09-02 | 1 | -0/+115
* grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101
* MAG ingest follow-up notes | Bryan Newbold | 2020-08-05 | 1 | -0/+194
* MAG 2020-07 ingest notes | Bryan Newbold | 2020-07-08 | 1 | -0/+159
* 2020-05_pubmed ingest notes (short) | Bryan Newbold | 2020-06-25 | 1 | -0/+10
* commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34
* commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162
* ingest: OAI-PMH count table | Bryan Newbold | 2020-05-28 | 1 | -0/+24
* ingest notes | Bryan Newbold | 2020-05-26 | 2 | -6/+76
* potential future backfill ingests | Bryan Newbold | 2020-05-26 | 1 | -0/+52
* ingests: normalize file names; commit updates | Bryan Newbold | 2020-05-26 | 10 | -63/+279