path: root/notes
Commit message | Author | Age | Files | Lines
...
* add implementation notes about HTML ingest | Bryan Newbold | 2020-11-10 | 1 | -0/+248
* fuzzy matching notes | Bryan Newbold | 2020-11-10 | 1 | -0/+148
* unpaywall oct 2020 crawl notes | Bryan Newbold | 2020-11-02 | 1 | -45/+82
* more notes on unpaywall ingest from last week | Bryan Newbold | 2020-10-27 | 1 | -0/+73
* notes on 2020-09 re-ingest passes | Bryan Newbold | 2020-10-17 | 1 | -0/+197
* OA DOIs: partial notes | Bryan Newbold | 2020-10-17 | 1 | -0/+218
* notes/status on daily ingest | Bryan Newbold | 2020-10-17 | 1 | -0/+193
* start 2020-10 ingest notes | Bryan Newbold | 2020-10-11 | 1 | -0/+42
* update unpaywall 2020-04 notes | Bryan Newbold | 2020-10-11 | 1 | -0/+32
* OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13
* notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66
* OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232
* daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202
* follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19
* unpaywall ingest follow-up | Bryan Newbold | 2020-09-02 | 1 | -0/+115
* grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101
* MAG ingest follow-up notes | Bryan Newbold | 2020-08-05 | 1 | -0/+194
* MAG 2020-07 ingest notes | Bryan Newbold | 2020-07-08 | 1 | -0/+159
* 2020-05_pubmed ingest notes (short) | Bryan Newbold | 2020-06-25 | 1 | -0/+10
* commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34
* commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162
* ingest: OAI-PMH count table | Bryan Newbold | 2020-05-28 | 1 | -0/+24
* ingest notes | Bryan Newbold | 2020-05-26 | 2 | -6/+76
* potential future backfill ingests | Bryan Newbold | 2020-05-26 | 1 | -0/+52
* ingests: normalize file names; commit updates | Bryan Newbold | 2020-05-26 | 10 | -63/+279
* summarize datacite and MAG 2020 crawls | Bryan Newbold | 2020-05-05 | 2 | -0/+200
* update MAG crawl notes | Bryan Newbold | 2020-04-28 | 1 | -0/+71
* COVID-19 chinese paper ingest | Bryan Newbold | 2020-04-15 | 1 | -0/+73
* 2020-04 unpaywall ingest (in progress) | Bryan Newbold | 2020-04-15 | 1 | -0/+63
* 2020-04 datacite ingest (in progress) | Bryan Newbold | 2020-04-15 | 1 | -0/+18
* partial notes on S2 crawl ingest | Bryan Newbold | 2020-04-15 | 1 | -0/+35
* MAG import notes | Bryan Newbold | 2020-04-13 | 1 | -0/+13
* MAG 2020-03-04 ingest notes to date | Bryan Newbold | 2020-04-06 | 1 | -0/+395
* unpaywall ingest notes update | Bryan Newbold | 2020-03-30 | 1 | -0/+138
* unpaywall large ingest notes | Bryan Newbold | 2020-03-17 | 1 | -0/+10
* more unpaywall ingest notes | Bryan Newbold | 2020-03-05 | 1 | -0/+416
* update (and move) ingest notes | Bryan Newbold | 2020-03-03 | 6 | -0/+480
* ingest backfill notes | Bryan Newbold | 2020-02-24 | 3 | -0/+150
* jan 2020 bulk ingest notes | Bryan Newbold | 2020-02-12 | 1 | -0/+26
* add notes on recent ingest and backfill tasks | Bryan Newbold | 2020-02-05 | 3 | -0/+221
* hadoop job log rename and update | Bryan Newbold | 2019-12-27 | 1 | -0/+25
* update job log with pig runs | Bryan Newbold | 2019-12-26 | 1 | -0/+10
* updated re-GROBID job log entry | Bryan Newbold | 2019-11-15 | 1 | -0/+31
* ingest/backfill notes | Bryan Newbold | 2019-11-13 | 3 | -0/+47
* notes about running 'regrobid' batches manually (not kafka) | Bryan Newbold | 2019-11-13 | 1 | -0/+41
* commit old notes about munging GROBID output | Bryan Newbold | 2019-11-13 | 1 | -0/+70
* old groupworks job log | Bryan Newbold | 2019-09-20 | 1 | -0/+8
* petabox journal files ingest updates | Bryan Newbold | 2019-06-20 | 1 | -0/+25
* clearer CDX munge notes | Bryan Newbold | 2019-05-09 | 1 | -1/+1
* give sort way more RAM by default | Bryan Newbold | 2019-02-01 | 3 | -6/+6