path: root/notes
Commit message | Author | Date | Files | Lines (-/+)
* OAI-PMH ingest progress timestamps | Bryan Newbold | 2020-10-11 | 1 | -0/+13
* notes on file_meta task (from august) | Bryan Newbold | 2020-10-01 | 1 | -0/+66
* OAI-PMH ingest notes | Bryan Newbold | 2020-09-03 | 1 | -0/+232
* daily ingest notes | Bryan Newbold | 2020-09-02 | 1 | -0/+202
* follow-up notes on processing 'holes' | Bryan Newbold | 2020-09-02 | 1 | -0/+19
* unpaywall ingest follow-up | Bryan Newbold | 2020-09-02 | 1 | -0/+115
* grobid+pdftext missing catch-up commands | Bryan Newbold | 2020-08-05 | 1 | -0/+101
* MAG ingest follow-up notes | Bryan Newbold | 2020-08-05 | 1 | -0/+194
* MAG 2020-07 ingest notes | Bryan Newbold | 2020-07-08 | 1 | -0/+159
* 2020-05_pubmed ingest notes (short) | Bryan Newbold | 2020-06-25 | 1 | -0/+10
* commit old notes on a one-off CDX table cleanup | Bryan Newbold | 2020-06-25 | 1 | -0/+34
* commit old (2020-02) pdftrio commands | Bryan Newbold | 2020-06-25 | 1 | -0/+162
* ingest: OAI-PMH count table | Bryan Newbold | 2020-05-28 | 1 | -0/+24
* ingest notes | Bryan Newbold | 2020-05-26 | 2 | -6/+76
* potential future backfill ingests | Bryan Newbold | 2020-05-26 | 1 | -0/+52
* ingests: normalize file names; commit updates | Bryan Newbold | 2020-05-26 | 10 | -63/+279
* summarize datacite and MAG 2020 crawls | Bryan Newbold | 2020-05-05 | 2 | -0/+200
* update MAG crawl notes | Bryan Newbold | 2020-04-28 | 1 | -0/+71
* COVID-19 chinese paper ingest | Bryan Newbold | 2020-04-15 | 1 | -0/+73
* 2020-04 unpaywall ingest (in progress) | Bryan Newbold | 2020-04-15 | 1 | -0/+63
* 2020-04 datacite ingest (in progress) | Bryan Newbold | 2020-04-15 | 1 | -0/+18
* partial notes on S2 crawl ingest | Bryan Newbold | 2020-04-15 | 1 | -0/+35
* MAG import notes | Bryan Newbold | 2020-04-13 | 1 | -0/+13
* MAG 2020-03-04 ingest notes to date | Bryan Newbold | 2020-04-06 | 1 | -0/+395
* unpaywall ingest notes update | Bryan Newbold | 2020-03-30 | 1 | -0/+138
* unpaywall large ingest notes | Bryan Newbold | 2020-03-17 | 1 | -0/+10
* more unpaywall ingest notes | Bryan Newbold | 2020-03-05 | 1 | -0/+416
* update (and move) ingest notes | Bryan Newbold | 2020-03-03 | 6 | -0/+480
* ingest backfill notes | Bryan Newbold | 2020-02-24 | 3 | -0/+150
* jan 2020 bulk ingest notes | Bryan Newbold | 2020-02-12 | 1 | -0/+26
* add notes on recent ingest and backfill tasks | Bryan Newbold | 2020-02-05 | 3 | -0/+221
* hadoop job log rename and update | Bryan Newbold | 2019-12-27 | 1 | -0/+25
* update job log with pig runs | Bryan Newbold | 2019-12-26 | 1 | -0/+10
* updated re-GROBID job log entry | Bryan Newbold | 2019-11-15 | 1 | -0/+31
* ingest/backfill notes | Bryan Newbold | 2019-11-13 | 3 | -0/+47
* notes about running 'regrobid' batches manually (not kafka) | Bryan Newbold | 2019-11-13 | 1 | -0/+41
* commit old notes about munging GROBID output | Bryan Newbold | 2019-11-13 | 1 | -0/+70
* old groupworks job log | Bryan Newbold | 2019-09-20 | 1 | -0/+8
* petabox journal files ingest updates | Bryan Newbold | 2019-06-20 | 1 | -0/+25
* clearer CDX munge notes | Bryan Newbold | 2019-05-09 | 1 | -1/+1
* give sort way more RAM by default | Bryan Newbold | 2019-02-01 | 3 | -6/+6
* match_filter_enrich notes | Bryan Newbold | 2019-01-03 | 1 | -0/+12
* notes on file-level metadata dump | Bryan Newbold | 2018-12-19 | 1 | -0/+31
* update notes | Bryan Newbold | 2018-12-10 | 1 | -1/+14
* match_filter_enrich: fix typo | Bryan Newbold | 2018-09-22 | 1 | -1/+1
* match and enrich notes+script | Bryan Newbold | 2018-09-14 | 1 | -0/+19
* crude job stats/metrics in a text file | Bryan Newbold | 2018-08-27 | 1 | -0/+95
* update TODO | Bryan Newbold | 2018-08-24 | 1 | -0/+10
* commit notes from my laptop | Bryan Newbold | 2018-08-24 | 6 | -0/+256