Commit message | Author | Date | Files | Lines
...
* cleanup tests; add one for double-processing | Bryan Newbold | 2018-04-10 | 2 | -20/+43
* TODO updates | Bryan Newbold | 2018-04-10 | 3 | -18/+3
* wayback 404 test | Bryan Newbold | 2018-04-10 | 2 | -5/+49
* extraction test fixes | Bryan Newbold | 2018-04-10 | 2 | -27/+50
* grobid2json test fixes | Bryan Newbold | 2018-04-10 | 2 | -1/+3
* failing tests! | Bryan Newbold | 2018-04-10 | 2 | -16/+51
* configs and README updates | Bryan Newbold | 2018-04-07 | 4 | -5/+27
* nits | Bryan Newbold | 2018-04-06 | 2 | -1/+2
* bug fixes | Bryan Newbold | 2018-04-06 | 1 | -7/+14
* updates to running | Bryan Newbold | 2018-04-06 | 1 | -5/+14
* disable pig tests for now | Bryan Newbold | 2018-04-06 | 2 | -7/+10
* try pig env again | Bryan Newbold | 2018-04-06 | 2 | -2/+4
* use IA mirror for pig download | Bryan Newbold | 2018-04-06 | 1 | -1/+2
* lint fixes | Bryan Newbold | 2018-04-06 | 6 | -19/+11
* fetch deps in pig script | Bryan Newbold | 2018-04-06 | 1 | -0/+1
* show coverage | Bryan Newbold | 2018-04-06 | 1 | -1/+1
* renamed do_tei | Bryan Newbold | 2018-04-06 | 1 | -3/+3
* switch to newer test image | Bryan Newbold | 2018-04-06 | 1 | -1/+1
* temporarily skip pylint on extraction | Bryan Newbold | 2018-04-06 | 1 | -0/+3
* add pylint to CI | Bryan Newbold | 2018-04-06 | 5 | -41/+123
* iterate gitlab-ci.yml | Bryan Newbold | 2018-04-06 | 1 | -3/+5
* add test for grobid2json | Bryan Newbold | 2018-04-06 | 1 | -0/+14
* coverage defaults | Bryan Newbold | 2018-04-06 | 1 | -0/+3
* gitlab test script | Bryan Newbold | 2018-04-06 | 2 | -2/+20
* small grobid2json test | Bryan Newbold | 2018-04-06 | 4 | -2/+164
* make happybase mock injection slightly less horrible | Bryan Newbold | 2018-04-05 | 4 | -36/+31
* progress on extractor | Bryan Newbold | 2018-04-05 | 3 | -56/+93
* improve test coverage | Bryan Newbold | 2018-04-05 | 6 | -6/+39
* test coverage info | Bryan Newbold | 2018-04-05 | 4 | -7/+67
* README/TODO updates | Bryan Newbold | 2018-04-04 | 3 | -9/+20
* refactor out some common code | Bryan Newbold | 2018-04-04 | 5 | -184/+133
* extraction -> mapreduce | Bryan Newbold | 2018-04-04 | 14 | -0/+0
* merge backfill into extraction directory | Bryan Newbold | 2018-04-04 | 11 | -653/+27
* pep8 | Bryan Newbold | 2018-04-04 | 2 | -3/+3
* testing stuff as dev deps | Bryan Newbold | 2018-04-04 | 2 | -73/+109
* more testing deps | Bryan Newbold | 2018-04-04 | 2 | -8/+133
* trivial whitespace | Bryan Newbold | 2018-04-04 | 2 | -1/+2
* more TODO | Bryan Newbold | 2018-04-04 | 2 | -0/+17
* more WIP on extractor | Bryan Newbold | 2018-04-04 | 5 | -52/+427
* add example XML output (open access) | Bryan Newbold | 2018-04-03 | 1 | -0/+2004
* WIP on extractor-with-mrjob | Bryan Newbold | 2018-04-03 | 4 | -0/+954
* fix very important typo | Bryan Newbold | 2018-04-03 | 1 | -1/+1
* shift docs around a bit | Bryan Newbold | 2018-04-03 | 2 | -9/+12
* actually running hadoop job on cluster | Bryan Newbold | 2018-04-03 | 2 | -0/+18
* fix silly bugs in backfiller (need more tests) | Bryan Newbold | 2018-04-03 | 1 | -3/+4
* add setuptools (can probably remove) | Bryan Newbold | 2018-04-03 | 2 | -7/+8
* heritrix expects ints, not strings, for numbers | Bryan Newbold | 2018-04-02 | 1 | -7/+7
* backfill: sha1 prefix, cluster example | Bryan Newbold | 2018-03-30 | 3 | -8/+19
* clean up backfill code/tests | Bryan Newbold | 2018-03-30 | 2 | -24/+42
* refactor backfill for mrjob | Bryan Newbold | 2018-03-30 | 4 | -64/+145