| | Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|---|
... | ||||||
* | cleanup tests; add one for double-processing | Bryan Newbold | 2018-04-10 | 2 | -20/+43 | |
* | TODO updates | Bryan Newbold | 2018-04-10 | 3 | -18/+3 | |
* | wayback 404 test | Bryan Newbold | 2018-04-10 | 2 | -5/+49 | |
* | extraction test fixes | Bryan Newbold | 2018-04-10 | 2 | -27/+50 | |
* | grobid2json test fixes | Bryan Newbold | 2018-04-10 | 2 | -1/+3 | |
* | failing tests! | Bryan Newbold | 2018-04-10 | 2 | -16/+51 | |
* | configs and README updates | Bryan Newbold | 2018-04-07 | 4 | -5/+27 | |
* | nits | Bryan Newbold | 2018-04-06 | 2 | -1/+2 | |
* | bug fixes | Bryan Newbold | 2018-04-06 | 1 | -7/+14 | |
* | updates to running | Bryan Newbold | 2018-04-06 | 1 | -5/+14 | |
* | disable pig tests for now | Bryan Newbold | 2018-04-06 | 2 | -7/+10 | |
* | try pig env again | Bryan Newbold | 2018-04-06 | 2 | -2/+4 | |
* | use IA mirror for pig download | Bryan Newbold | 2018-04-06 | 1 | -1/+2 | |
* | lint fixes | Bryan Newbold | 2018-04-06 | 6 | -19/+11 | |
* | fetch deps in pig script | Bryan Newbold | 2018-04-06 | 1 | -0/+1 | |
* | show coverage | Bryan Newbold | 2018-04-06 | 1 | -1/+1 | |
* | renamed do_tei | Bryan Newbold | 2018-04-06 | 1 | -3/+3 | |
* | switch to newer test image | Bryan Newbold | 2018-04-06 | 1 | -1/+1 | |
* | temporarily skip pylint on extraction | Bryan Newbold | 2018-04-06 | 1 | -0/+3 | |
* | add pylint to CI | Bryan Newbold | 2018-04-06 | 5 | -41/+123 | |
* | iterate gitlab-ci.yml | Bryan Newbold | 2018-04-06 | 1 | -3/+5 | |
* | add test for grobid2json | Bryan Newbold | 2018-04-06 | 1 | -0/+14 | |
* | coverage defaults | Bryan Newbold | 2018-04-06 | 1 | -0/+3 | |
* | gitlab test script | Bryan Newbold | 2018-04-06 | 2 | -2/+20 | |
* | small grobid2json test | Bryan Newbold | 2018-04-06 | 4 | -2/+164 | |
* | make happybase mock injection slightly less horrible | Bryan Newbold | 2018-04-05 | 4 | -36/+31 | |
* | progress on extractor | Bryan Newbold | 2018-04-05 | 3 | -56/+93 | |
* | improve test coverage | Bryan Newbold | 2018-04-05 | 6 | -6/+39 | |
* | test coverage info | Bryan Newbold | 2018-04-05 | 4 | -7/+67 | |
* | README/TODO updates | Bryan Newbold | 2018-04-04 | 3 | -9/+20 | |
* | refactor out some common code | Bryan Newbold | 2018-04-04 | 5 | -184/+133 | |
* | extraction -> mapreduce | Bryan Newbold | 2018-04-04 | 14 | -0/+0 | |
* | merge backfill into extraction directory | Bryan Newbold | 2018-04-04 | 11 | -653/+27 | |
* | pep8 | Bryan Newbold | 2018-04-04 | 2 | -3/+3 | |
* | testing stuff as dev deps | Bryan Newbold | 2018-04-04 | 2 | -73/+109 | |
* | more testing deps | Bryan Newbold | 2018-04-04 | 2 | -8/+133 | |
* | trivial whitespace | Bryan Newbold | 2018-04-04 | 2 | -1/+2 | |
* | more TODO | Bryan Newbold | 2018-04-04 | 2 | -0/+17 | |
* | more WIP on extractor | Bryan Newbold | 2018-04-04 | 5 | -52/+427 | |
* | add example XML output (open access) | Bryan Newbold | 2018-04-03 | 1 | -0/+2004 | |
* | WIP on extractor-with-mrjob | Bryan Newbold | 2018-04-03 | 4 | -0/+954 | |
* | fix very important typo | Bryan Newbold | 2018-04-03 | 1 | -1/+1 | |
* | shift docs around a bit | Bryan Newbold | 2018-04-03 | 2 | -9/+12 | |
* | actually running hadoop job on cluster | Bryan Newbold | 2018-04-03 | 2 | -0/+18 | |
* | fix silly bugs in backfiller (need more tests) | Bryan Newbold | 2018-04-03 | 1 | -3/+4 | |
* | add setuptools (can probably remove) | Bryan Newbold | 2018-04-03 | 2 | -7/+8 | |
* | heritrix expects ints, not strings, for numbers | Bryan Newbold | 2018-04-02 | 1 | -7/+7 | |
* | backfill: sha1 prefix, cluster example | Bryan Newbold | 2018-03-30 | 3 | -8/+19 | |
* | clean up backfill code/tests | Bryan Newbold | 2018-03-30 | 2 | -24/+42 | |
* | refactor backfill for mrjob | Bryan Newbold | 2018-03-30 | 4 | -64/+145 |