path: root/mapreduce
Commit message                                          Author         Date        Files  Lines
* actually fix oversize inserts                         Bryan Newbold  2018-05-08  1      -7/+7
* XML size limit                                        Bryan Newbold  2018-04-26  1      -0/+6
* force_existing flag for extraction                    Bryan Newbold  2018-04-19  1      -1/+5
* NLineInputFormat requires RawProtocol                 Bryan Newbold  2018-04-19  1      -1/+2
* local mrjob config                                    Bryan Newbold  2018-04-19  1      -0/+6
* switch to new (local) sentry instance                 Bryan Newbold  2018-04-18  1      -1/+1
* update Pipfile.lock (new pluggy)                      Bryan Newbold  2018-04-16  1      -59/+64
* use NLineInputFormat so we can control split size     Bryan Newbold  2018-04-11  1      -0/+1
* revert PYTHONPATH in cmdenv                           Bryan Newbold  2018-04-11  1      -1/+2
* Merge branch 'bnewbold-sentry'                        Bryan Newbold  2018-04-10  4      -19/+31
|\
| * prototype sentry integration                        Bryan Newbold  2018-04-10  4      -19/+31
* | don't try to decode GROBID output                   Bryan Newbold  2018-04-11  1      -2/+2
|/
* partially lint extraction_cdx_grobid.py               Bryan Newbold  2018-04-10  1      -8/+6
* yet more test improvements                            Bryan Newbold  2018-04-10  2      -9/+61
* cleanup tests; add one for double-processing          Bryan Newbold  2018-04-10  2      -20/+43
* TODO updates                                          Bryan Newbold  2018-04-10  2      -11/+2
* wayback 404 test                                      Bryan Newbold  2018-04-10  2      -5/+49
* extraction test fixes                                 Bryan Newbold  2018-04-10  2      -27/+50
* grobid2json test fixes                                Bryan Newbold  2018-04-10  2      -1/+3
* failing tests!                                        Bryan Newbold  2018-04-10  2      -16/+51
* configs and README updates                            Bryan Newbold  2018-04-07  3      -5/+26
* bug fixes                                             Bryan Newbold  2018-04-06  1      -7/+14
* updates to running                                    Bryan Newbold  2018-04-06  1      -5/+14
* lint fixes                                            Bryan Newbold  2018-04-06  5      -18/+10
* renamed do_tei                                        Bryan Newbold  2018-04-06  1      -3/+3
* temporarily skip pylint on extraction                 Bryan Newbold  2018-04-06  1      -0/+3
* add pylint to CI                                      Bryan Newbold  2018-04-06  3      -39/+120
* add test for grobid2json                              Bryan Newbold  2018-04-06  1      -0/+14
* coverage defaults                                     Bryan Newbold  2018-04-06  1      -0/+3
* small grobid2json test                                Bryan Newbold  2018-04-06  4      -2/+164
* make happybase mock injection slightly less horrible  Bryan Newbold  2018-04-05  4      -36/+31
* progress on extractor                                 Bryan Newbold  2018-04-05  3      -56/+93
* improve test coverage                                 Bryan Newbold  2018-04-05  6      -6/+39
* test coverage info                                    Bryan Newbold  2018-04-05  3      -7/+62
* README/TODO updates                                   Bryan Newbold  2018-04-04  1      -8/+9
* refactor out some common code                         Bryan Newbold  2018-04-04  5      -184/+133
* extraction -> mapreduce                               Bryan Newbold  2018-04-04  14     -0/+3930
* move to top level                                     Bryan Newbold  2018-03-29  3      -103/+0
* import vinay's cdx-record-pipeline                    Bryan Newbold  2018-03-29  3      -0/+103