| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | actually fix oversize inserts | Bryan Newbold | 2018-05-08 | 1 | -7/+7 |
| | | |||||
| * | XML size limit | Bryan Newbold | 2018-04-26 | 1 | -0/+6 |
| | | |||||
| * | force_existing flag for extraction | Bryan Newbold | 2018-04-19 | 1 | -1/+5 |
| | | |||||
| * | NLineInputFormat requires RawProtocol | Bryan Newbold | 2018-04-19 | 1 | -1/+2 |
| | Should make this a command line argument or something. Want one in HADOOP, the other for local/tests/inline/etc. | | | | |
| * | local mrjob config | Bryan Newbold | 2018-04-19 | 1 | -0/+6 |
| | | |||||
| * | switch to new (local) sentry instance | Bryan Newbold | 2018-04-18 | 1 | -1/+1 |
| | | |||||
| * | update Pipfile.lock (new pluggy) | Bryan Newbold | 2018-04-16 | 1 | -59/+64 |
| | | |||||
| * | use NLineInputFormat so we can control split size | Bryan Newbold | 2018-04-11 | 1 | -0/+1 |
| | | |||||
| * | revert PYTHONPATH in cmdenv | Bryan Newbold | 2018-04-11 | 1 | -1/+2 |
| | Seemed to break hadoop jobs for some reason | | | | |
| * | Merge branch 'bnewbold-sentry' | Bryan Newbold | 2018-04-10 | 4 | -19/+31 |
| |\ | |||||
| | * | prototype sentry integration | Bryan Newbold | 2018-04-10 | 4 | -19/+31 |
| | | | |||||
| * | | don't try to decode GROBID output | Bryan Newbold | 2018-04-11 | 1 | -2/+2 |
| |/ | |||||
| * | partially lint extraction_cdx_grobid.py | Bryan Newbold | 2018-04-10 | 1 | -8/+6 |
| | | |||||
| * | yet more test improvements | Bryan Newbold | 2018-04-10 | 2 | -9/+61 |
| | | |||||
| * | cleanup tests; add one for double-processing | Bryan Newbold | 2018-04-10 | 2 | -20/+43 |
| | | |||||
| * | TODO updates | Bryan Newbold | 2018-04-10 | 2 | -11/+2 |
| | | |||||
| * | wayback 404 test | Bryan Newbold | 2018-04-10 | 2 | -5/+49 |
| | | |||||
| * | extraction test fixes | Bryan Newbold | 2018-04-10 | 2 | -27/+50 |
| | | |||||
| * | grobid2json test fixes | Bryan Newbold | 2018-04-10 | 2 | -1/+3 |
| | | |||||
| * | failing tests! | Bryan Newbold | 2018-04-10 | 2 | -16/+51 |
| | | |||||
| * | configs and README updates | Bryan Newbold | 2018-04-07 | 3 | -5/+26 |
| | | |||||
| * | bug fixes | Bryan Newbold | 2018-04-06 | 1 | -7/+14 |
| | | |||||
| * | updates to running | Bryan Newbold | 2018-04-06 | 1 | -5/+14 |
| | | |||||
| * | lint fixes | Bryan Newbold | 2018-04-06 | 5 | -18/+10 |
| | | |||||
| * | renamed do_tei | Bryan Newbold | 2018-04-06 | 1 | -3/+3 |
| | | |||||
| * | temporarily skip pylint on extraction | Bryan Newbold | 2018-04-06 | 1 | -0/+3 |
| | | |||||
| * | add pylint to CI | Bryan Newbold | 2018-04-06 | 3 | -39/+120 |
| | | |||||
| * | add test for grobid2json | Bryan Newbold | 2018-04-06 | 1 | -0/+14 |
| | | |||||
| * | coverage defaults | Bryan Newbold | 2018-04-06 | 1 | -0/+3 |
| | | |||||
| * | small grobid2json test | Bryan Newbold | 2018-04-06 | 4 | -2/+164 |
| | | |||||
| * | make happybase mock injection slightly less horrible | Bryan Newbold | 2018-04-05 | 4 | -36/+31 |
| | | |||||
| * | progress on extractor | Bryan Newbold | 2018-04-05 | 3 | -56/+93 |
| | | |||||
| * | improve test coverage | Bryan Newbold | 2018-04-05 | 6 | -6/+39 |
| | | |||||
| * | test coverage info | Bryan Newbold | 2018-04-05 | 3 | -7/+62 |
| | | |||||
| * | README/TODO updates | Bryan Newbold | 2018-04-04 | 1 | -8/+9 |
| | | |||||
| * | refactor out some common code | Bryan Newbold | 2018-04-04 | 5 | -184/+133 |
| | | |||||
| * | extraction -> mapreduce | Bryan Newbold | 2018-04-04 | 14 | -0/+3930 |
| | | |||||
| * | move to top level | Bryan Newbold | 2018-03-29 | 3 | -103/+0 |
| | | |||||
| * | import vinay's cdx-record-pipeline | Bryan Newbold | 2018-03-29 | 3 | -0/+103 |
