| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | WIP on extractor-with-mrjob | Bryan Newbold | 2018-04-03 | 4 | -0/+954 |
| * | fix very important typo | Bryan Newbold | 2018-04-03 | 1 | -1/+1 |
| * | shift docs around a bit | Bryan Newbold | 2018-04-03 | 2 | -9/+12 |
| * | actually running hadoop job on cluster | Bryan Newbold | 2018-04-03 | 2 | -0/+18 |
| * | fix silly bugs in backfiller (need more tests) | Bryan Newbold | 2018-04-03 | 1 | -3/+4 |
| * | add setuptools (can probably remove) | Bryan Newbold | 2018-04-03 | 2 | -7/+8 |
| * | heritrix expects ints, not strings, for numbers | Bryan Newbold | 2018-04-02 | 1 | -7/+7 |
| * | backfill: sha1 prefix, cluster example | Bryan Newbold | 2018-03-30 | 3 | -8/+19 |
| * | clean up backfill code/tests | Bryan Newbold | 2018-03-30 | 2 | -24/+42 |
| * | refactor backfill for mrjob | Bryan Newbold | 2018-03-30 | 4 | -64/+145 |
| * | pytest helpers | Bryan Newbold | 2018-03-30 | 4 | -32/+564 |
| * | clean up pig test stuff | Bryan Newbold | 2018-03-30 | 6 | -62/+71 |
| * | renames | Bryan Newbold | 2018-03-30 | 4 | -0/+129 |
| * | basically working pig test | Bryan Newbold | 2018-03-29 | 5 | -23/+32 |
| * | progress on pig tests | Bryan Newbold | 2018-03-29 | 8 | -10/+127 |
| * | import WIP on pig test setup | Bryan Newbold | 2018-03-29 | 6 | -0/+156 |
| * | WIP on cdx backfill | Bryan Newbold | 2018-03-29 | 3 | -0/+265 |
| * | move to top level | Bryan Newbold | 2018-03-29 | 3 | -0/+0 |
| * | sandcrawler | Bryan Newbold | 2018-03-29 | 1 | -2/+9 |
| * | no venvs | Bryan Newbold | 2018-03-29 | 1 | -0/+1 |
| * | import vinay's cdx-record-pipeline | Bryan Newbold | 2018-03-29 | 3 | -0/+103 |
| * | init repo | Bryan Newbold | 2018-03-29 | 2 | -0/+30 |
