  Commit message                                   Author         Age         Files  Lines
...
* README/TODO updates                              Bryan Newbold  2018-04-04  3      -9/+20
* refactor out some common code                    Bryan Newbold  2018-04-04  5      -184/+133
* extraction -> mapreduce                          Bryan Newbold  2018-04-04  14     -0/+0
* merge backfill into extraction directory         Bryan Newbold  2018-04-04  11     -653/+27
* pep8                                             Bryan Newbold  2018-04-04  2      -3/+3
* testing stuff as dev deps                        Bryan Newbold  2018-04-04  2      -73/+109
* more testing deps                                Bryan Newbold  2018-04-04  2      -8/+133
* trivial whitespace                               Bryan Newbold  2018-04-04  2      -1/+2
* more TODO                                        Bryan Newbold  2018-04-04  2      -0/+17
* more WIP on extractor                            Bryan Newbold  2018-04-04  5      -52/+427
* add example XML output (open access)             Bryan Newbold  2018-04-03  1      -0/+2004
* WIP on extractor-with-mrjob                      Bryan Newbold  2018-04-03  4      -0/+954
* fix very important typo                          Bryan Newbold  2018-04-03  1      -1/+1
* shift docs around a bit                          Bryan Newbold  2018-04-03  2      -9/+12
* actually running hadoop job on cluster           Bryan Newbold  2018-04-03  2      -0/+18
* fix silly bugs in backfiller (need more tests)   Bryan Newbold  2018-04-03  1      -3/+4
* add setuptools (can probably remove)             Bryan Newbold  2018-04-03  2      -7/+8
* heritrix expects ints, not strings, for numbers  Bryan Newbold  2018-04-02  1      -7/+7
* backfill: sha1 prefix, cluster example           Bryan Newbold  2018-03-30  3      -8/+19
* clean up backfill code/tests                     Bryan Newbold  2018-03-30  2      -24/+42
* refactor backfill for mrjob                      Bryan Newbold  2018-03-30  4      -64/+145
* pytest helpers                                   Bryan Newbold  2018-03-30  4      -32/+564
* clean up pig test stuff                          Bryan Newbold  2018-03-30  6      -62/+71
* renames                                          Bryan Newbold  2018-03-30  4      -0/+129
* basically working pig test                       Bryan Newbold  2018-03-29  5      -23/+32
* progress on pig tests                            Bryan Newbold  2018-03-29  8      -10/+127
* import WIP on pig test setup                     Bryan Newbold  2018-03-29  6      -0/+156
* WIP on cdx backfill                              Bryan Newbold  2018-03-29  3      -0/+265
* move to top level                                Bryan Newbold  2018-03-29  3      -0/+0
* sandcrawler                                      Bryan Newbold  2018-03-29  1      -2/+9
* no venvs                                         Bryan Newbold  2018-03-29  1      -0/+1
* import vinay's cdx-record-pipeline               Bryan Newbold  2018-03-29  3      -0/+103
* init repo                                        Bryan Newbold  2018-03-29  2      -0/+30