  Commit message                                    Author         Age         Files  Lines
...
* progress on extractor                             Bryan Newbold  2018-04-05  3      -56/+93
|
* improve test coverage                             Bryan Newbold  2018-04-05  6      -6/+39
|
* test coverage info                                Bryan Newbold  2018-04-05  4      -7/+67
|
* README/TODO updates                               Bryan Newbold  2018-04-04  3      -9/+20
|
* refactor out some common code                     Bryan Newbold  2018-04-04  5      -184/+133
|
* extraction -> mapreduce                           Bryan Newbold  2018-04-04  14     -0/+0
|
* merge backfill into extraction directory          Bryan Newbold  2018-04-04  11     -653/+27
|
* pep8                                              Bryan Newbold  2018-04-04  2      -3/+3
|
* testing stuff as dev deps                         Bryan Newbold  2018-04-04  2      -73/+109
|
* more testing deps                                 Bryan Newbold  2018-04-04  2      -8/+133
|
* trivial whitespace                                Bryan Newbold  2018-04-04  2      -1/+2
|
* more TODO                                         Bryan Newbold  2018-04-04  2      -0/+17
|
* more WIP on extractor                             Bryan Newbold  2018-04-04  5      -52/+427
|
* add example XML output (open access)              Bryan Newbold  2018-04-03  1      -0/+2004
|
* WIP on extractor-with-mrjob                       Bryan Newbold  2018-04-03  4      -0/+954
|
* fix very important typo                           Bryan Newbold  2018-04-03  1      -1/+1
|
* shift docs around a bit                           Bryan Newbold  2018-04-03  2      -9/+12
|
* actually running hadoop job on cluster            Bryan Newbold  2018-04-03  2      -0/+18
|
* fix silly bugs in backfiller (need more tests)    Bryan Newbold  2018-04-03  1      -3/+4
|
* add setuptools (can probably remove)              Bryan Newbold  2018-04-03  2      -7/+8
|
* heritrix expects ints, not strings, for numbers   Bryan Newbold  2018-04-02  1      -7/+7
|
* backfill: sha1 prefix, cluster example            Bryan Newbold  2018-03-30  3      -8/+19
|
* clean up backfill code/tests                      Bryan Newbold  2018-03-30  2      -24/+42
|
* refactor backfill for mrjob                       Bryan Newbold  2018-03-30  4      -64/+145
|
* pytest helpers                                    Bryan Newbold  2018-03-30  4      -32/+564
|
* clean up pig test stuff                           Bryan Newbold  2018-03-30  6      -62/+71
|
* renames                                           Bryan Newbold  2018-03-30  4      -0/+129
|
* basically working pig test                        Bryan Newbold  2018-03-29  5      -23/+32
|
* progress on pig tests                             Bryan Newbold  2018-03-29  8      -10/+127
|
* import WIP on pig test setup                      Bryan Newbold  2018-03-29  6      -0/+156
|
* WIP on cdx backfill                               Bryan Newbold  2018-03-29  3      -0/+265
|
* move to top level                                 Bryan Newbold  2018-03-29  3      -0/+0
|
* sandcrawler                                       Bryan Newbold  2018-03-29  1      -2/+9
|
* no venvs                                          Bryan Newbold  2018-03-29  1      -0/+1
|
* import vinay's cdx-record-pipeline                Bryan Newbold  2018-03-29  3      -0/+103
|
* init repo                                         Bryan Newbold  2018-03-29  2      -0/+30