Commit message | Author | Age | Files | Lines (-removed/+added)
more WIP on extractor | Bryan Newbold | 2018-04-04 | 5 | -52/+427
add example XML output (open access) | Bryan Newbold | 2018-04-03 | 1 | -0/+2004
WIP on extractor-with-mrjob | Bryan Newbold | 2018-04-03 | 4 | -0/+954
fix very important typo | Bryan Newbold | 2018-04-03 | 1 | -1/+1
shift docs around a bit | Bryan Newbold | 2018-04-03 | 2 | -9/+12
actually running hadoop job on cluster | Bryan Newbold | 2018-04-03 | 2 | -0/+18
fix silly bugs in backfiller (need more tests) | Bryan Newbold | 2018-04-03 | 1 | -3/+4
add setuptools (can probably remove) | Bryan Newbold | 2018-04-03 | 2 | -7/+8
heritrix expects ints, not strings, for numbers | Bryan Newbold | 2018-04-02 | 1 | -7/+7
backfill: sha1 prefix, cluster example | Bryan Newbold | 2018-03-30 | 3 | -8/+19
clean up backfill code/tests | Bryan Newbold | 2018-03-30 | 2 | -24/+42
refactor backfill for mrjob | Bryan Newbold | 2018-03-30 | 4 | -64/+145
pytest helpers | Bryan Newbold | 2018-03-30 | 4 | -32/+564
clean up pig test stuff | Bryan Newbold | 2018-03-30 | 6 | -62/+71
renames | Bryan Newbold | 2018-03-30 | 4 | -0/+129
basically working pig test | Bryan Newbold | 2018-03-29 | 5 | -23/+32
progress on pig tests | Bryan Newbold | 2018-03-29 | 8 | -10/+127
import WIP on pig test setup | Bryan Newbold | 2018-03-29 | 6 | -0/+156
WIP on cdx backfill | Bryan Newbold | 2018-03-29 | 3 | -0/+265
move to top level | Bryan Newbold | 2018-03-29 | 3 | -0/+0
sandcrawler | Bryan Newbold | 2018-03-29 | 1 | -2/+9
no venvs | Bryan Newbold | 2018-03-29 | 1 | -0/+1
import vinay's cdx-record-pipeline | Bryan Newbold | 2018-03-29 | 3 | -0/+103
init repo | Bryan Newbold | 2018-03-29 | 2 | -0/+30