| | Commit message | Author | Age | Files | Lines | |
|---|---|---|---|---|---|---|
| * | fix up HBaseRowCountTest | Bryan Newbold | 2018-05-24 | 2 | -7/+15 | |
| * | get quorum fields to match, fixing test | Bryan Newbold | 2018-05-24 | 1 | -1/+1 | |
| * | Added repository to find com.hadoop.gplcompression#hadoop-lzo;0.4.16. | Ellen Spertus | 2018-05-22 | 1 | -0/+1 | |
| * | more tests (failing) | Bryan Newbold | 2018-05-22 | 2 | -1/+56 | |
| * | update README with invocations | Bryan Newbold | 2018-05-21 | 1 | -0/+13 | |
| * | point SimpleHBaseSourceExample to actual zookeeper quorum host | Bryan Newbold | 2018-05-21 | 1 | -1/+2 | |
| * | another attempt at a simple job variation | Bryan Newbold | 2018-05-21 | 1 | -3/+16 | |
| * | update HBaseRowCountJob based on Simple example | Bryan Newbold | 2018-05-21 | 1 | -10/+11 | |
| * | spyglass/hbase test examples (from upstream) | Bryan Newbold | 2018-05-21 | 2 | -0/+93 | |
| * | deps updates: cdh libs, hbase, custom spyglass | Bryan Newbold | 2018-05-21 | 2 | -3/+7 | |
| * | docs of how to munge around custom spyglass jars | Bryan Newbold | 2018-05-21 | 1 | -0/+19 | |
| * | add dependencyTree helper plugin | Bryan Newbold | 2018-05-21 | 2 | -1/+2 | |
| * | building (but nullpointer) spyglass integration | Bryan Newbold | 2018-05-21 | 2 | -3/+27 | |
| * | more deps locations | Bryan Newbold | 2018-05-21 | 1 | -0/+8 | |
| * | gitignore for scalding directory | Bryan Newbold | 2018-05-21 | 1 | -0/+3 | |
| * | fix WordCountJob package; tests; hadoop version | Bryan Newbold | 2018-05-21 | 3 | -2/+43 | |
| * | WordCount -> WordCountJob | Bryan Newbold | 2018-05-21 | 3 | -13/+13 | |
| * | success running with com.twitter.scalding.Tool | Bryan Newbold | 2018-05-21 | 2 | -4/+11 | |
| * | remove main function; class name same as file | Bryan Newbold | 2018-05-21 | 1 | -12/+1 | |
| * | copy in jvm ecosystem notes | Bryan Newbold | 2018-05-21 | 1 | -0/+46 | |
| * | copy in scalding learning example | Bryan Newbold | 2018-05-21 | 6 | -0/+93 | |
| * | jvm/scala/scalding setup notes | Bryan Newbold | 2018-05-17 | 1 | -0/+16 | |
| * | fix tests post-DISTINCT | Bryan Newbold | 2018-05-08 | 5 | -25/+30 | |
| * | distinct on SHA1 in cdx scripts | Bryan Newbold | 2018-05-08 | 2 | -6/+18 | |
| * | pig cdx join improvements | Bryan Newbold | 2018-05-08 | 1 | -1/+1 | |
| * | how to run pig in production | Bryan Newbold | 2018-05-08 | 1 | -0/+5 | |
| * | WIP on filter-cdx-join-urls.pig | Bryan Newbold | 2018-05-07 | 1 | -0/+37 | |
| * | Merge branch 'master' of git.archive.org:webgroup/sandcrawler | Bryan Newbold | 2018-05-08 | 8 | -3/+139 | |
| |\ | ||||||
| | * | stale TODO | Bryan Newbold | 2018-05-07 | 1 | -0/+1 | |
| | * | pig script to filter GWB CDX by SURT regexes | Bryan Newbold | 2018-05-07 | 6 | -0/+127 | |
| | * | improve pig helper | Bryan Newbold | 2018-05-07 | 1 | -3/+11 | |
| * | | actually fix oversize inserts | Bryan Newbold | 2018-05-08 | 1 | -7/+7 | |
| |/ | ||||||
| * | XML size limit | Bryan Newbold | 2018-04-26 | 1 | -0/+6 | |
| * | force_existing flag for extraction | Bryan Newbold | 2018-04-19 | 1 | -1/+5 | |
| * | NLineInputFormat requires RawProtocol | Bryan Newbold | 2018-04-19 | 1 | -1/+2 | |
| * | local mrjob config | Bryan Newbold | 2018-04-19 | 1 | -0/+6 | |
| * | switch to new (local) sentry instance | Bryan Newbold | 2018-04-18 | 1 | -1/+1 | |
| * | notes on attempted vinay setup | Bryan Newbold | 2018-04-18 | 2 | -1/+9 | |
| * | start adding macOS instructions | Bryan Newbold | 2018-04-16 | 1 | -0/+4 | |
| * | update Pipfile.lock (new pluggy) | Bryan Newbold | 2018-04-16 | 1 | -59/+64 | |
| * | use NLineInputFormat so we can control split size | Bryan Newbold | 2018-04-11 | 1 | -0/+1 | |
| * | revert PYTHONPATH in cmdenv | Bryan Newbold | 2018-04-11 | 1 | -1/+2 | |
| * | Merge branch 'bnewbold-sentry' | Bryan Newbold | 2018-04-10 | 4 | -19/+31 | |
| |\ | ||||||
| | * | prototype sentry integration | Bryan Newbold | 2018-04-10 | 4 | -19/+31 | |
| * | | don't try to decode GROBID output | Bryan Newbold | 2018-04-11 | 1 | -2/+2 | |
| |/ | ||||||
| * | partially lint extraction_cdx_grobid.py | Bryan Newbold | 2018-04-10 | 1 | -8/+6 | |
| * | yet more test improvements | Bryan Newbold | 2018-04-10 | 2 | -9/+61 | |
| * | cleanup tests; add one for double-processing | Bryan Newbold | 2018-04-10 | 2 | -20/+43 | |
| * | TODO updates | Bryan Newbold | 2018-04-10 | 3 | -18/+3 | |
| * | wayback 404 test | Bryan Newbold | 2018-04-10 | 2 | -5/+49 | |
