Graph | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Added job and test for counting mime types. | Ellen Spertus | 2018-06-06 | 2 | -0/+96 |
| | |||||
* | Made package names match directory names. Cleaned up imports. | Ellen Spertus | 2018-06-05 | 4 | -16/+13 |
| | |||||
* | Merge branch 'refactoring' into 'master' | bnewbold | 2018-06-04 | 4 | -20/+101 |
|\ | | | | | | | | | Refactoring to add, use, and test class HBaseBuilder to eliminate duplicated code and facilitate HBaseSource creation. See merge request webgroup/sandcrawler!1 | ||||
| * | Made changes suggested in merge request review. | Ellen Spertus | 2018-06-04 | 3 | -15/+10 |
| | | | | | | | | | | - Changed inverseSchema from Map to List, eliminating incorrect comment. - Changing format of argument to HBaseBuilder.build from String to List[String]. | ||||
| * | Changed interface to HBaseBuilder.parseColSpec. | Ellen Spertus | 2018-06-03 | 3 | -8/+12 |
| | | |||||
| * | Added HBaseBuilder.build() and had HBaseRowCountJob call it. | Ellen Spertus | 2018-06-03 | 2 | -11/+5 |
| | | |||||
| * | Added HBaseBuilder.parseColSpecs and tests, which pass. | Ellen Spertus | 2018-06-03 | 2 | -0/+92 |
| | | |||||
| * | Factored common code out of HBaseRowCountJob and its test into a new companion object. | Ellen Spertus | 2018-06-01 | 2 | -16/+12 |
| | | |||||
* | | Merge branch 'bnewbold-scala-build-fixes' into 'master' | bnewbold | 2018-06-04 | 3 | -21/+19 |
|\ \ | | | | | | | | | | | | | scala build fixes. See merge request webgroup/sandcrawler!2 | ||||
| * | | try to run scala tests in gitlab CI | Bryan Newbold | 2018-06-04 | 1 | -2/+12 |
| | | | |||||
| * | | fetch SpyGlass jar from archive.org (not local) | Bryan Newbold | 2018-06-04 | 2 | -19/+7 |
| |/ | |||||
* / | bnewbold-dev > wbgrp-svc263 | Bryan Newbold | 2018-06-04 | 1 | -4/+4 |
|/ | | | | This is a new production VM running an HBase-Thrift gateway | ||||
* | Provided full path to cascading jar in command line. | Ellen Spertus | 2018-05-31 | 1 | -1/+1 |
| | |||||
* | Added tip on OutOfMemoryError. | Ellen Spertus | 2018-05-31 | 1 | -1/+5 |
| | |||||
* | Added debugging info for cascading.tuple.Fields. | Ellen Spertus | 2018-05-31 | 1 | -1/+23 |
| | |||||
* | switch HBaseRowCountJob to SCAN_ALL | Bryan Newbold | 2018-05-29 | 2 | -5/+11 |
| | |||||
* | HBaseRowCountJob actually counts rows | Bryan Newbold | 2018-05-29 | 2 | -13/+8 |
| | |||||
* | update version and project name | Bryan Newbold | 2018-05-24 | 3 | -4/+6 |
| | |||||
* | cleanup scalding notes/README | Bryan Newbold | 2018-05-24 | 3 | -37/+162 |
| | |||||
* | assemblyMergeStrategy deprecation warning | Bryan Newbold | 2018-05-24 | 1 | -2/+2 |
| | |||||
* | rename jvm/scalding directories | Bryan Newbold | 2018-05-24 | 14 | -71/+0 |
| | |||||
* | fix up HBaseRowCountTest | Bryan Newbold | 2018-05-24 | 2 | -7/+15 |
| | | | | | Again, seems like test fixture must match *exactly* or very obscure errors crop up. | ||||
* | get quorum fields to match, fixing test | Bryan Newbold | 2018-05-24 | 1 | -1/+1 |
| | | | | | | | | | | | Writing this commit message in anger: It seems that the HBaseSource must match exactly between the instantiated Job class and the JobTest. The error when this isn't the case is very obscure: a `None.get()` exception deep in SpyGlass internals. Blech. This may or may not explain other test failure issues. | ||||
* | Added repository to find com.hadoop.gplcompression#hadoop-lzo;0.4.16. | Ellen Spertus | 2018-05-22 | 1 | -0/+1 |
| | |||||
* | more tests (failing) | Bryan Newbold | 2018-05-22 | 2 | -1/+56 |
| | |||||
* | update README with invocations | Bryan Newbold | 2018-05-21 | 1 | -0/+13 |
| | |||||
* | point SimpleHBaseSourceExample to actual zookeeper quorum host | Bryan Newbold | 2018-05-21 | 1 | -1/+2 |
| | |||||
* | another attempt at a simple job variation | Bryan Newbold | 2018-05-21 | 1 | -3/+16 |
| | |||||
* | update HBaseRowCountJob based on Simple example | Bryan Newbold | 2018-05-21 | 1 | -10/+11 |
| | |||||
* | spyglass/hbase test examples (from upstream) | Bryan Newbold | 2018-05-21 | 2 | -0/+93 |
| | |||||
* | deps updates: cdh libs, hbase, custom spyglass | Bryan Newbold | 2018-05-21 | 2 | -3/+7 |
| | |||||
* | docs of how to munge around custom spyglass jars | Bryan Newbold | 2018-05-21 | 1 | -0/+19 |
| | |||||
* | add dependencyTree helper plugin | Bryan Newbold | 2018-05-21 | 2 | -1/+2 |
| | |||||
* | building (but nullpointer) spyglass integration | Bryan Newbold | 2018-05-21 | 2 | -3/+27 |
| | |||||
* | more deps locations | Bryan Newbold | 2018-05-21 | 1 | -0/+8 |
| | |||||
* | gitignore for scalding directory | Bryan Newbold | 2018-05-21 | 1 | -0/+3 |
| | |||||
* | fix WordCountJob package; tests; hadoop version | Bryan Newbold | 2018-05-21 | 3 | -2/+43 |
| | | | | | | | | | | | When copying from upstream scalding, forgot to change the path/namespace of the WordCountJob. Production IA cluster is actually running Hadoop 2.5, not 2.6 (I keep forgetting). Pull in more dependencies so test runs (copied from scalding repo, only changed the namespace of the job) | ||||
* | WordCount -> WordCountJob | Bryan Newbold | 2018-05-21 | 3 | -13/+13 |
| | | | | Also use the exact file from scalding repo | ||||
* | success running with com.twitter.scalding.Tool | Bryan Newbold | 2018-05-21 | 2 | -4/+11 |
| | |||||
* | remove main function; class name same as file | Bryan Newbold | 2018-05-21 | 1 | -12/+1 |
| | |||||
* | copy in jvm ecosystem notes | Bryan Newbold | 2018-05-21 | 1 | -0/+46 |
| | |||||
* | copy in scalding learning example | Bryan Newbold | 2018-05-21 | 6 | -0/+93 |
| | |||||
* | jvm/scala/scalding setup notes | Bryan Newbold | 2018-05-17 | 1 | -0/+16 |
| | |||||
* | fix tests post-DISTINCT | Bryan Newbold | 2018-05-08 | 5 | -25/+30 |
| | | | | Confirms it's working! | ||||
* | distinct on SHA1 in cdx scripts | Bryan Newbold | 2018-05-08 | 2 | -6/+18 |
| | |||||
* | pig cdx join improvements | Bryan Newbold | 2018-05-08 | 1 | -1/+1 |
| | |||||
* | how to run pig in production | Bryan Newbold | 2018-05-08 | 1 | -0/+5 |
| | |||||
* | WIP on filter-cdx-join-urls.pig | Bryan Newbold | 2018-05-07 | 1 | -0/+37 |
| | |||||
* | Merge branch 'master' of git.archive.org:webgroup/sandcrawler | Bryan Newbold | 2018-05-08 | 8 | -3/+139 |
|\ | |||||
| * | stale TODO | Bryan Newbold | 2018-05-07 | 1 | -0/+1 |
| | |