Diffstat (limited to 'scalding')
-rw-r--r-- scalding/README.md | 22
-rw-r--r-- scalding/ia_cluster.conf | 0
-rw-r--r-- scalding/scalding-background.md | 7
3 files changed, 19 insertions, 10 deletions
diff --git a/scalding/README.md b/scalding/README.md
index c40da5c..45b62d0 100644
--- a/scalding/README.md
+++ b/scalding/README.md
@@ -3,12 +3,19 @@ the JVM) using the Scalding framework.
See the other markdown files in this directory for more background and tips.
-## Building and Running
+## Dependencies
Locally, you need to have the JVM (eg, OpenJDK 1.8), `sbt` build tool, and
might need (exactly) Scala version 2.11.8.
-See section below on building and installing custom SpyGlass jar.
+On a debian/ubuntu machine:
+
+    echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
+    sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
+    sudo apt-get update
+    sudo apt install scala sbt
+
+## Building and Running
Run tests:
@@ -26,17 +33,12 @@ Run on cluster:
com.twitter.scalding.Tool sandcrawler.HBaseRowCountJob --hdfs \
--app.conf.path thing.conf \
--output hdfs:///user/bnewbold/spyglass_out_test
-
+
+## Troubleshooting
+
If your `sbt` task fails with this error:
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Metaspace
try restarting `sbt` with more memory (e.g., `sbt -mem 2048`).
-## SpyGlass Jar
-
-SpyGlass is a "scalding-to-HBase" connector. It isn't maintained, so we needed
-to rebuild to support our versions of HBase/scalding/etc. Our fork (including
-build instructions) is at <https://github.com/bnewbold/SpyGlass>
-(`bnewbold-scala2.11` branch); compiled .jar files are available from
-<https://archive.org/download/ia_sandcrawler_maven2>.
diff --git a/scalding/ia_cluster.conf b/scalding/ia_cluster.conf
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/scalding/ia_cluster.conf
diff --git a/scalding/scalding-background.md b/scalding/scalding-background.md
index 99b363a..4d62c7e 100644
--- a/scalding/scalding-background.md
+++ b/scalding/scalding-background.md
@@ -1,4 +1,11 @@
+## Why Scalding
+
+Scalding vs. Java (MapReduce) vs. Java (Cascading) vs. Scoobi vs. Scrunch:
+
+- <https://speakerdeck.com/agemooij/why-hadoop-mapreduce-needs-scala?slide=34>
+- <https://github.com/twitter/scalding/wiki/Comparison-to-Scrunch-and-Scoobi>
+
## Tips/Gotchas
`.scala` file names should match internal classes.
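
The file-naming tip above can be checked mechanically. The following is a minimal sketch, not part of this commit: `check_scala_names` is a hypothetical helper name, and it assumes each `.scala` file in a single directory should declare a top-level `class`, `object`, or `trait` named after the file. It does not recurse into subdirectories and uses a simple line-based match, so treat a mismatch as a hint rather than an error.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): report every .scala file in
# the given directory that does not declare a class, object, or trait with
# the same name as the file. Returns 1 if any mismatch is found, else 0.
check_scala_names() {
    dir="$1"
    status=0
    for f in "$dir"/*.scala; do
        # Skip the unexpanded glob when the directory has no .scala files.
        [ -e "$f" ] || continue
        name=$(basename "$f" .scala)
        # Accept optional modifiers (abstract, final, sealed, case) before
        # the declaration keyword; the name must end at a non-identifier
        # character or at end of line.
        if ! grep -Eq "^[[:space:]]*((abstract|final|sealed|case)[[:space:]]+)*(class|object|trait)[[:space:]]+${name}([^A-Za-z0-9_]|\$)" "$f"; then
            echo "MISMATCH: $f"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the function, it could be run as `check_scala_names path/to/scala/sources`, where the path is a placeholder for wherever the job classes live in the checkout.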