field | value | date |
---|---|---|
author | Bryan Newbold <bnewbold@archive.org> | 2018-05-23 12:27:59 -0700 |
committer | Bryan Newbold <bnewbold@archive.org> | 2018-05-24 00:02:36 -0700 |
commit | 4ba428db30593b67283dd90b92141f99840dc78e | |
tree | f63c8e146e7f90a530abfebdb993ab45d57426d5 | |
parent | 29e4a83ff76da07bc6ad5d3f49d746ee0bc72023 | |
rename jvm/scalding directories
-rw-r--r-- | jvm-mapreduce/TODO | 16 |
-rw-r--r-- | jvm-mapreduce/learning.txt | 55 |
-rw-r--r-- | scalding/.gitignore (renamed from scald-mvp/.gitignore) | 0 |
-rw-r--r-- | scalding/README.md (renamed from scald-mvp/README.md) | 0 |
-rw-r--r-- | scalding/build.sbt (renamed from scald-mvp/build.sbt) | 0 |
-rw-r--r-- | scalding/project/Dependencies.scala (renamed from scald-mvp/project/Dependencies.scala) | 0 |
-rw-r--r-- | scalding/project/build.properties (renamed from scald-mvp/project/build.properties) | 0 |
-rw-r--r-- | scalding/project/plugins.sbt (renamed from scald-mvp/project/plugins.sbt) | 0 |
-rw-r--r-- | scalding/src/main/scala/example/SimpleHBaseSourceExample.scala (renamed from scald-mvp/src/main/scala/example/SimpleHBaseSourceExample.scala) | 0 |
-rw-r--r-- | scalding/src/main/scala/example/WordCountJob.scala (renamed from scald-mvp/src/main/scala/example/WordCountJob.scala) | 0 |
-rw-r--r-- | scalding/src/main/scala/sandcrawler/HBaseRowCountJob.scala (renamed from scald-mvp/src/main/scala/sandcrawler/HBaseRowCountJob.scala) | 0 |
-rw-r--r-- | scalding/src/test/scala/example/SimpleHBaseSourceExampleTest.scala (renamed from scald-mvp/src/test/scala/example/SimpleHBaseSourceExampleTest.scala) | 0 |
-rw-r--r-- | scalding/src/test/scala/example/WordCountTest.scala (renamed from scald-mvp/src/test/scala/example/WordCountTest.scala) | 0 |
-rw-r--r-- | scalding/src/test/scala/sandcrawler/HBaseRowCountTest.scala (renamed from scald-mvp/src/test/scala/sandcrawler/HBaseRowCountTest.scala) | 0 |
14 files changed, 0 insertions, 71 deletions
diff --git a/jvm-mapreduce/TODO b/jvm-mapreduce/TODO
deleted file mode 100644
index 46b3b15..0000000
--- a/jvm-mapreduce/TODO
+++ /dev/null
@@ -1,16 +0,0 @@
-
-Libraries:
-- sbt? or gradle? (build tool)
-    => debian packages: https://www.scala-sbt.org/download.html
-    (or just a single .deb...)
-- scalding (mapreduce framework)
-- scala (java also fine?)
-    => will scala work with java 1.7?
-    => scala 2.11 (~2014) works with java 7; scala 2.12 and up require 8
-    => debian stretch: scala 2.11.8-1
-    => ubuntu xenial: scala/xenial 2.11.6-6
-    => "Scalding works with Scala 2.10 and 2.11 is recommended"
-- testing
-- hbase connector library
-    => maybe spyglass?
-- hbase mock
diff --git a/jvm-mapreduce/learning.txt b/jvm-mapreduce/learning.txt
deleted file mode 100644
index 6fe1442..0000000
--- a/jvm-mapreduce/learning.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-
-## proof of concept on hadoop:
-
-This seemed to work:
-
-    yarn jar tutorial/execution-tutorial/target/scala-2.11/execution-tutorial-assembly-0.18.0-SNAPSHOT.jar Tutorial1 --hdfs --input test_cdx --output test_scalding_out1
-
-Or, with actual files on hadoop:
-
-    yarn jar tutorial/execution-tutorial/target/scala-2.11/execution-tutorial-assembly-0.18.0-SNAPSHOT.jar Tutorial1 --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-Horray! One issue with this was that building scalding took *forever* (meaning
-30+ minutes).
-
-potentially instead:
-
-    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool main.scala.example.WordCountJob --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-Hypothesis: class name should be same as file name. Don't need `main` function
-if using Scalding Tool wrapper jar. Don't need scald.rb.
-
-    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool example.WordCount --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-## sbt
-
-Uncommenting this line in scalding:build.sbt sped things way up (don't need to
-run *all* the tests):
-
-    // Uncomment if you don't want to run all the tests before building assembly
-    // test in assembly := {},
-
-Also get the following error (in a different context):
-
-    bnewbold@orithena$ sbt new typesafehub/scala-sbt
-    [info] Loading project definition from /home/bnewbold/src/scala-sbt.g8/project/project
-    [info] Compiling 1 Scala source to /home/bnewbold/src/scala-sbt.g8/project/project/target/scala-2.9.1/sbt-0.11.2/classes...
-    [error] error while loading CharSequence, class file '/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
-    [error] (bad constant pool tag 18 at byte 10)
-    [error] one error found
-    [error] {file:/home/bnewbold/src/scala-sbt.g8/project/project/}default-46da7b/compile:compile: Compilation failed
-    Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore?
-
-## resources/tutorials
-
-Whole bunch of example commands (sbt, maven, gradle) to build scalding:
-
-    https://medium.com/@gayani.nan/how-to-run-a-scalding-job-567160fa193
-
-Also looks good:
-
-    https://blog.matthewrathbone.com/2015/10/20/scalding-tutorial.html
-
-Possibly related:
-
-    http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html
diff --git a/scald-mvp/.gitignore b/scalding/.gitignore
index 7798ee0..7798ee0 100644
--- a/scald-mvp/.gitignore
+++ b/scalding/.gitignore
diff --git a/scald-mvp/README.md b/scalding/README.md
index e41e9ec..e41e9ec 100644
--- a/scald-mvp/README.md
+++ b/scalding/README.md
diff --git a/scald-mvp/build.sbt b/scalding/build.sbt
index aae8506..aae8506 100644
--- a/scald-mvp/build.sbt
+++ b/scalding/build.sbt
diff --git a/scald-mvp/project/Dependencies.scala b/scalding/project/Dependencies.scala
index 558929d..558929d 100644
--- a/scald-mvp/project/Dependencies.scala
+++ b/scalding/project/Dependencies.scala
diff --git a/scald-mvp/project/build.properties b/scalding/project/build.properties
index 31334bb..31334bb 100644
--- a/scald-mvp/project/build.properties
+++ b/scalding/project/build.properties
diff --git a/scald-mvp/project/plugins.sbt b/scalding/project/plugins.sbt
index 084d4bf..084d4bf 100644
--- a/scald-mvp/project/plugins.sbt
+++ b/scalding/project/plugins.sbt
diff --git a/scald-mvp/src/main/scala/example/SimpleHBaseSourceExample.scala b/scalding/src/main/scala/example/SimpleHBaseSourceExample.scala
index fe2a120..fe2a120 100644
--- a/scald-mvp/src/main/scala/example/SimpleHBaseSourceExample.scala
+++ b/scalding/src/main/scala/example/SimpleHBaseSourceExample.scala
diff --git a/scald-mvp/src/main/scala/example/WordCountJob.scala b/scalding/src/main/scala/example/WordCountJob.scala
index 0e63fed..0e63fed 100644
--- a/scald-mvp/src/main/scala/example/WordCountJob.scala
+++ b/scalding/src/main/scala/example/WordCountJob.scala
diff --git a/scald-mvp/src/main/scala/sandcrawler/HBaseRowCountJob.scala b/scalding/src/main/scala/sandcrawler/HBaseRowCountJob.scala
index 5df6b2e..5df6b2e 100644
--- a/scald-mvp/src/main/scala/sandcrawler/HBaseRowCountJob.scala
+++ b/scalding/src/main/scala/sandcrawler/HBaseRowCountJob.scala
diff --git a/scald-mvp/src/test/scala/example/SimpleHBaseSourceExampleTest.scala b/scalding/src/test/scala/example/SimpleHBaseSourceExampleTest.scala
index cf068c1..cf068c1 100644
--- a/scald-mvp/src/test/scala/example/SimpleHBaseSourceExampleTest.scala
+++ b/scalding/src/test/scala/example/SimpleHBaseSourceExampleTest.scala
diff --git a/scald-mvp/src/test/scala/example/WordCountTest.scala b/scalding/src/test/scala/example/WordCountTest.scala
index c42770f..c42770f 100644
--- a/scald-mvp/src/test/scala/example/WordCountTest.scala
+++ b/scalding/src/test/scala/example/WordCountTest.scala
diff --git a/scald-mvp/src/test/scala/sandcrawler/HBaseRowCountTest.scala b/scalding/src/test/scala/sandcrawler/HBaseRowCountTest.scala
index 598f45d..598f45d 100644
--- a/scald-mvp/src/test/scala/sandcrawler/HBaseRowCountTest.scala
+++ b/scalding/src/test/scala/sandcrawler/HBaseRowCountTest.scala