field | value | date |
---|---|---|
author | Bryan Newbold <bnewbold@archive.org> | 2018-05-23 12:27:59 -0700 |
committer | Bryan Newbold <bnewbold@archive.org> | 2018-05-24 00:02:36 -0700 |
commit | 4ba428db30593b67283dd90b92141f99840dc78e | |
tree | f63c8e146e7f90a530abfebdb993ab45d57426d5 /jvm-mapreduce | |
parent | 29e4a83ff76da07bc6ad5d3f49d746ee0bc72023 | |
rename jvm/scalding directories
Diffstat (limited to 'jvm-mapreduce')
-rw-r--r-- | jvm-mapreduce/TODO | 16 |
-rw-r--r-- | jvm-mapreduce/learning.txt | 55 |
2 files changed, 0 insertions, 71 deletions
diff --git a/jvm-mapreduce/TODO b/jvm-mapreduce/TODO
deleted file mode 100644
index 46b3b15..0000000
--- a/jvm-mapreduce/TODO
+++ /dev/null
@@ -1,16 +0,0 @@
-
-Libraries:
-- sbt? or gradle? (build tool)
-    => debian packages: https://www.scala-sbt.org/download.html
-    (or just a single .deb...)
-- scalding (mapreduce framework)
-- scala (java also fine?)
-    => will scala work with java 1.7?
-    => scala 2.11 (~2014) works with java 7; scala 2.12 and up require 8
-    => debian stretch: scala 2.11.8-1
-    => ubuntu xenial: scala/xenial 2.11.6-6
-    => "Scalding works with Scala 2.10 and 2.11 is recommended"
-- testing
-- hbase connector library
-    => maybe spyglass?
-- hbase mock
diff --git a/jvm-mapreduce/learning.txt b/jvm-mapreduce/learning.txt
deleted file mode 100644
index 6fe1442..0000000
--- a/jvm-mapreduce/learning.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-
-## proof of concept on hadoop:
-
-This seemed to work:
-
-    yarn jar tutorial/execution-tutorial/target/scala-2.11/execution-tutorial-assembly-0.18.0-SNAPSHOT.jar Tutorial1 --hdfs --input test_cdx --output test_scalding_out1
-
-Or, with actual files on hadoop:
-
-    yarn jar tutorial/execution-tutorial/target/scala-2.11/execution-tutorial-assembly-0.18.0-SNAPSHOT.jar Tutorial1 --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-Horray! One issue with this was that building scalding took *forever* (meaning
-30+ minutes).
-
-potentially instead:
-
-    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool main.scala.example.WordCountJob --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-Hypothesis: class name should be same as file name. Don't need `main` function
-if using Scalding Tool wrapper jar. Don't need scald.rb.
-
-    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool example.WordCount --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out2
-
-## sbt
-
-Uncommenting this line in scalding:build.sbt sped things way up (don't need to
-run *all* the tests):
-
-    // Uncomment if you don't want to run all the tests before building assembly
-    // test in assembly := {},
-
-Also get the following error (in a different context):
-
-    bnewbold@orithena$ sbt new typesafehub/scala-sbt
-    [info] Loading project definition from /home/bnewbold/src/scala-sbt.g8/project/project
-    [info] Compiling 1 Scala source to /home/bnewbold/src/scala-sbt.g8/project/project/target/scala-2.9.1/sbt-0.11.2/classes...
-    [error] error while loading CharSequence, class file '/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
-    [error] (bad constant pool tag 18 at byte 10)
-    [error] one error found
-    [error] {file:/home/bnewbold/src/scala-sbt.g8/project/project/}default-46da7b/compile:compile: Compilation failed
-    Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore?
-
-## resources/tutorials
-
-Whole bunch of example commands (sbt, maven, gradle) to build scalding:
-
-    https://medium.com/@gayani.nan/how-to-run-a-scalding-job-567160fa193
-
-Also looks good:
-
-    https://blog.matthewrathbone.com/2015/10/20/scalding-tutorial.html
-
-Possibly related:
-
-    http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html