following https://medium.com/@gayani.nan/how-to-run-a-scalding-job-567160fa193
running on my laptop:
    openjdk version "1.8.0_171"
    OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-1~deb9u1-b11)
    OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
    Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
    sbt: 1.1.5
    sbt new scala/scala-seed.g8
    # inserted additional deps, tweaked versions
    # hadoop 2.5.0 seems to conflict with cascading; sticking with 2.6.0
    sbt assembly
    scp target/scala-2.11/scald-mvp-assembly-0.1.0-SNAPSHOT.jar devbox:
    # on cluster:
    yarn jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar WordCount --hdfs --input hdfs:///user/bnewbold/dummy.txt
later, using the hadoop command instead:
    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool example.WordCountJob --hdfs --input hdfs:///user/bnewbold/dummy.txt --output hdfs:///user/bnewbold/test_scalding_out3
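for reference, a minimal sketch of what a job class like `example.WordCountJob` could look like. This is modeled on the standard Scalding typed-API word count and is not necessarily the code in this repo; the whitespace tokenizer is an assumption. `--input` and `--output` on the command line map to `args("input")` and `args("output")`:

```scala
package example

import com.twitter.scalding._

// Sketch only: the usual Scalding word count, assuming the typed API.
// Launched through com.twitter.scalding.Tool, which passes the
// command-line flags (--input, --output) in as Args.
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    // assumption: split on runs of whitespace, drop empty tokens
    .flatMap { line => line.toLowerCase.split("\\s+").filter(_.nonEmpty) }
    .groupBy { word => word }
    .size
    .write(TypedTsv[(String, Long)](args("output")))
}
```

note that this sketch requires `--output`; `args("output")` throws if the flag is missing.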
helpful for debugging dependency woes:
    sbt dependencyTree
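both `assembly` and `dependencyTree` come from sbt plugins on sbt 1.1.5 (as far as I know, `dependencyTree` only ships with sbt itself from 1.4 on). A sketch of a `project/plugins.sbt` that would provide them; the plugin versions are assumptions (releases known to support sbt 1.x), not necessarily what this project pins:

```scala
// project/plugins.sbt (sketch; versions are assumptions)

// provides `sbt assembly` (fat jar for `hadoop jar` / `yarn jar`)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

// provides `sbt dependencyTree` on sbt releases before 1.4
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.0")
```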
testing the spyglass example program (expect a table error):
    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool example.SimpleHBaseSourceExample --hdfs --output hdfs:///user/bnewbold/spyglass_out_test --app.conf.path thing.conf --debug true
    # org.apache.hadoop.hbase.TableNotFoundException: table_name
running a spyglass job (gives a NullPointerException):
    hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool sandcrawler.HBaseRowCountJob --hdfs --output hdfs:///user/bnewbold/spyglass_out_test --app.conf.path thing.conf
    # Caused by: java.lang.NullPointerException
    #         at parallelai.spyglass.hbase.HBaseSource.<init>(HBaseSource.scala:48)
    #         at sandcrawler.HBaseRowCountJob.<init>(HBaseRowCountJob.scala:17)
## Custom build
in SpyGlass repo:
    # This builds the new .jar and installs it in the (laptop local) ~/.m2
    # repository
    mvn clean install -U
    # Copy that .jar (and associated pom.xml) over to where sbt can find it
    mkdir -p ~/.sbt/preloaded/parallelai/
    cp -r ~/.m2/repository/parallelai/parallelai.spyglass ~/.sbt/preloaded/parallelai/
    # then build here
    sbt assembly
The medium-term plan here is to push the custom SpyGlass jar as a static maven
repo to an archive.org item, and point build.sbt to that folder.
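
A rough sketch of what that `build.sbt` change might look like. The item name `EXAMPLE-ITEM`, the `maven2/` subfolder, and the version string are placeholders, not real values; the group and artifact ids are inferred from the `~/.m2/repository/parallelai/parallelai.spyglass` path above and should be checked against the SpyGlass pom.xml:

```scala
// build.sbt (fragment, sketch only)

// hypothetical archive.org item laid out as a static maven repository
resolvers += "spyglass-custom" at "https://archive.org/download/EXAMPLE-ITEM/maven2/"

// group/artifact inferred from the local ~/.m2 layout; version is a placeholder
libraryDependencies += "parallelai" % "parallelai.spyglass" % "CUSTOM-VERSION"
```

With a resolver like this in place, the `~/.sbt/preloaded` copy step should no longer be needed on a fresh machine.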