Quick tips for debugging Scalding issues.

## Dependencies

Print the dependency graph (using the `sbt-dependency-graph` plugin):

    sbt dependencyTree
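
The `dependencyTree` task only exists once the plugin is enabled. A minimal sketch of the plugin line, assuming it goes in `project/plugins.sbt`; the `0.9.2` version is an assumption (it is a published release, but not necessarily the one this project pins):

```scala
// project/plugins.sbt
// Assumption: 0.9.2 is one published release of the plugin; use whatever
// version matches the sbt version of this build.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

// On sbt 1.4 or newer the plugin ships with sbt itself, and this line can be
// used instead of the addSbtPlugin line above:
// addDependencyTreePlugin
```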

## Old Errors

At one point, `NullPointerException` errors showed up when running tests or
in production, for example:

    bnewbold@bnewbold-dev$ hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool sandcrawler.HBaseRowCountJob --hdfs --output hdfs:///user/bnewbold/spyglass_out_test
    Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
    https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javalangnullpointerexception
            at com.twitter.scalding.Tool$.main(Tool.scala:152)
            at com.twitter.scalding.Tool.main(Tool.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
            at com.twitter.scalding.Job$.apply(Job.scala:44)
            at com.twitter.scalding.Tool.getJob(Tool.scala:49)
            at com.twitter.scalding.Tool.run(Tool.scala:68)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at com.twitter.scalding.Tool$.main(Tool.scala:148)
            ... 6 more
    Caused by: java.lang.NullPointerException
            at parallelai.spyglass.hbase.HBaseSource.<init>(HBaseSource.scala:48)
            at sandcrawler.HBaseRowCountJob.<init>(HBaseRowCountJob.scala:14)
            ... 15 more

This was resolved by ensuring that all required parameters were being passed to
the `HBaseSource` constructor.

Another time, a bunch of `None.get` errors showed up when running tests. These
were resolved by ensuring that the `HBaseSource` constructors had exactly
identical names and arguments (e.g., table names and ZooKeeper quorums have to
be exact matches).
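
A rough sketch of why a near-match is not good enough, assuming the test data is looked up by source equality and the result is unwrapped with an unchecked `.get`. `FakeHBaseSource`, `SourceLookupSketch`, and the table/quorum strings are all hypothetical stand-ins, not the real SpyGlass or Scalding classes:

```scala
import scala.util.Try

// Hypothetical stand-in for a source definition; NOT the real HBaseSource.
// Case-class equality compares every constructor argument.
final case class FakeHBaseSource(table: String, quorum: String)

object SourceLookupSketch {
  // Test data registered by the test, keyed by the source it belongs to.
  val fixtures: Map[FakeHBaseSource, List[String]] =
    Map(FakeHBaseSource("test-table", "zk-quorum:2181") -> List("row1", "row2"))

  // Unchecked Option#get: throws NoSuchElementException when the lookup misses.
  def rowsFor(src: FakeHBaseSource): List[String] = fixtures.get(src).get

  def main(args: Array[String]): Unit = {
    // Exact match on both arguments: the lookup succeeds.
    println(rowsFor(FakeHBaseSource("test-table", "zk-quorum:2181")))
    // Quorum string differs slightly: the lookup misses and .get throws.
    println(Try(rowsFor(FakeHBaseSource("test-table", "zk-quorum"))))
  }
}
```

One way to avoid the mismatch is to build the source in a single helper and call that helper from both the job and the test, so the arguments cannot drift apart.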