Quick tips for debugging Scalding issues.
## Dependencies
Print the dependency graph (using the `sbt-dependency-graph` plugin):

    sbt dependencyTree

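If the plugin is not already enabled, a minimal `project/plugins.sbt` entry looks roughly like the following. The version number is an assumption, so check which release matches your sbt. On sbt 1.4 and later the same functionality ships with sbt itself and is enabled with `addDependencyTreePlugin` instead.

```scala
// project/plugins.sbt
// Older sbt (0.13 / early 1.x); the version is an assumption, pick one matching your sbt:
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

// sbt 1.4+ bundles the plugin; use this line instead of the one above:
// addDependencyTreePlugin
```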
## Old Errors
At one point, `NullPointerException` errors showed up when running tests or in
production, like:

    bnewbold@bnewbold-dev$ hadoop jar scald-mvp-assembly-0.1.0-SNAPSHOT.jar com.twitter.scalding.Tool sandcrawler.HBaseRowCountJob --hdfs --output hdfs:///user/bnewbold/spyglass_out_test
    Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
    https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javalangnullpointerexception
        at com.twitter.scalding.Tool$.main(Tool.scala:152)
        at com.twitter.scalding.Tool.main(Tool.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.twitter.scalding.Job$.apply(Job.scala:44)
        at com.twitter.scalding.Tool.getJob(Tool.scala:49)
        at com.twitter.scalding.Tool.run(Tool.scala:68)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at com.twitter.scalding.Tool$.main(Tool.scala:148)
        ... 6 more
    Caused by: java.lang.NullPointerException
        at parallelai.spyglass.hbase.HBaseSource.<init>(HBaseSource.scala:48)
        at sandcrawler.HBaseRowCountJob.<init>(HBaseRowCountJob.scala:14)
        ... 15 more

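The innermost frame, `HBaseSource.<init>`, shows the failure happened while
constructing the source. This was resolved by ensuring that all required
parameters were being passed to the `HBaseSource` constructor.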
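One way to make this class of failure easier to diagnose is to check required values before they reach the source constructor, so a missing argument fails with a named message instead of an anonymous NPE inside `HBaseSource.<init>`. This is a minimal sketch using only the Scala standard library; `ArgCheck` and `nonNull` are hypothetical names, not part of Scalding or SpyGlass.

```scala
// Hypothetical helper (not part of Scalding or SpyGlass): validate values
// before passing them to a source constructor such as HBaseSource.
object ArgCheck {
  // Returns the value unchanged, or throws IllegalArgumentException whose
  // message names the offending parameter if the value is null.
  def nonNull[T <: AnyRef](name: String, value: T): T = {
    require(value != null, s"required parameter '$name' is null")
    value
  }
}
```

The checked values (for example the table name and ZooKeeper quorum) can then be handed to the `HBaseSource` constructor as usual.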
Another time, tests failed with a bunch of `None.get` errors. These were
resolved by ensuring that the `HBaseSource` constructed in the test and the one
constructed in the job had exactly identical arguments (e.g., table names and
ZooKeeper quorums have to be exact matches).
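Why exact matches matter can be shown with plain Scala value equality. This sketch assumes, without confirming it against the Scalding source, that the test harness finds mocked input for a source by comparing sources for equality. `FakeSource` is a hypothetical stand-in for `HBaseSource`, and the table and quorum strings are made up. A source whose arguments differ in any way finds no mock, and calling `.get` on the resulting `None` throws `NoSuchElementException: None.get`.

```scala
object SourceEquality {
  // Hypothetical stand-in for a source: case classes compare by value,
  // so two instances are equal only if every constructor argument matches.
  final case class FakeSource(table: String, quorum: String)

  // Mocked test input keyed by source, looked up by equality.
  val mocked: Map[FakeSource, Seq[String]] =
    Map(FakeSource("my-table", "zk-host:2181") -> Seq("row1", "row2"))

  def main(args: Array[String]): Unit = {
    // Identical arguments: the lookup succeeds.
    println(mocked.get(FakeSource("my-table", "zk-host:2181"))) // prints Some(List(row1, row2))
    // One argument differs: the lookup yields None, and .get on it would throw.
    println(mocked.get(FakeSource("my-table", "zk-host2:2181"))) // prints None
  }
}
```

A simple way to avoid the mismatch is to construct the source in one shared place and reference that same value from both the job and the test.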
If you get:

    value toTypedPipe is not a member of cascading.pipe.Pipe

you probably need to [import the type-safe API's implicit conversions][tdsl]:

    import com.twitter.scalding.typed.TDsl._

[tdsl]: https://github.com/twitter/scalding/wiki/Type-safe-api-reference#interoperating-between-fields-api-and-type-safe-api
## Running Individual Tests
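To run only the tests whose names match a wildcard pattern, use `testOnly` from
the sbt shell:

    sbt:sandcrawler> testOnly *CdxBackfill*

From outside the shell, quote the whole command: `sbt "testOnly *CdxBackfill*"`.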
## Fields
Values of type `List[Fields]` print misleadingly: `Fields.toString` joins its
field names with commas, and `List` joins its elements with commas too, so a
list of two `Fields` looks like a flat list of three names. Check `.length` to
see the real structure:

    $ scala -cp ~/.m2/repository/cascading/cascading-core/2.6.1/cascading-core-2.6.1.jar
    Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_31).
    Type in expressions for evaluation. Or try :help.
    scala> import cascading.tuple.Fields
    import cascading.tuple.Fields
    scala> val fields1 = new Fields("a", "b")
    fields1: cascading.tuple.Fields = 'a', 'b'
    scala> val fields2 = new Fields("c")
    fields2: cascading.tuple.Fields = 'c'
    scala> val allFields = List(fields1, fields2)
    allFields: List[cascading.tuple.Fields] = List('a', 'b', 'c')
    scala> allFields.length
    res0: Int = 2

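When debugging, it can help to print each element with explicit delimiters so element boundaries survive. This is a minimal, generic sketch (`ShowNested` is a hypothetical name). It relies only on each element's `toString`, so given the `Fields` rendering shown above, `showNested(allFields)` should produce `List(['a', 'b'], ['c'])`. The demo uses plain strings standing in for `Fields` so it runs without Cascading on the classpath.

```scala
object ShowNested {
  // Wrap each element's toString in brackets so element boundaries stay
  // visible even when an element's own toString contains commas.
  def showNested[A](xs: Seq[A]): String =
    xs.map(x => s"[$x]").mkString("List(", ", ", ")")

  def main(args: Array[String]): Unit = {
    // Plain strings standing in for Fields values whose toString has commas.
    val standIns = List("'a', 'b'", "'c'")
    println(showNested(standIns)) // prints List(['a', 'b'], ['c'])
  }
}
```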
## SpyGlass Column Selection
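Two equivalent ways to specify `columns`/`column_families`. Each column family
name is paired with the `Fields` at the same position, so a family can be listed
once with several columns:

    List("f", "file"),
    List(new Fields("c"), new Fields("size", "mimetype")),

or repeated, once per column:

    List("f", "file", "file"),
    List(new Fields("c"), new Fields("size"), new Fields("mimetype")),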
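To see why the two forms select the same columns, here is a small sketch that expands a spec into flat (family, column) pairs. It is an illustration only, not a description of how SpyGlass processes these lists internally; plain `Seq[String]` values stand in for Cascading `Fields`, and `ColumnSpec`/`expand` are hypothetical names.

```scala
object ColumnSpec {
  // Expand (family names, column groups) into flat (family, column) pairs.
  // The i-th family is paired with the i-th column group, so both lists
  // must have the same length.
  def expand(families: List[String], columns: List[Seq[String]]): List[(String, String)] = {
    require(families.length == columns.length, "each family needs a matching column group")
    families.zip(columns).flatMap { case (fam, cols) => cols.map(col => (fam, col)) }
  }
}
```

Both forms above expand to `List((f,c), (file,size), (file,mimetype))`.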