- "author counts match" in scoring
- refactor "scorable" to "matchable"
- look at refactoring to reduce JSON serializations

scalding:
- better JSON library
- less verbose sbt test output (set log level to WARN)
- auto-formatting: addSbtPlugin("com.geirsson" % "sbt-scalafmt" % "1.6.0-RC3")

pig:
- potentially want to *not* de-dupe CDX lines by uniq sha1 in all cases; run
  this as a second-stage filter? for example, may want many URL links in fatcat
  for a single file (different links, different policies)
- fix pig gitlab-ci tests (JAVA_HOME)

python:
- include input file name (and chunk? and CDX?) in sentry context
- how to get argument (like --hbase-table) into mrjob.conf, or similar?
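
sketch for the pig de-dupe note (second-stage filter idea): a rough Python
illustration, not the pig script itself. Assumes space-separated 11-column CDX
lines with the sha1 digest in column 6; group_by_sha1 and dedupe_by_sha1 are
hypothetical names. Stage one keeps every line per sha1 (so all URLs stay
available for fatcat); stage two is the optional "one line per sha1" filter.

```python
# Assumption: 11-column, space-separated CDX lines; the sha1 digest is
# the 6th column (index 5). Adjust if the real column layout differs.
SHA1_COLUMN = 5


def group_by_sha1(cdx_lines):
    """Stage one: keep every CDX line, grouped by sha1 digest."""
    groups = {}
    for line in cdx_lines:
        fields = line.split()
        if len(fields) <= SHA1_COLUMN:
            # Skip short or malformed lines rather than guessing a digest.
            continue
        groups.setdefault(fields[SHA1_COLUMN], []).append(line)
    return groups


def dedupe_by_sha1(cdx_lines):
    """Stage two (optional): keep only the first line seen per sha1."""
    return [rows[0] for rows in group_by_sha1(cdx_lines).values()]
```

running only stage one keeps the many-URLs-per-file case; callers that want
the old behavior apply dedupe_by_sha1 afterwards.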