author     Bryan Newbold <bnewbold@archive.org>  2018-08-24 13:42:23 -0700
committer  Bryan Newbold <bnewbold@archive.org>  2018-08-24 13:42:23 -0700
commit     57dd8de831a6bc6b7de01846b6788163e9453784
tree       0e75aa243bff1cd8493fddf87e497eb0f78cb8ef /TODO
parent     67755e366bcc1df455a9d75710a11030c3e2cc52
update TODO
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 19
1 files changed, 6 insertions, 13 deletions
diff --git a/TODO b/TODO
index 821bd0e..5363428 100644
--- a/TODO
+++ b/TODO
@@ -1,21 +1,14 @@
+scalding:
+- less verbose sbt test output (set log level to WARN)
+- auto-formatting: addSbtPlugin("com.geirsson" % "sbt-scalafmt" % "1.6.0-RC3")
+
pig:
- potentially want to *not* de-dupe CDX lines by uniq sha1 in all cases; run
this as a second-stage filter? for example, may want many URL links in fatcat
for a single file (different links, different policies)
+- fix pig gitlab-ci tests (JAVA_HOME)
+python:
- include input file name (and chunk? and CDX?) in sentry context
-- play with test image on older releases (eg, trusty)
-
- how to get argument (like --hbase-table) into mrjob.conf, or similar?
-- fix pig gitlab-ci tests (JAVA_HOME)
-
-potential helpers:
-- https://github.com/martinblech/xmltodict
-- https://github.com/trananhkma/fucking-awesome-python#text-processing
-- https://github.com/blaze/blaze (for catalog/analytics)
-- validation: https://github.com/pyeve/cerberus
-- testing (to replace nose):
- - https://github.com/CleanCut/green
- - pytest
- - mamba ("behavior driven")