author | Bryan Newbold <bnewbold@archive.org> | 2018-08-24 13:42:23 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2018-08-24 13:42:23 -0700 |
commit | 57dd8de831a6bc6b7de01846b6788163e9453784 (patch) | |
tree | 0e75aa243bff1cd8493fddf87e497eb0f78cb8ef | |
parent | 67755e366bcc1df455a9d75710a11030c3e2cc52 (diff) | |
download | sandcrawler-57dd8de831a6bc6b7de01846b6788163e9453784.tar.gz sandcrawler-57dd8de831a6bc6b7de01846b6788163e9453784.zip |
update TODO
-rw-r--r-- | TODO | 19 |
-rw-r--r-- | notes/library_shopping.txt | 10 |
2 files changed, 16 insertions, 13 deletions
@@ -1,21 +1,14 @@
+scalding:
+- less verbose sbt test output (set log level to WARN)
+- auto-formatting: addSbtPlugin("com.geirsson" % "sbt-scalafmt" % "1.6.0-RC3")
+
 pig:
 - potentially want to *not* de-dupe CDX lines by uniq sha1 in all cases; run this as a second-stage filter? for example, may want many URL links in fatcat for a single file (different links, different policies)
+- fix pig gitlab-ci tests (JAVA_HOME)
+python:
 - include input file name (and chunk? and CDX?) in sentry context
-- play with test image on older releases (eg, trusty)
-
 - how to get argument (like --hbase-table) into mrjob.conf, or similar?
-- fix pig gitlab-ci tests (JAVA_HOME)
-
-potential helpers:
-- https://github.com/martinblech/xmltodict
-- https://github.com/trananhkma/fucking-awesome-python#text-processing
-- https://github.com/blaze/blaze (for catalog/analytics)
-- validation: https://github.com/pyeve/cerberus
-- testing (to replace nose):
-  - https://github.com/CleanCut/green
-  - pytest
-  - mamba ("behavior driven")
diff --git a/notes/library_shopping.txt b/notes/library_shopping.txt
new file mode 100644
index 0000000..bf876a5
--- /dev/null
+++ b/notes/library_shopping.txt
@@ -0,0 +1,10 @@
+
+potential helpers:
+- https://github.com/martinblech/xmltodict
+- https://github.com/trananhkma/fucking-awesome-python#text-processing
+- https://github.com/blaze/blaze (for catalog/analytics)
+- validation: https://github.com/pyeve/cerberus
+- testing (to replace nose):
+  - https://github.com/CleanCut/green
+  - pytest
+  - mamba ("behavior driven")