author: Bryan Newbold <bnewbold@archive.org> 2018-05-08 10:06:14 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-05-08 10:06:20 -0700
commit: 18a55d37a87d4391bd8161201c523dd7d7f0f1e7
tree: 86db4c84cf4fd0dde5ea9508617344018e640104 /TODO
parent: 1831a3b4495aee275e4b4b187fa545eba75eb87b
fix tests post-DISTINCT
Confirms it's working!
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 5
1 files changed, 5 insertions, 0 deletions
diff --git a/TODO b/TODO
index 57c827f..5e9220b 100644
--- a/TODO
+++ b/TODO
@@ -1,4 +1,9 @@
+pig:
+- potentially want to *not* de-dupe CDX lines by uniq sha1 in all cases; run
+ this as a second-stage filter? for example, may want many URL links in fatcat
+ for a single file (different links, different policies)
+
 - include input file name (and chunk? and CDX?) in sentry context
 - play with test image on older releases (eg, trusty)