Diffstat (limited to 'mapreduce/TODO')
-rw-r--r-- mapreduce/TODO | 6
1 file changed, 6 insertions, 0 deletions
diff --git a/mapreduce/TODO b/mapreduce/TODO
new file mode 100644
index 0000000..3459752
--- /dev/null
+++ b/mapreduce/TODO
@@ -0,0 +1,6 @@
+- better test coverage (actually check coverage!)
+- use pre-mapper command to filter down, e.g., by status type?
+- automation/docs for bundling virtualenv along
+- think about speedups
+- abstract CDX line reading and HBase stuff out into a common library
+- actual GROBID_SERVER="http://wbgrp-svc096.us.archive.org:8070"