author: Bryan Newbold <bnewbold@archive.org> 2018-04-04 11:55:22 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-04-04 11:55:22 -0700
commit: 427dd875958c8a6d2d791d55f9dda300ebdc853b
tree: dacb8febb89ed0694d41db03d86c38e3374aa844
parent: 78caa0d7772375903194e79df16d70d831ebd432
merge backfill into extraction directory
Diffstat (limited to 'extraction/TODO')
-rw-r--r-- extraction/TODO | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/extraction/TODO b/extraction/TODO
index ed10834..3459752 100644
--- a/extraction/TODO
+++ b/extraction/TODO
@@ -1,2 +1,6 @@
+- better test coverage (actually check coverage!)
+- use pre-mapper command to filter down, eg, by status type?
+- automation/docs for bundling virtualenv along
+- think about speedups
 - abstract CDX line reading and HBase stuff out into a common library
 - actual GROBID_SERVER="http://wbgrp-svc096.us.archive.org:8070"