From 1dad0d9e54bfae93eebea47f8a3cb291cdd645c5 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 4 Apr 2018 12:06:38 -0700
Subject: extraction -> mapreduce

---
 mapreduce/TODO | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 mapreduce/TODO

diff --git a/mapreduce/TODO b/mapreduce/TODO
new file mode 100644
index 0000000..3459752
--- /dev/null
+++ b/mapreduce/TODO
@@ -0,0 +1,6 @@
+- better test coverage (actually check coverage!)
+- use pre-mapper command to filter down, eg, by status type?
+- automation/docs for bundling virtualenv along
+- think about speedups
+- abstract CDX line reading and HBase stuff out into a common library
+- actual GROBID_SERVER="http://wbgrp-svc096.us.archive.org:8070"
-- 
cgit v1.2.3