From 427dd875958c8a6d2d791d55f9dda300ebdc853b Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 4 Apr 2018 11:55:22 -0700
Subject: merge backfill into extraction directory

---
 extraction/TODO | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'extraction/TODO')

diff --git a/extraction/TODO b/extraction/TODO
index ed10834..3459752 100644
--- a/extraction/TODO
+++ b/extraction/TODO
@@ -1,2 +1,6 @@
+- better test coverage (actually check coverage!)
+- use pre-mapper command to filter down, eg, by status type?
+- automation/docs for bundling virtualenv along
+- think about speedups
 - abstract CDX line reading and HBase stuff out into a common library
 - actual GROBID_SERVER="http://wbgrp-svc096.us.archive.org:8070"
-- 
cgit v1.2.3