Diffstat (limited to 'mapreduce')
-rw-r--r-- mapreduce/README.md | 17
1 file changed, 9 insertions, 8 deletions
diff --git a/mapreduce/README.md b/mapreduce/README.md
index b063fba..3cff9f1 100644
--- a/mapreduce/README.md
+++ b/mapreduce/README.md
@@ -1,14 +1,11 @@
-## Development and Testing
-
-Requires (eg, via `apt`):
+Hadoop streaming map/reduce jobs written in python using the mrjob library.
-- libjpeg-dev
+## Development and Testing
-Install pipenv system-wide if you don't have it:
+System dependencies in addition to `../README.md`:
- # or use apt, homebrew, etc
- sudo pip3 install pipenv
+- `libjpeg-dev` (for wayback libraries)
Run the tests with:
@@ -16,7 +13,11 @@ Run the tests with:
TODO: GROBID and HBase during development?
-## Backfill
+## Extraction Task
+
+TODO:
+
+## Backfill Task
An example actually connecting to HBase from a local machine, with thrift
running on a devbox: