-rw-r--r-- README.md 5
-rw-r--r-- TODO 7
-rw-r--r-- mapreduce/README.md 17
3 files changed, 20 insertions, 9 deletions
diff --git a/README.md b/README.md
index fc5d9f6..f73eee2 100644
--- a/README.md
+++ b/README.md
@@ -22,3 +22,8 @@ this, and python in general:
# libjpeg-dev is for some wayback/pillow stuff
sudo apt install python3-dev python3-pip python3-wheel libjpeg-dev
pip3 install --user pipenv
+
+Each directory has its own environment. Do something like:
+
+ pipenv install --dev
+ pipenv shell
diff --git a/TODO b/TODO
index a1cf98d..b518037 100644
--- a/TODO
+++ b/TODO
@@ -1,5 +1,10 @@
-https://github.com/getsentry/raven-python
+Will probably eventually refactor into top-level plus modules. E.g., "common"
+directory, "backfill" and "extraction" as sub-directories. Downside of this is
+single giant pipenv venv with all dependencies?
+
+sentry:
+- https://github.com/getsentry/raven-python
potential helpers:
- https://github.com/martinblech/xmltodict
diff --git a/mapreduce/README.md b/mapreduce/README.md
index b063fba..3cff9f1 100644
--- a/mapreduce/README.md
+++ b/mapreduce/README.md
@@ -1,14 +1,11 @@
-## Development and Testing
-
-Requires (eg, via `apt`):
+Hadoop streaming map/reduce jobs written in Python using the mrjob library.
-- libjpeg-dev
+## Development and Testing
-Install pipenv system-wide if you don't have it:
+System dependencies, in addition to those listed in `../README.md`:
- # or use apt, homebrew, etc
- sudo pip3 install pipenv
+- `libjpeg-dev` (for wayback libraries)
Run the tests with:
@@ -16,7 +13,11 @@ Run the tests with:
TODO: GROBID and HBase during development?
-## Backfill
+## Extraction Task
+
+TODO:
+
+## Backfill Task
An example actually connecting to HBase from a local machine, with thrift
running on a devbox: