From b8cf9f6ea726970775ea49a44b243ad158d14a7c Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 4 Apr 2018 13:31:59 -0700
Subject: README/TODO updates

---
 README.md           |  5 +++++
 TODO                |  7 ++++++-
 mapreduce/README.md | 17 +++++++++--------
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index fc5d9f6..f73eee2 100644
--- a/README.md
+++ b/README.md
@@ -22,3 +22,8 @@ this, and python in general:
     # libjpeg-dev is for some wayback/pillow stuff
     sudo apt install python3-dev python3-pip python3-wheel libjpeg-dev
     pip3 install --user pipenv
+
+Each directory has it's own environment. Do something like:
+
+    pipenv install --dev
+    pipenv shell
diff --git a/TODO b/TODO
index a1cf98d..b518037 100644
--- a/TODO
+++ b/TODO
@@ -1,5 +1,10 @@
-https://github.com/getsentry/raven-python
+Will probably eventually refactor into top-level plus modules. Eg, "common"
+directory, "backfill" and "extraction" as sub-directories. Downside of this is
+single giant pipenv venv with all dependencies?
+
+sentry:
+- https://github.com/getsentry/raven-python
 
 potential helpers:
 - https://github.com/martinblech/xmltodict
 
diff --git a/mapreduce/README.md b/mapreduce/README.md
index b063fba..3cff9f1 100644
--- a/mapreduce/README.md
+++ b/mapreduce/README.md
@@ -1,14 +1,11 @@
 
-## Development and Testing
-
-Requires (eg, via `apt`):
+Hadoop streaming map/reduce jobs written in python using the mrjob library.
 
-- libjpeg-dev
+## Development and Testing
 
-Install pipenv system-wide if you don't have it:
+System dependencies in addition to `../README.md`:
 
-    # or use apt, homebrew, etc
-    sudo pip3 install pipenv
+- `libjpeg-dev` (for wayback libraries)
 
 Run the tests with:
 
@@ -16,7 +13,11 @@ Run the tests with:
 
 TODO: GROBID and HBase during development?
 
-## Backfill
+## Extraction Task
+
+TODO:
+
+## Backfill Task
 
 An example actually connecting to HBase from a local machine, with thrift
 running on a devbox:
-- 
cgit v1.2.3