path: root/extraction/README.md
author: Bryan Newbold <bnewbold@archive.org> 2018-04-03 16:30:19 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-04-03 16:30:19 -0700
commit: 5ea0ef0fd34f09fab51f51e2f1dbe5cf6ec137cc (patch)
tree: 6198a1c156d6922c02f12964d5e9a30d71677dcc /extraction/README.md
parent: f10e8b798c8611d316279aa0afb158ee14691236 (diff)
download: sandcrawler-5ea0ef0fd34f09fab51f51e2f1dbe5cf6ec137cc.tar.gz
sandcrawler-5ea0ef0fd34f09fab51f51e2f1dbe5cf6ec137cc.zip
WIP on extractor-with-mrjob
Diffstat (limited to 'extraction/README.md')
-rw-r--r-- extraction/README.md | 19
1 files changed, 19 insertions, 0 deletions
diff --git a/extraction/README.md b/extraction/README.md
new file mode 100644
index 0000000..1da1454
--- /dev/null
+++ b/extraction/README.md
@@ -0,0 +1,19 @@
+
+
+## Development and Testing
+
+Requires (e.g., via `apt`):
+
+- libjpeg-dev
+
+Install pipenv system-wide if you don't have it:
+
+    # or use apt, homebrew, etc
+    sudo pip3 install pipenv
+
+Run the tests with:
+
+    pipenv run pytest
+
+TODO: GROBID and HBase during development?
+