author | Bryan Newbold <bnewbold@archive.org> | 2019-01-03 14:01:21 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2019-01-03 14:01:21 -0800 |
commit | 8901138485d1da4eb9a2512268faaa27fdf567c5 (patch) | |
tree | 878f115e93bed118470531730ba02c9a4d5d9627 | |
parent | 75c4aa99448141ccb5f36528d3673e84f954e646 (diff) | |
update (internal) journal-infra link
-rw-r--r-- | README.md | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -12,7 +12,7 @@ internet archive web group's journal ingest pipeline.
 Code in tihs repository is potentially public!
 
 Archive-specific deployment/production guides and ansible scripts at:
-[journal-infra](https://git.archive.org/bnewbold/journal-infra)
+[journal-infra](https://git.archive.org/webgroup/journal-infra)
 
 **./python/** contains Hadoop streaming jobs written in python using the
 `mrjob` library. Most notably, the **extraction** scripts, which fetch PDF