author | Bryan Newbold <bnewbold@archive.org> | 2018-03-29 16:04:44 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2018-03-29 16:04:44 -0700 |
commit | d2203182c9ed6e1ff13fa70fb25f049ef87c75a0 (patch) | |
tree | a888f769b6580f84225d7e7b3e88effe4e982acd | |
parent | e336b389b489bcefd601eba631c395d8a37d5ab3 (diff) | |
download | sandcrawler-d2203182c9ed6e1ff13fa70fb25f049ef87c75a0.tar.gz sandcrawler-d2203182c9ed6e1ff13fa70fb25f049ef87c75a0.zip |
sandcrawler
-rw-r--r-- | README.md | 11 |
1 files changed, 9 insertions, 2 deletions
@@ -1,8 +1,15 @@
+ _ _
+ _________ ___ __ _ _ __ __| | ___ _ __ __ ___ _| | ___ _ __
+ \ | / __|/ _` | '_ \ / _` |/ __| '__/ _` \ \ /\ / / |/ _ \ '__|
+ \ | \__ \ (_| | | | | (_| | (__| | | (_| |\ V V /| | __/ |
+ \@@@@@@| |___/\__,_|_| |_|\__,_|\___|_| \__,_| \_/\_/ |_|\___|_|
+
+
 This repo contains hadoop tasks (mapreduce and pig), luigi jobs, and other
-scripts and code for the journal ingest pipeline.
+scripts and code for the internet archive (web group) journal ingest pipeline.
 
-This repository is potentially public. Maybe we'll rename it "sandcrawler"?
+This repository is potentially public.
 
 Archive-specific deployment/production guides and ansible scripts at:
 [journal-infra](https://git.archive.org/bnewbold/journal-infra)