
                                     _                         _           
    _________    ___  __ _ _ __   __| | ___ _ __ __ ___      _| | ___ _ __ 
    \        |  / __|/ _` | '_ \ / _` |/ __| '__/ _` \ \ /\ / / |/ _ \ '__|
     \       |  \__ \ (_| | | | | (_| | (__| | | (_| |\ V  V /| |  __/ |   
      \@@@@@@|  |___/\__,_|_| |_|\__,_|\___|_|  \__,_| \_/\_/ |_|\___|_|   


This repo contains Hadoop tasks (MapReduce and Pig), Luigi jobs, and other
scripts and code for the Internet Archive (web group) journal ingest pipeline.

This repository is potentially public.

Archive-specific deployment/production guides and Ansible scripts are at:
[journal-infra](https://git.archive.org/bnewbold/journal-infra)