author: Bryan Newbold <bnewbold@archive.org> 2018-03-29 11:59:28 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2018-03-29 11:59:28 -0700
commit: 63186a218e2e10848d4b014eacbc4ad3a51a20ca
tree: 85222d54fa0420bc4df2bb4c5c7b65eefffb5a26 /README.md
init repo
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 9
1 files changed, 9 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6ea387f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,9 @@
+
+This repo contains hadoop tasks (mapreduce and pig), luigi jobs, and other
+scripts and code for the journal ingest pipeline.
+
+This repository is potentially public. Maybe we'll rename it "sandcrawler"?
+
+Archive-specific deployment/production guides and ansible scripts at:
+[journal-infra](https://git.archive.org/bnewbold/journal-infra)
+