author Bryan Newbold <bnewbold@archive.org> 2018-03-29 11:59:28 -0700
committer Bryan Newbold <bnewbold@archive.org> 2018-03-29 11:59:28 -0700
commit 63186a218e2e10848d4b014eacbc4ad3a51a20ca
tree 85222d54fa0420bc4df2bb4c5c7b65eefffb5a26
init repo
-rw-r--r-- .gitignore 21
-rw-r--r-- README.md 9
2 files changed, 30 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6ea387f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,9 @@
+
+This repo contains hadoop tasks (mapreduce and pig), luigi jobs, and other
+scripts and code for the journal ingest pipeline.
+
+This repository is potentially public. Maybe we'll rename it "sandcrawler"?
+
+Archive-specific deployment/production guides and ansible scripts at:
+[journal-infra](https://git.archive.org/bnewbold/journal-infra)
+