author | Bryan Newbold <bnewbold@archive.org> | 2018-03-29 11:59:28 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2018-03-29 11:59:28 -0700 |
commit | 63186a218e2e10848d4b014eacbc4ad3a51a20ca (patch) | |
tree | 85222d54fa0420bc4df2bb4c5c7b65eefffb5a26 | |
init repo
-rw-r--r-- | .gitignore | 21 |
-rw-r--r-- | README.md | 9 |
2 files changed, 30 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6ea387f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,9 @@
+
+This repo contains hadoop tasks (mapreduce and pig), luigi jobs, and other
+scripts and code for the journal ingest pipeline.
+
+This repository is potentially public. Maybe we'll rename it "sandcrawler"?
+
+Archive-specific deployment/production guides and ansible scripts at:
+[journal-infra](https://git.archive.org/bnewbold/journal-infra)
+