author | Bryan Newbold <bnewbold@archive.org> | 2020-01-14 15:30:42 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-01-14 15:38:20 -0800 |
commit | 648f04bfdcf441ce4a396d09bdd0443b2a2ca51e (patch) | |
tree | 58553c0854e81e46df934b011be7e2d817c14319 /fetch_hadoop.sh | |
parent | 49c4f4a4050a76e772f6ef9bf9ca544e2d54e2ab (diff) | |
basic FTP ingest support; revisit record resolution
- supporting revisits means more wayback hits (fewer crawls) => faster
- ... but this is only partial support. will also need to work through
sandcrawler db schema, etc. current status should be safe to merge/use.
- ftp support via treating an ftp hit as a 200
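
A minimal sketch of the last point, not the code from this commit: one way to "treat an FTP hit as a 200" is to normalize the status before the rest of the ingest logic looks at it. The helper name `effective_status` and the choice to detect FTP by URL scheme are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def effective_status(url: str, status_code: int) -> int:
    """Hypothetical helper: return the HTTP-style status used downstream.

    Assumption: any FTP capture that was found at all counts as a
    successful fetch, so it is reported as 200 regardless of the
    protocol-specific code recorded for it.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "ftp":
        # FTP has no HTTP status; treat a hit as success.
        return 200
    # HTTP(S) captures keep their recorded status unchanged.
    return status_code
```

With this shape, callers that only accept a status of 200 would pick up FTP captures without further changes, while HTTP error statuses still pass through untouched.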
Diffstat (limited to 'fetch_hadoop.sh')
0 files changed, 0 insertions, 0 deletions