author Bryan Newbold <bnewbold@archive.org> 2020-02-22 16:23:25 -0800
committer Bryan Newbold <bnewbold@archive.org> 2020-02-22 16:23:54 -0800
commit fbfcb3cc2215613d972e589eaad519ea726b5d31
tree 6a8aca51339641159e7ed5e32ceac83e86ac59c7 /python_hadoop
parent a2a652cefdfa54c7d6bf16dfcf8b1e2e45fb8947
ia: improve warc/revisit implementation
A lot of the terminal-bad-status results seem to have been due to not handling revisits correctly. Revisit records have status_code = '-' or None.
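
As a rough illustration of the problem described above (not the code from this commit), a minimal sketch of how a revisit record with a missing status code could be told apart from a genuinely bad status. The function names, the return labels other than terminal-bad-status, and the assumption that revisits are marked with the mimetype "warc/revisit" are hypothetical.

```python
from typing import Optional


def parse_status_code(raw: Optional[str]) -> Optional[int]:
    """Return the HTTP status as an int, or None when the field is '-' or missing."""
    if raw is None or raw == "-":
        return None
    return int(raw)


def classify_capture(mimetype: str, raw_status: Optional[str]) -> str:
    """Hypothetical classifier: label a capture record by mimetype and status."""
    status = parse_status_code(raw_status)
    # Assumption: revisit records carry mimetype "warc/revisit" and no usable
    # status code, so they need to be resolved rather than rejected.
    if mimetype == "warc/revisit" and status is None:
        return "revisit"
    if status == 200:
        return "success"
    return "terminal-bad-status"
```

With this split, a record such as ("warc/revisit", "-") is labeled as a revisit for further resolution instead of falling through to terminal-bad-status.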
Diffstat (limited to 'python_hadoop')
0 files changed, 0 insertions, 0 deletions