author: Bryan Newbold <bnewbold@archive.org> 2022-10-24 14:17:44 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2022-10-24 14:17:46 -0700
commit: 4f0d10f4b38534eda673a8dfe28e3a58af9a8a8a
tree: 87c84d496a9976084fc4af7825e549c07fbcffb9 /python_hadoop
parent: 855153ae4fe03656adde16c56a4347f4b3d26487
ingest: don't prefer WARC over SPN so strongly
We generally prefer an older WARC record over an SPN record, because the lookup is easier. However, this preference was causing problems with repeated ingest, so demote it. We may want to make this more configurable in the future, so that things like HTML sub-resource lookups or bulk ingest won't prefer random new SPN captures.
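
To illustrate the idea, here is a minimal sketch of a capture ranking in which the WARC preference is demoted from a primary criterion to a tie-breaker. This is not the actual sandcrawler code: the `Capture` class, its fields, `sort_key`, `pick_best`, and the `prefer_warc` flag are all hypothetical names, and the exact ordering of criteria is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Capture:
    # Hypothetical record; field names are illustrative, not sandcrawler's schema.
    url: str
    timestamp: str    # 14-digit wayback-style timestamp, e.g. "20221024141744"
    status_code: int
    source: str       # "warc" (older crawl record) or "spn" (Save Page Now)


def sort_key(capture: Capture, prefer_warc: bool = False) -> Tuple[int, int, int]:
    """Build a ranking tuple; larger tuples rank higher."""
    is_ok = int(capture.status_code == 200)
    is_warc = int(capture.source == "warc")
    recency = int(capture.timestamp)
    if prefer_warc:
        # Strong preference (assumed old behavior): WARC beats recency.
        return (is_ok, is_warc, recency)
    # Demoted preference: recency first, WARC only breaks ties.
    return (is_ok, recency, is_warc)


def pick_best(captures: List[Capture], prefer_warc: bool = False) -> Optional[Capture]:
    """Return the highest-ranked capture, or None if there are none."""
    if not captures:
        return None
    return max(captures, key=lambda c: sort_key(c, prefer_warc))


old_warc = Capture("https://example.com/a.pdf", "20200101000000", 200, "warc")
new_spn = Capture("https://example.com/a.pdf", "20221024141744", 200, "spn")

best_default = pick_best([old_warc, new_spn])
best_bulk = pick_best([old_warc, new_spn], prefer_warc=True)
```

With the demoted ordering, a repeated ingest picks up the fresh SPN capture, while a caller such as a bulk ingest or sub-resource lookup could pass `prefer_warc=True` to keep favoring the older WARC record, which is one way the configurability mentioned above might look.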
Diffstat (limited to 'python_hadoop')
0 files changed, 0 insertions, 0 deletions