author Bryan Newbold <bnewbold@archive.org> 2022-07-14 15:03:49 -0700
committer Bryan Newbold <bnewbold@archive.org> 2022-07-14 15:03:51 -0700
commit b5217753166956eed14cf2c91ec52d883d6a5a56
tree 758026fb0061d66e49fede1b3ef451d56ab8ac93 /python/sandcrawler/html_metadata.py
parent b680c255508e6721185c6793bc872c0dc97864a0
cdx lookups: prioritize truly exact URL matches
This hopefully resolves an issue that caused many apparent redirect loops. These were actually near-loops, arising from timing or HTTP status code differences combined with http/https fuzzy matching in the CDX API, despite the API's "exact" lookup semantics.
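
The idea of preferring truly exact matches can be illustrated with a minimal sketch. This is not the code from this commit: the `CdxRow` record and the `prioritize_exact` helper are hypothetical names invented for illustration, and the example URLs are made up. It assumes a lookup may return rows for both the http and https forms of a URL, and simply moves rows whose URL equals the requested URL character for character to the front.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CdxRow:
    # Hypothetical, simplified record; not sandcrawler's actual row type.
    url: str
    datetime: str  # 14-digit capture timestamp
    status_code: int


def prioritize_exact(rows: List[CdxRow], requested_url: str) -> List[CdxRow]:
    """Order rows so that exact string matches on the URL come first.

    The sort key is False for exact matches and True otherwise, and
    False sorts before True. sorted() is stable, so rows within each
    group keep their original relative order.
    """
    return sorted(rows, key=lambda row: row.url != requested_url)


# A fuzzy lookup might return the http capture ahead of the https one.
rows = [
    CdxRow("http://example.com/paper", "20200101000000", 301),
    CdxRow("https://example.com/paper", "20200101000005", 200),
    CdxRow("http://example.com/paper", "20200101000010", 301),
]
ordered = prioritize_exact(rows, "https://example.com/paper")
best = ordered[0]
```

With this ordering, a crawler following a redirect to the https URL would pick the https capture rather than bouncing back to the http capture that issued the redirect, which is the kind of near-loop the message describes.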
Diffstat (limited to 'python/sandcrawler/html_metadata.py')
0 files changed, 0 insertions, 0 deletions