| author | Bryan Newbold <bnewbold@archive.org> | 2020-10-19 15:46:37 -0700 | 
|---|---|---|
| committer | Bryan Newbold <bnewbold@archive.org> | 2020-10-19 15:46:39 -0700 | 
| commit | b672a6fe5b0e51f9d2844443bf9f7e82e1fd41b1 (patch) | |
| tree | 82e03127ff94c9fb1c0d1807f9f76f367a0f37de /notes/url_pattern_heuristic_backfill.txt | |
| parent | cc26ea975e29eefa2e2d3565c55ba0ac0a491bb7 (diff) | |
CDX fetch: more permissive fuzzy/normalization check
This might be the source of some `spn2-cdx-lookup-failure`.
Wayback/CDX does this check via full-on SURT, with many more changes,
and potentially we should be doing that here as well.
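
To illustrate what a more permissive fuzzy/normalization check might look like, here is a minimal sketch. It is not the code from this commit and is much weaker than SURT canonicalization. The function name `loosely_same_url` and the specific rules (ignore scheme, host case, a leading `www.`, and a trailing slash) are assumptions made for the example.

```python
from urllib.parse import urlsplit


def loosely_same_url(left: str, right: str) -> bool:
    """Hypothetical helper: compare two URLs while ignoring the scheme,
    host case, a leading "www.", and trailing slashes on the path."""

    def normalize(url: str) -> tuple:
        parts = urlsplit(url.strip())
        # urlsplit().hostname is already lowercased; None if missing
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        # Treat "/a/" and "/a" (and "/" and "") as the same path
        path = parts.path.rstrip("/")
        # The scheme is dropped on purpose; the query string must match exactly
        return (host, parts.port, path, parts.query)

    return normalize(left) == normalize(right)
```

With these assumed rules, `loosely_same_url("http://example.com/paper.pdf", "https://www.example.com/paper.pdf")` is true, while URLs differing in path or query string still compare as different. A real SURT-based comparison would apply many more transformations than this.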
Diffstat (limited to 'notes/url_pattern_heuristic_backfill.txt')
0 files changed, 0 insertions, 0 deletions
