author | Bryan Newbold <bnewbold@archive.org> | 2020-09-14 14:13:34 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-09-14 14:13:34 -0700 |
commit | ee6129ea884036b666de7cff4ad7891675a52b3c (patch) | |
tree | f3f2d4970f2622b16425eab7ae0de2eacac30ef5 /notes/url_pattern_heuristic_backfill.txt | |
parent | 62252a6179953ccc79a6cb60c40a756fa0a034e1 (diff) | |
ingest: treat text/xml as XHTML in pdf ingest
Diffstat (limited to 'notes/url_pattern_heuristic_backfill.txt')
0 files changed, 0 insertions, 0 deletions