| author | Bryan Newbold <bnewbold@archive.org> | 2020-03-23 10:32:47 -0700 |
|---|---|---|
| committer | Bryan Newbold <bnewbold@archive.org> | 2020-03-23 10:32:50 -0700 |
| commit | 84eeefbd3c55ea31bcf552f9c129c0e1576717ae (patch) | |
| tree | a66a381fc9535ff3ed6b3760c573b4476e3ab043 /proposals/2019_pdftotext_pdfinfo.md | |
| parent | e5ad7bddbcb55471b96ce30397ed85fe98e3b098 (diff) | |
ingest: clean_url() in more places
Some 'cdx-error' results were due to URLs with ':' after the hostname or
trailing newline ("\n") characters in the URL. This attempts to work
around this category of error.
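
To illustrate the two URL problems named above, here is a minimal sketch using only the Python standard library. It is not the project's actual clean_url() implementation, which this page does not show; the function name clean_url_sketch and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url_sketch(raw: str) -> str:
    """Hypothetical stand-in for illustration; NOT sandcrawler's clean_url()."""
    # Drop surrounding whitespace, including a trailing "\n".
    url = raw.strip()
    parts = urlsplit(url)
    netloc = parts.netloc
    # A bare ':' after the hostname (empty port) is dropped;
    # a real port such as ':8080' is left alone.
    if netloc.endswith(":"):
        netloc = netloc[:-1]
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(clean_url_sketch("http://example.com:/paper.pdf\n"))
```

Calling a helper like this before every CDX lookup, rather than only at the ingest entry point, is one way to read "in more places"; that reading is an assumption.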
Diffstat (limited to 'proposals/2019_pdftotext_pdfinfo.md')
0 files changed, 0 insertions, 0 deletions
