author | Bryan Newbold <bnewbold@archive.org> | 2020-11-06 18:17:49 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-11-06 18:17:52 -0800 |
commit | 8958b12ff12c59f1c1f7267a509a99bfaa14c7d7 | |
tree | 8fdc9fb84edf1ec3545384f8752156ebf2c8eecf /pig/tests/test_join_cdx.py | |
parent | 8f4a22d78acb6518c6546645557ad5f0d2253c66 | |
html: pdf and html extract similar to XML
Note that the primary PDF URL extraction path is a separate code path.
Diffstat (limited to 'pig/tests/test_join_cdx.py')
0 files changed, 0 insertions, 0 deletions