path: root/pig/tests/test_join_cdx.py
author: Bryan Newbold <bnewbold@archive.org> 2020-11-06 18:17:49 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2020-11-06 18:17:52 -0800
commit: 8958b12ff12c59f1c1f7267a509a99bfaa14c7d7
tree: 8fdc9fb84edf1ec3545384f8752156ebf2c8eecf /pig/tests/test_join_cdx.py
parent: 8f4a22d78acb6518c6546645557ad5f0d2253c66
download: sandcrawler-8958b12ff12c59f1c1f7267a509a99bfaa14c7d7.tar.gz
download: sandcrawler-8958b12ff12c59f1c1f7267a509a99bfaa14c7d7.zip
html: pdf and html extract similar to XML
Note that the primary PDF URL extraction path is a separate code path.
Diffstat (limited to 'pig/tests/test_join_cdx.py')
0 files changed, 0 insertions, 0 deletions