path: root/python_hadoop/tests/test_extraction_cdx_grobid.py
author: Bryan Newbold <bnewbold@archive.org> 2020-11-08 19:31:14 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2020-11-08 19:35:15 -0800
commit: ecd36863e607e3c9e71fd91ece44a422f88dbe2e
tree: c9f06dcb7b6a3b1b24fa03b79088110cee811a8b /python_hadoop/tests/test_extraction_cdx_grobid.py
parent: 0850b7fe7d5266ee0c4153b3e333d93eff164857
download: sandcrawler-ecd36863e607e3c9e71fd91ece44a422f88dbe2e.tar.gz
sandcrawler-ecd36863e607e3c9e71fd91ece44a422f88dbe2e.zip
ingest: default to html_biblio for PDF URL extraction
Diffstat (limited to 'python_hadoop/tests/test_extraction_cdx_grobid.py')
0 files changed, 0 insertions, 0 deletions