author | Bryan Newbold <bnewbold@archive.org> | 2020-10-21 12:22:30 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-10-21 12:22:30 -0700 |
commit | 86cc15d9c2e1f2e857d0dcf141dd5ea4d720dff5 (patch) | |
tree | f2eccc61f14b9159f7656e873b288ef2bbf38db7 /python_hadoop/kafka_grobid_hbase.py | |
parent | 200bf734bd459dd3c7a147b3dfe127dbf0ed7f70 (diff) | |
ingest: add a check for blocked-cookie before trying PDF url extraction
Diffstat (limited to 'python_hadoop/kafka_grobid_hbase.py')
0 files changed, 0 insertions, 0 deletions