path: root/python_hadoop/kafka_grobid_hbase.py
author: Bryan Newbold <bnewbold@archive.org> 2020-10-20 17:37:52 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2020-10-20 17:37:54 -0700
commit: 33249f2679851afb64142c428be45d16f35f5539
tree: 0c9445a5caffaaaf4d659ab12dcaf6b370402a50 /python_hadoop/kafka_grobid_hbase.py
parent: 36577de5bd84fbc9311d8938b8d5642cf856b1f8
persist PDF extraction in ingest pipeline
Ooof, didn't realize that this wasn't happening. Explains a lot of missing thumbnails in scholar!
Diffstat (limited to 'python_hadoop/kafka_grobid_hbase.py')
0 files changed, 0 insertions, 0 deletions