author | Bryan Newbold <bnewbold@archive.org> | 2020-10-20 17:37:52 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-10-20 17:37:54 -0700 |
commit | 33249f2679851afb64142c428be45d16f35f5539 (patch) | |
tree | 0c9445a5caffaaaf4d659ab12dcaf6b370402a50 /python_hadoop/common.py | |
parent | 36577de5bd84fbc9311d8938b8d5642cf856b1f8 (diff) | |
persist PDF extraction in ingest pipeline
Ooof, didn't realize that this wasn't happening. Explains a lot of
missing thumbnails in scholar!
Diffstat (limited to 'python_hadoop/common.py')
0 files changed, 0 insertions, 0 deletions