author | Bryan Newbold <bnewbold@archive.org> | 2020-03-18 18:49:05 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-03-18 18:49:09 -0700 |
commit | cb16d18137c936a634b75bf0eb6acb43c77d9290 (patch) | |
tree | 4b8b72aa7cd1d5a9da81c6233ea10b6cdc837d2a /python/tests/test_grobid.py | |
parent | e1b3edd7af59fe0fd4272a4696387ea09a22a6c0 (diff) | |
implement (unused) force_get flag for SPN2
I hoped this feature would make it possible to crawl journals.lww.com
PDFs, because the token URLs work with `wget`, but it still doesn't seem
to work. Maybe because of the user agent?
Anyway, this feature might be useful for crawling efficiency, so adding
it to master.
Diffstat (limited to 'python/tests/test_grobid.py')
0 files changed, 0 insertions, 0 deletions