persist grobid: add option to skip S3 upload - sandcrawler


author	Bryan Newbold <bnewbold@archive.org>	2020-03-19 16:10:40 -0700
committer	Bryan Newbold <bnewbold@archive.org>	2020-03-19 16:10:42 -0700
commit	88f337f2cc40824ed3eaf32b1fec17c3b053bfdf (patch)
tree	ae7ae1a02906adf663098dc4e7762279d5ac2ac8 /notes/url_pattern_heuristic_backfill.txt
parent	e21fac21cc5a4267357a499f75f048ee5fd38ddb (diff)

persist grobid: add option to skip S3 upload

The motivation is that the current S3 target (minio) is overloaded, with too many files on a single partition (80 million+). Going to look into seaweedfs and other options, but for now stopping the minio persist. The data is all stored in kafka anyway.
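
The diffstat on this page is empty, so as a rough illustration only, here is a minimal sketch of what a "skip S3 upload" option on a persist worker could look like. The class name `PersistGrobidSketch`, the flag name `db_only`, and the record fields are all hypothetical and not taken from the sandcrawler code; the in-memory list and dict merely stand in for the database and for minio.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersistGrobidSketch:
    """Hypothetical persist worker; not the actual sandcrawler class."""

    # Assumed flag name: when True, the S3 (minio) upload is skipped.
    db_only: bool = False
    # Stand-in for the metadata database table.
    db_rows: List[dict] = field(default_factory=list)
    # Stand-in for the S3 bucket, keyed by the file's SHA-1 hex digest.
    s3_blobs: Dict[str, str] = field(default_factory=dict)

    def push_record(self, record: dict) -> None:
        # Metadata is always written to the database stand-in.
        self.db_rows.append(
            {"sha1hex": record["sha1hex"], "status": record["status"]}
        )
        # The XML blob is uploaded only when S3 persistence is enabled.
        if not self.db_only and record.get("tei_xml"):
            self.s3_blobs[record["sha1hex"]] = record["tei_xml"]
```

With `db_only=True` the worker still records metadata but never touches the blob store, which matches the intent described above: the full output remains available in kafka for a later backfill.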

Diffstat (limited to 'notes/url_pattern_heuristic_backfill.txt')

0 files changed, 0 insertions, 0 deletions

