sandcrawler

	Commit message	Author	Age	Files	Lines
*	simple persist worker/tool to backfill grobid_refs	Bryan Newbold	2021-11-10	1	-0/+22
*	crossref persist: batch size depends on whether parsing refs	Bryan Newbold	2021-11-04	1	-1/+4
*	crossref persist: make GROBID ref parsing an option (not default)	Bryan Newbold	2021-11-04	1	-0/+6
*	glue, utils, and worker code for crossref and grobid_refs	Bryan Newbold	2021-11-04	1	-0/+30
*	make fmt (black 21.9b0)	Bryan Newbold	2021-10-27	1	-69/+109
*	make fmt	Bryan Newbold	2021-10-26	1	-63/+62
*	python: isort all imports	Bryan Newbold	2021-10-26	1	-1/+1
*	refactor 'minio' to 'seaweedfs'; and BLOB env vars	Bryan Newbold	2020-11-03	1	-9/+9
	This goes along with changes to ansible deployment to use the correct key names and values.
*	lint fixes	Bryan Newbold	2020-06-17	1	-2/+1
*	add new pdf workers/persisters	Bryan Newbold	2020-06-17	1	-0/+30
*	persist grobid: add option to skip S3 upload	Bryan Newbold	2020-03-19	1	-0/+4
	Motivation for this is that the current S3 target (minio) is overloaded, with too many files on a single partition (80 million+). Going to look into seaweedfs and other options, but for now stopping minio persist. Data is all stored in kafka anyways.
*	fixes to ingest-request persist	Bryan Newbold	2020-03-05	1	-1/+1
*	persist: ingest_request tool (with no ingest_file_result)	Bryan Newbold	2020-03-05	1	-0/+18
*	pdftrio basic python code	Bryan Newbold	2020-02-12	1	-0/+18
	This is basically just a copy/paste of GROBID code, only simpler!
*	improve sentry reporting with 'release' git hash	Bryan Newbold	2020-01-15	1	-1/+0
*	more ftp status 226 support	Bryan Newbold	2020-01-14	1	-1/+1
*	add PersistGrobidDiskWorker	Bryan Newbold	2020-01-02	1	-0/+27
	To help with making dumps directly from Kafka (eg, for partner delivery)
*	flush out minio helper, add to grobid persist	Bryan Newbold	2020-01-02	1	-2/+20
*	start work on persist workers and tool	Bryan Newbold	2020-01-02	1	-0/+98