fatcat - [no description]

	Commit message	Author	Age	Files	Lines
*	switch '!= None' to 'is not None'	Bryan Newbold	2020-02-04	1	-3/+3
	As reminded in code review; thanks, Martin.
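The difference between the two comparisons can be shown with a small sketch. The `AlwaysEqual` class below is hypothetical and exists only to illustrate why an identity check is the safer idiom; it is not taken from the fatcat code.

```python
class AlwaysEqual:
    """Hypothetical class with a permissive __eq__, for illustration only."""

    def __eq__(self, other):
        # Claims equality with everything, including None.
        return True


value = AlwaysEqual()

# '!= None' is derived from __eq__, so an overridden __eq__ can make it lie.
equality_check = value != None  # noqa: E711

# 'is not None' compares object identity and cannot be overridden.
identity_check = value is not None

print(equality_check, identity_check)
```

With ordinary objects both forms usually agree; the identity form simply does not depend on how `__eq__` is implemented.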
*	allow-non-oa is a top-level flag, not sub-command	Bryan Newbold	2020-02-04	1	-3/+0
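As a rough illustration of what "top-level flag" means with argparse: a flag registered on the main parser applies to every mode and is given before the sub-command name. The program name, the positional argument, and the query string below are assumptions made for this sketch; only the `--allow-non-oa` flag and the 'query' mode name come from the log.

```python
import argparse

# Hypothetical program name, for illustration only.
parser = argparse.ArgumentParser(prog="ingest-tool")

# Top-level flag: registered on the main parser, so it is shared by all modes.
parser.add_argument("--allow-non-oa", action="store_true",
                    help="do not restrict results to open access releases")

subparsers = parser.add_subparsers(dest="mode")
query_parser = subparsers.add_parser("query")
query_parser.add_argument("query")  # assumed positional argument

# The top-level flag comes before the sub-command on the command line.
args = parser.parse_args(["--allow-non-oa", "query", "placeholder search terms"])
print(args.allow_non_oa, args.mode, args.query)
```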
*	ingest: add 'extid' and 'query' modes; filters; refactor	Bryan Newbold	2020-02-04	1	-38/+147
	This is a large refactor of the ingest script. It adds several filtering options (available in all modes) and new modes for free-form queries and for limiting to specific external identifiers.
*	remove 'oa_only' feature from ingest transform	Bryan Newbold	2020-01-28	1	-1/+0
	Refactoring to move this filter elsewhere.
*	add missing sentry/raven tags	Bryan Newbold	2020-01-10	1	-2/+7
	Good to have exceptions tracked and stored even for commands run from the command line. In particular, the importer runs as a Kafka worker and should be tracking exceptions.
*	container_issnl, not issnl, for ES release query	Bryan Newbold	2019-12-12	1	-1/+1
	Caught by Martin in review; thanks!
*	improve argparse usage	Bryan Newbold	2019-12-11	1	-6/+4
	--fatcat-api-url is clearer than --host-url; remove the unimplemented --debug flag (copy/pasted from the webface argparse setup); use a formatter that displays 'default' parameter values with --help. Thanks to Martin for pointing out the latter, which I've always wanted!
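The formatter mentioned here is presumably argparse's standard `ArgumentDefaultsHelpFormatter`, which appends each option's default value to its help text. A minimal sketch follows; the default URL and the help string are assumptions for illustration, not copied from the commit.

```python
import argparse

parser = argparse.ArgumentParser(
    # Appends "(default: ...)" to the help text of options that have help.
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
    "--fatcat-api-url",
    default="http://localhost:9411/v0",  # assumed default, for illustration
    help="fatcat API endpoint to connect to",
)

# format_help() returns the same text that --help would print.
help_text = parser.format_help()
print(help_text)
```

Note that the formatter only annotates arguments that define a `help` string.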
*	simplify ES scroll deletion using param()	Bryan Newbold	2019-12-11	1	-29/+29
	This gets rid of some messy error-handling code by properly configuring the elasticsearch client to not clean up scroll iterators when accessing the public (prod or qa) search interfaces. Leaving the scroll state around isn't ideal, so we still delete it when possible (e.g., when connecting directly to elasticsearch). Thanks to Martin for pointing out this solution in review.
*	add ingest-container command (new CLI tool)	Bryan Newbold	2019-12-10	1	-0/+136
	The intent of this tool is to make it easy to enqueue ingest requests into Kafka, to be processed by a worker pool and eventually inserted into fatcat (for ingest hits that pass various checks). As a specific example use-case, we have pretty good coverage of eLife (a prominent OA publisher), but have missed some publications in the past and have a large gap for the year 2019 (see https://fatcat.wiki/container/en4qj5ijrbf5djxx7p5zzpjyoq/coverage). This tool would make it trivial to enqueue all the missing releases to be crawled. Future variants of this tool could query for, e.g., long-tail OA works.