fatcat - [no description]

	Commit message	Author	Age	Files	Lines
*	ingest: longer ES timeout	Bryan Newbold	2022-02-25	1	-1/+1
\|
*	update sentry SDK configuration	Bryan Newbold	2022-02-25	1	-3/+1
\|
*	ingest tool: clear_scroll allowed in es-public-proxy for some time	Bryan Newbold	2022-01-21	1	-8/+0
\|
*	move from raven to sentry_sdk	Martin Czygan	2021-12-14	1	-2/+2
\|	related docs:
\|	* https://docs.sentry.io/platforms/python/guides/flask/migration/
\|	* https://docs.sentry.io/platforms/python/guides/asgi/configuration/integrations/flask/
\|	> `fetch_git_sha` is gone, see: https://forum.sentry.io/t/fetch-git-sha-equivalent-in-the-unified-python-sdk/5521
*	typing: first batch of python bulk type annotations	Bryan Newbold	2021-11-03	1	-6/+6
\|	While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was relatively simple, and resulted in few code changes other than function signature additions.
*	fmt (black): *.py	Bryan Newbold	2021-11-02	1	-95/+115
\|
*	python: isort everything	Bryan Newbold	2021-11-02	1	-6/+6
\|
*	default ingest request topic now '-daily'; configurable for ingest_tool.py	Bryan Newbold	2021-09-30	1	-1/+6
\|
*	ingest: don't 'track_total_hits' for ES 7.x count()	Bryan Newbold	2021-05-31	1	-1/+1
\|
*	fatcat_ingest: fix recent lint failure	Bryan Newbold	2021-04-09	1	-1/+1
\|
*	search: more ES 7.x changes (track total counts)	Bryan Newbold	2021-04-09	1	-0/+1
\|
*	ingest tool: support for setting ingest type	Bryan Newbold	2020-11-06	1	-0/+4
\|
*	ingest: configurable ES index (v0.3.2)	Bryan Newbold	2020-04-08	1	-1/+4
\|
*	Merge pull request #53 from EdwardBetts/spelling	bnewbold	2020-03-27	1	-1/+1
\|\	Correct spelling mistakes
\| *	Correct spelling mistakes	Edward Betts	2020-03-27	1	-1/+1
\| \|
* \|	add --force-crawl flag to ingest tool	Bryan Newbold	2020-03-02	1	-0/+5
\|/
*	fatcat_ingest: as 'fatcat-ingest', not 'fatcat-ingest-container'	Bryan Newbold	2020-02-14	1	-1/+1
\|	This tool is more generic now.
*	switch '!= None' to 'is not None'	Bryan Newbold	2020-02-04	1	-3/+3
\|	As reminded in code review, thanks Martin.
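A minimal sketch of why `is not None` is the safer test: `!=` dispatches to a user-defined comparison method, while `is not` checks identity and cannot be overridden. The `AlwaysEqual` class is hypothetical and exists only for illustration; it is not part of fatcat.

```python
class AlwaysEqual:
    """Hypothetical class whose comparison methods ignore the other operand."""

    def __eq__(self, other: object) -> bool:
        return True

    def __ne__(self, other: object) -> bool:
        return False


value = AlwaysEqual()

# '!=' calls AlwaysEqual.__ne__, so the object claims to be equal to None.
by_operator = value != None  # noqa: E711

# 'is not' compares identity and cannot be overridden by the class.
by_identity = value is not None
```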
*	allow-non-oa is a top-level flag, not sub-command	Bryan Newbold	2020-02-04	1	-3/+0
\|
*	ingest: add 'extid' and 'query' modes; filters; refactor	Bryan Newbold	2020-02-04	1	-38/+147
\|	This is a large refactor of the ingest script. It adds a number of filtering options (for all modes), and new modes for free-form queries or limiting to specific external identifiers.
*	remove 'oa_only' feature from ingest transform	Bryan Newbold	2020-01-28	1	-1/+0
\|	Refactoring to move this filter elsewhere
*	add missing sentry/raven tags	Bryan Newbold	2020-01-10	1	-2/+7
\|	Good to have exceptions tracked and stored even for commands run from the command line. But in particular the importer runs as a kafka worker and should be tracking exceptions.
*	container_issnl, not issnl, for ES release query	Bryan Newbold	2019-12-12	1	-1/+1
\|	Caught by Martin in review; Thanks!
*	improve argparse usage	Bryan Newbold	2019-12-11	1	-6/+4
\|	--fatcat-api-url is clearer than --host-url
\|	remove unimplemented --debug (copy/paste from webface argparse)
\|	use formatter which will display 'default' parameters with --help
\|	Thanks to Martin for pointing out the latter, which I've always wanted!
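The help formatter mentioned here is presumably the standard library's `argparse.ArgumentDefaultsHelpFormatter`; the commit does not name it, so treat that as an assumption. A minimal sketch follows, with a hypothetical default URL and help string:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to the help text
    # of every argument that has a help string.
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "--fatcat-api-url",
        default="http://localhost:9411/v0",  # hypothetical default, for illustration
        help="connect to this fatcat API host/port",
    )
    return parser


# Render the --help output as a string instead of printing it and exiting.
help_text = build_parser().format_help()
```

Running a script built this way with `--help` lists the default next to each option, so defaults do not need to be repeated by hand in the help strings.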
*	simplify ES scroll deletion using param()	Bryan Newbold	2019-12-11	1	-29/+29
\|	This gets rid of some messy error-handling code by properly configuring the elasticsearch client to just not clean up scroll iterators when accessing the public (prod or qa) search interfaces. Leaving the scroll state around isn't ideal, so we still delete them if possible (e.g., when connecting directly to elasticsearch). Thanks to Martin for pointing out this solution in review.
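A dependency-free sketch of the decision described in this commit: choose scan parameters based on which endpoint is in use. The endpoint names and the `scan_params` helper are hypothetical. With elasticsearch-dsl, the resulting dict could presumably be applied as `search.params(**scan_params(endpoint))` before calling `scan()`; the commit itself does not show the exact call.

```python
from typing import Dict, Tuple

# Hypothetical public endpoints (behind a proxy that rejects scroll deletion).
PUBLIC_SEARCH_ENDPOINTS: Tuple[str, ...] = (
    "https://search.example.org",
    "https://search.qa.example.org",
)


def scan_params(endpoint: str) -> Dict[str, bool]:
    """Return extra scan parameters for the given search endpoint.

    On public endpoints, skip scroll cleanup; everywhere else, return no
    overrides so the client's default behavior (cleaning up) applies.
    """
    if endpoint.rstrip("/") in PUBLIC_SEARCH_ENDPOINTS:
        return {"clear_scroll": False}
    return {}
```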
*	add ingest-container command (new CLI tool)	Bryan Newbold	2019-12-10	1	-0/+136
	The intent of this tool is to make it easy to enqueue ingest requests into kafka, to be processed by a worker pool and eventually end up inserted into fatcat (for ingest hits that pass various checks). As a specific example use-case, we have pretty good coverage of eLife (a prominent OA publisher), but have missed some publications in the past, and have a large gap for the year 2019: https://fatcat.wiki/container/en4qj5ijrbf5djxx7p5zzpjyoq/coverage This tool would make it trivial to enqueue all the missing releases to be crawled. Future variants on this tool could query for, e.g., long-tail OA works.
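The enqueue step described above can be sketched roughly as follows. This is not the tool's actual code: the request fields, the topic name, and the `produce` callback (standing in for a Kafka producer) are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def enqueue_ingest_requests(
    releases: Iterable[Dict[str, str]],
    produce: Callable[[str, bytes], None],
    topic: str,
) -> int:
    """Serialize one ingest request per release and hand it to `produce`.

    Returns the number of requests enqueued.
    """
    count = 0
    for release in releases:
        # Hypothetical request shape; the real schema may differ.
        request = {
            "ingest_type": "pdf",
            "base_url": release["url"],
            "release_ident": release["ident"],
        }
        produce(topic, json.dumps(request, sort_keys=True).encode("utf-8"))
        count += 1
    return count


# Usage with an in-memory stand-in for the producer.
sent: List[Tuple[str, bytes]] = []
n = enqueue_ingest_requests(
    [{"ident": "example-release-ident", "url": "https://example.org/paper.pdf"}],
    lambda topic, payload: sent.append((topic, payload)),
    "example-ingest-requests",
)
```

Passing the producer in as a callback keeps the sketch runnable without a Kafka client installed; a real producer's send method could be wrapped to match the same signature.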