fatcat - [no description]

	Commit message	Author	Age	Files	Lines
*	tweaks to file ingest importer	Bryan Newbold	2019-12-03	1	-3/+4
\|	- allow overriding source filter whitelist (common case for CLI use)
\|	- fix editgroup description env variable pass-through
*	crossref is_update isn't what I thought	Bryan Newbold	2019-12-03	1	-6/+2
\|	I thought this would filter for metadata updates to an existing DOI, but actually "updates" are a type of DOI (eg, a retraction).
\|	TODO: handle 'updates' field. Should both do a lookup and set work_ident appropriately, and store in crossref-specific metadata.
*	re-order ingest want() for better stats	Bryan Newbold	2019-11-15	1	-7/+10
\|
*	project -> ingest_request_source	Bryan Newbold	2019-11-15	3	-9/+9
\|
*	fix release.pmcid typo	Bryan Newbold	2019-11-15	1	-2/+2
\|
*	ingest importer fixes	Bryan Newbold	2019-11-15	1	-3/+4
\|
*	more ingest importer comments and counts	Bryan Newbold	2019-11-15	2	-2/+29
\|
*	crude support for 'sandcrawler' kafka topics	Bryan Newbold	2019-11-15	1	-2/+3
\|
*	ingest file result importer	Bryan Newbold	2019-11-15	2	-2/+135
\|
*	add ingest request feature to entity_updates worker	Bryan Newbold	2019-11-15	1	-4/+20
\|	Initially was going to create a new worker to consume from the release update channel, but couldn't get the edit context ("is this a new release, or an update to an existing one") from that channel. Currently there is a flag in source code to control whether we only do OA releases or all releases. Starting with OA only to start slow, but should probably default to all, and make this a config flag. Should probably also have a config flag to control this entire feature. Tested locally in dev.
*	add ingest request transform (and test)	Bryan Newbold	2019-11-15	2	-0/+67
\|
*	crossref: accurate blank title counts	Bryan Newbold	2019-11-05	1	-0/+1
\|
*	crossref: component type	Bryan Newbold	2019-11-04	1	-1/+3
\|
*	crossref: count why skip happened	Bryan Newbold	2019-11-04	1	-1/+7
\|	Might skip based on release type (eg container, not a paper/release), or missing title, or other reasons. Over 7 million DOIs are getting skipped, curious why.
*	crossref: don't skip on short/null subtitle	Bryan Newbold	2019-11-04	1	-1/+1
\|	This was a bug. Should only set subtitle blank, not skip the import.
*	file cleanup tweaks to actually run	Bryan Newbold	2019-10-08	2	-5/+4
\|
*	refactor duplicated b32_hex function in importers	Bryan Newbold	2019-10-08	3	-21/+11
\|
*	dict wrapper for entity_from_json()	Bryan Newbold	2019-10-08	2	-3/+7
\|
*	new cleanup python tool/framework	Bryan Newbold	2019-10-08	4	-0/+241
\|
*	review/fix all confluent-kafka produce code	Bryan Newbold	2019-09-20	6	-27/+75
\|
*	small fixes to confluent-kafka importers/workers	Bryan Newbold	2019-09-20	6	-24/+67
\|	- decrease default changelog pipeline to 5.0sec
\|	- fix missing KafkaException harvester imports
\|	- more confluent-kafka tweaks
\|	- updates to kafka consumer configs
\|	- bump elastic updates consumergroup (again)
*	convert pipeline workers from pykafka to confluent-kafka	Bryan Newbold	2019-09-20	3	-125/+230
\|
*	small kafka tweaks for robustness	Bryan Newbold	2019-09-20	2	-0/+5
\|
*	convert importers to confluent-kafka library	Bryan Newbold	2019-09-20	1	-19/+71
\|
*	bump max message size to ~20 MBytes	Bryan Newbold	2019-09-20	2	-0/+2
\|
*	fixes to confluent-kafka harvesters	Bryan Newbold	2019-09-20	3	-20/+21
\|
*	first draft harvesters using confluent-kafka	Bryan Newbold	2019-09-20	3	-48/+104
\|
*	handle more external identifiers in python	Bryan Newbold	2019-09-18	1	-14/+97
\|	This makes it possible to, eg, paste an arxiv identifier or SHA-1 hash in the general search box and do a quick lookup.
*	refactor all python source for client lib name	Bryan Newbold	2019-09-05	21	-121/+121
\|
*	fix Importer editgroup_extra pass-through	Bryan Newbold	2019-09-05	1	-2/+1
\|
*	comment clarifying container.ident in ES release transform	Bryan Newbold	2019-09-03	1	-0/+2
\|
*	file rel: social -> academicsocial	Bryan Newbold	2019-09-03	1	-2/+2
\|
*	fix previous fix (need tests)	Bryan Newbold	2019-09-03	1	-2/+2
\|
*	fix typo bug in container ES transform	Bryan Newbold	2019-09-03	1	-2/+2
\|
*	last chocula import behavior tweaks	Bryan Newbold	2019-09-03	1	-3/+21
\|
*	more careful chocula import counts; don't re-update empty URLs	Bryan Newbold	2019-09-03	1	-2/+6
\|
*	better importer 'total' counting	Bryan Newbold	2019-09-03	1	-4/+2
\|
*	chocula importer: include DOAJ updates	Bryan Newbold	2019-09-03	1	-2/+2
\|
*	use EZB and szczepanski as OA signals (ES)	Bryan Newbold	2019-09-03	1	-0/+12
\|
*	improvements to chocula importer	Bryan Newbold	2019-09-03	1	-1/+7
\|
*	implement ChoculaImporter	Bryan Newbold	2019-09-03	2	-0/+137
\|
*	improvements to wayback_static importer	Bryan Newbold	2019-08-22	1	-6/+29
\|
*	start new ES container worker kafka group	Bryan Newbold	2019-07-31	1	-0/+2
\|	The previous group seems to have gotten corrupted; my hypothesis is that this is due to pykafka being somewhat flaky, and I am planning to move to librdkafka anyway. Re-indexing all the containers is pretty small/easy, so starting a new consumer group works fine in this case; release indexer would be a bigger problem.
*	crossref: allow 'name' fallback (for groups, etc)	Bryan Newbold	2019-06-24	1	-1/+1
\|
*	add inflight edit protection to matched importer	Bryan Newbold	2019-06-24	1	-1/+8
\|
*	fix typo; do arxiv-specific match import hack	Bryan Newbold	2019-06-24	1	-3/+14
\|
*	fix syntax in existing.url cleanup	Bryan Newbold	2019-06-24	1	-1/+1
\|
*	fix existing updater	Bryan Newbold	2019-06-24	1	-2/+3
\|
*	add minimal file URL cleanups to matched importer	Bryan Newbold	2019-06-24	1	-0/+8
\|
*	matched importer: urls, not url	Bryan Newbold	2019-06-24	1	-1/+1
\|	This matches the docs in the header. Previous matched imports were using 'cdx' objects with no 'dt' key, but this makes more sense. As far as I know the old 'url' code path was never actually used (or tested, derp).