fatcat

	Commit message	Author	Age	Files	Lines
*	crossref: longer comment about crossref API date fields	Bryan Newbold	2020-03-30	1	-2/+22
|
*	Merge pull request #53 from EdwardBetts/spelling	bnewbold	2020-03-27	1	-2/+2
|\	Correct spelling mistakes
| *	Correct spelling mistakes	Edward Betts	2020-03-27	1	-2/+2
| |
* |	pubmed: log to stderr	Martin Czygan	2020-03-10	1	-1/+1
| |
* |	pubmed: move mapping generation out of fetch_date	Martin Czygan	2020-03-10	1	-7/+8
| |	* fetch_date will fail on missing mapping
| |	* adjust tests (test will require access to pubmed ftp)
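A minimal sketch of the split described above, assuming a hypothetical date-to-filename mapping that is built once up front and then passed into fetch_date, which fails loudly when a date is missing. The helper names, the MissingMappingError class and the file names are illustrative assumptions, not fatcat's actual code.

```python
import datetime
from typing import Dict, List


class MissingMappingError(KeyError):
    """Hypothetical error: no update files are known for the requested date."""


def generate_date_file_map(listing: Dict[str, str]) -> Dict[str, List[str]]:
    """Group update file names by publication date (ISO string).

    `listing` maps file name -> date string; in practice it would be derived
    from an FTP directory listing, which is not shown here.
    """
    mapping: Dict[str, List[str]] = {}
    for filename, day in sorted(listing.items()):
        mapping.setdefault(day, []).append(filename)
    return mapping


def fetch_date(day: datetime.date, date_file_map: Dict[str, List[str]]) -> List[str]:
    """Return the update files for `day`; raise instead of silently skipping."""
    key = day.isoformat()
    if key not in date_file_map:
        raise MissingMappingError(key)
    return date_file_map[key]
```

Keeping mapping generation outside fetch_date means a long-running (continuous) harvester can rebuild the map on its own schedule, while each fetch stays a cheap lookup.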
* |	harvest: fix imports from HarvestPubmedWorker cleanup	Martin Czygan	2020-03-10	1	-2/+2
| |
* |	pubmed: citations is a bit more precise	Martin Czygan	2020-03-09	1	-1/+1
| |	> Each day, NLM produces update files that include new, revised and deleted citations.
| |	-- ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/README.txt
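Since a single update file mixes upserts and deletions, a consumer has to separate them. A standard-library sketch, assuming the PubMed XML layout in which new and revised citations both appear as PubmedArticle records and deletions arrive as a DeleteCitation list of PMID elements; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def split_update_file(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (PMIDs to insert or update, PMIDs to delete) from one update file."""
    root = ET.fromstring(xml_text)
    # New and revised citations look the same: a full PubmedArticle record.
    upserts = [
        art.findtext("MedlineCitation/PMID")
        for art in root.iter("PubmedArticle")
    ]
    # Deletions carry only identifiers.
    deletes = [
        pmid.text
        for block in root.iter("DeleteCitation")
        for pmid in block.findall("PMID")
    ]
    return upserts, deletes
```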
* |	pubmed: we sync from FTP	Martin Czygan	2020-03-09	1	-1/+1
| |
* |	oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker	Martin Czygan	2020-03-09	1	-34/+0
| |
* |	more pubmed adjustments	Martin Czygan	2020-02-22	2	-70/+118
| |	* regenerate map in continuous mode
| |	* add tests
* |	pubmed ftp: fix url	Martin Czygan	2020-02-19	1	-4/+6
| |
* |	pubmed ftp harvest and KafkaBs4XmlPusher	Martin Czygan	2020-02-19	2	-0/+214
|/	* add PubmedFTPWorker
|	* utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic
|	* add KafkaBs4XmlPusher
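A generic XML streaming helper of the kind the xmlstream note refers to might look like the following. This is a hypothetical sketch built on ElementTree.iterparse, not the signature or implementation of fatcat's own xmlstream.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def xmlstream(source: IO[bytes], tag: str) -> Iterator[str]:
    """Yield every <tag> element in `source` as a standalone XML string.

    iterparse processes the input incrementally; each matched element is
    cleared after serialization to keep memory bounded on large files.
    Namespaced documents would need `tag` in '{namespace}name' form.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # drop any text after the closing tag
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()


if __name__ == "__main__":
    demo = io.BytesIO(b"<set><a>1</a><a>2</a></set>")
    for chunk in xmlstream(demo, "a"):
        print(chunk)
```

Nothing in this helper is PubMed-specific, which is consistent with the note that such utils could live outside the pubmed module.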
*	harvest: log state on startup and use stderr for diagnostics	Martin Czygan	2020-02-14	3	-17/+22
|
*	datacite: extend range search query	Martin Czygan	2019-12-27	1	-1/+1
|	The bracket syntax is inclusive.
|	See also: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#_ranges
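In Elasticsearch query_string syntax, a range in square brackets includes both endpoints, while curly braces exclude them. A small helper for building such a clause; the field name `updated` and the date-only bounds are assumptions for illustration, not necessarily the exact query sent to DataCite.

```python
import datetime


def updated_range_query(field: str, start: datetime.date, end: datetime.date,
                        inclusive: bool = True) -> str:
    """Build a query_string range clause.

    Square brackets include both endpoints; curly braces exclude them.
    """
    lo, hi = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{lo}{start.isoformat()} TO {end.isoformat()}{hi}"
```

For example, `updated_range_query("updated", datetime.date(2019, 12, 26), datetime.date(2019, 12, 27))` yields `updated:[2019-12-26 TO 2019-12-27]`.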
*	avoid usage of short links	Martin Czygan	2019-12-27	1	-2/+2
|
*	Datacite API v2 throws 400, we cannot recover from, currently.	Martin Czygan	2019-12-27	1	-0/+4
|	As a first iteration, just mark the daily batch complete and continue.
|	The occasional HTTP 400 issue has been reported as https://github.com/datacite/datacite/issues/897.
|	A possible improvement would be to shrink the window, so losses will be smaller.
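The "mark complete and continue" policy can be sketched as follows. HTTPStatusError, fetch_day and the `completed` list are hypothetical stand-ins for the harvester's HTTP layer and state tracking; only a 400 is tolerated, and any other failure still stops the run.

```python
import datetime
import sys
from typing import Callable, List


class HTTPStatusError(Exception):
    """Hypothetical error carrying the status code of a failed API request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def harvest_days(days: List[datetime.date],
                 fetch_day: Callable[[datetime.date], List[dict]],
                 completed: List[datetime.date]) -> List[dict]:
    """Fetch each day's batch; on HTTP 400, log, mark the day done and move on."""
    records: List[dict] = []
    for day in days:
        try:
            records.extend(fetch_day(day))
        except HTTPStatusError as err:
            if err.status != 400:
                raise  # anything other than a 400 still aborts the run
            print(f"HTTP 400 for {day.isoformat()}; batch marked complete, records may be lost",
                  file=sys.stderr)
        completed.append(day)
    return records
```

Whatever a failing day would have returned is lost in this sketch, which is why a smaller harvest window would limit the damage.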
*	datacite: update documentation, add links to issues	Martin Czygan	2019-12-27	1	-10/+5
|
*	datacite: use v2 of the API (flaky)	Martin Czygan	2019-12-27	1	-5/+28
|	Update parameter update for datacite API v2.
|	Works fine, but there are occasional HTTP 400 responses when using the cursor API (daily updates can exceed the 10000 record limit for search queries).
|	The HTTP 400 issue is not solved yet, but reported to datacite as https://github.com/datacite/datacite/issues/897.
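A sketch of a cursor-paginated fetch loop, assuming (as DataCite's REST API v2 documentation describes) that the first request opts in with `page[cursor]=1` and that each JSON response carries `data` and a `links.next` URL. The `get_json` callable is an injected assumption so the loop can be exercised without network access.

```python
from typing import Callable, Iterator, Optional


def iterate_cursor(url: str, params: dict,
                   get_json: Callable[[str, Optional[dict]], dict]) -> Iterator[dict]:
    """Yield records from a cursor-paginated JSON API.

    The first request sets page[cursor]=1; later requests follow links.next,
    which is assumed to already encode the cursor and the original filters.
    """
    query: Optional[dict] = {**params, "page[cursor]": 1}
    next_url: Optional[str] = url
    while next_url:
        body = get_json(next_url, query)
        records = body.get("data", [])
        if not records:
            break  # guard against a next link that never runs out
        yield from records
        next_url = body.get("links", {}).get("next")
        query = None
```

In real use, `get_json` could be something like `lambda u, q: requests.get(u, params=q).json()`.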
*	refactor kafka producer in crossref harvester	Bryan Newbold	2019-12-06	1	-21/+26
|	producer creation/configuration should be happening in __init__() time, not 'daily' call.
|	This specific refactor motivated by mocking out the producer in unit tests.
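Creating the producer once in `__init__()` and accepting a factory makes it easy to substitute a fake in tests. A hypothetical sketch of that shape (ExampleHarvester and make_producer are illustrative names, not fatcat's class); in production, make_producer could be `confluent_kafka.Producer`, whose constructor takes a config dict.

```python
from typing import Any, Callable, Dict, Iterable


class ExampleHarvester:
    """Hypothetical harvester: the Kafka producer is built once, at construction."""

    def __init__(self, kafka_hosts: str, topic: str,
                 make_producer: Callable[[Dict[str, Any]], Any]) -> None:
        self.topic = topic
        # Configuration happens here, not on every daily run, so tests can
        # pass a fake factory instead of connecting to a broker.
        self.producer = make_producer({"bootstrap.servers": kafka_hosts})

    def push_batch(self, records: Iterable[bytes]) -> None:
        """Send one day's records and wait for delivery."""
        for rec in records:
            self.producer.produce(self.topic, rec)
        self.producer.flush()
```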
*	crossref is_update isn't what I thought	Bryan Newbold	2019-12-03	1	-6/+2
|	I thought this would filter for metadata updates to an existing DOI, but actually "updates" are a type of DOI (eg, a retraction).
|	TODO: handle 'updates' field. Should both do a lookup and set work_ident appropriately, and store in crossref-specific metadata.
*	review/fix all confluent-kafka produce code	Bryan Newbold	2019-09-20	3	-14/+49
|
*	small fixes to confluent-kafka importers/workers	Bryan Newbold	2019-09-20	2	-2/+2
|	- decrease default changelog pipeline to 5.0sec
|	- fix missing KafkaException harvester imports
|	- more confluent-kafka tweaks
|	- updates to kafka consumer configs
|	- bump elastic updates consumergroup (again)
*	small kafka tweaks for robustness	Bryan Newbold	2019-09-20	1	-0/+2
|
*	bump max message size to ~20 MBytes	Bryan Newbold	2019-09-20	2	-0/+2
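With confluent-kafka, the client-side size limit is the librdkafka setting `message.max.bytes`, passed in the config dict. A sketch of a producer config with the limit raised to roughly 20 MBytes; the exact value and where it is set are assumptions. The broker (`message.max.bytes`) and topic (`max.message.bytes`) limits must also allow messages that large.

```python
from typing import Any, Dict

# ~20 MBytes; the exact number is an assumption for illustration.
MAX_MESSAGE_BYTES = 20_000_000


def producer_config(kafka_hosts: str) -> Dict[str, Any]:
    """Config dict suitable for confluent_kafka.Producer(config)."""
    return {
        "bootstrap.servers": kafka_hosts,
        "message.max.bytes": MAX_MESSAGE_BYTES,
    }
```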
|
*	fixes to confluent-kafka harvesters	Bryan Newbold	2019-09-20	3	-20/+21
|
*	first draft harvesters using confluent-kafka	Bryan Newbold	2019-09-20	3	-48/+104
|
*	increase default harvest window to 14 days	Bryan Newbold	2019-04-01	1	-2/+2
|
*	HACK: force pylint to ignore urllib3 Retry import	Bryan Newbold	2019-03-15	1	-1/+3
|	As the code comment mentions, not sure why pylint throws this error.
|	requests and urllib3 are recent, and this code runs fine in tests and QA, and pylint is running (in CI) within pipenv.
*	MEDLINE/Pubmed note	Bryan Newbold	2019-03-15	1	-2/+6
|	Also, arXivRaw, not arXiv (though see WIP on more-importers branch)
*	fix harvester session.get() params	Bryan Newbold	2019-03-06	1	-5/+8
|
*	retry/backoff for Crossref harvester	Bryan Newbold	2019-03-06	2	-2/+24
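A common way to add retry with backoff to requests is to mount an HTTPAdapter configured with urllib3's Retry. A sketch; the retry count, backoff factor and status list are assumed values, not necessarily those used in the Crossref harvester.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def requests_retry_session(retries: int = 10, backoff_factor: float = 3.0,
                           status_forcelist=(500, 502, 504)) -> requests.Session:
    """Session that retries failed connections and listed 5xx responses."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,  # sleep grows exponentially between attempts
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```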
|
*	bunch of lint/whitespace cleanups	Bryan Newbold	2019-02-22	3	-9/+6
|
*	check request status codes idiomatically	Bryan Newbold	2018-12-29	1	-2/+2
|
*	clean up harvester comments/docs	Bryan Newbold	2018-11-21	3	-50/+31
|
*	use isoformat() to format dates	Bryan Newbold	2018-11-21	2	-4/+4
|	This shouldn't change behavior; it's just more consistent.
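For `datetime.date` values, `isoformat()` produces the same YYYY-MM-DD string as an explicit `strftime("%Y-%m-%d")`, which is why the switch should not change behavior. The helper names below are illustrative.

```python
import datetime


def format_day(day: datetime.date) -> str:
    """YYYY-MM-DD for a date, via isoformat()."""
    return day.isoformat()


def format_day_strftime(day: datetime.date) -> str:
    """The equivalent explicit strftime() formatting, for comparison."""
    return day.strftime("%Y-%m-%d")
```

One caveat: `datetime.datetime` is a subclass of `date`, and its `isoformat()` includes the time, so call `.date()` first when only the day is wanted.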
*	fix loop_sleep typo	Bryan Newbold	2018-11-21	2	-2/+2
|
*	fix datacite DOI extraction	Bryan Newbold	2018-11-21	1	-1/+1
|
*	fix OAI-PMH name/finished message	Bryan Newbold	2018-11-21	1	-1/+6
|
*	fix oai-pmh issue again	Bryan Newbold	2018-11-21	1	-13/+14
|
*	oaipmh: handle NoRecordsMatch	Bryan Newbold	2018-11-21	1	-5/+8
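In OAI-PMH, a ListRecords request whose from/until window matches nothing returns an `error` element with code `noRecordsMatch`, which for a daily harvester simply means an empty day. The commit's own handling is not shown here; this is a standard-library sketch working on the raw response XML, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def record_identifiers(response_xml: str) -> List[str]:
    """Return record identifiers from a ListRecords response.

    noRecordsMatch is treated as an empty result; any other OAI-PMH error
    is raised.
    """
    root = ET.fromstring(response_xml)
    errors = root.findall(OAI_NS + "error")
    if any(err.get("code") == "noRecordsMatch" for err in errors):
        return []
    if errors:
        codes = ", ".join(err.get("code", "?") for err in errors)
        raise RuntimeError(f"OAI-PMH error(s): {codes}")
    return [
        header.findtext(OAI_NS + "identifier")
        for header in root.iter(OAI_NS + "header")
    ]
```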
|
*	initial OAI-PMH harvesters	Bryan Newbold	2018-11-19	3	-5/+167
|
*	better DOI registrar harvesters	Bryan Newbold	2018-11-19	3	-48/+145
|
*	bunch of pylint cleanup	Bryan Newbold	2018-11-15	1	-7/+12
|
*	refactoring harvesters	Bryan Newbold	2018-11-15	5	-196/+210
|
*	initial work on metadata harvest bots	Bryan Newbold	2018-11-14	4	-0/+197