fatcat - [no description]

	Commit message	Author	Age	Files	Lines
*	datacite: prepare release_month (stub)	Martin Czygan	2020-01-03	2	-24/+24
\|
*	datacite: lowercase only once	Martin Czygan	2020-01-03	1	-3/+4
\|
*	add pycountry dependency	Martin Czygan	2020-01-03	2	-1/+9
\|
*	add missing pathlib2 dependency	Martin Czygan	2020-01-03	2	-1/+18
\| \| \| \| \|	first seen in CI (jobs/230137), slightly related: https://github.com/pytest-dev/pytest/issues/3953
*	update potentially outdated Pipfile.lock	Martin Czygan	2020-01-03	1	-96/+86
\|	via:
\|
\|	    $ pipenv lock
\|
\|	CI complained with a slightly cryptic:
\|
\|	> TypeError: __init__() missing 1 required positional argument: 'self'
*	datacite: remove --lang-detect flag	Martin Czygan	2020-01-03	7	-25/+21
\| \| \| \|	Estimated time for a single call is on the order of 50 ms.
*	datacite: add another test case	Martin Czygan	2020-01-02	3	-1/+71
\|
*	datacite: open case for editing after creation	Martin Czygan	2020-01-02	1	-0/+2
\|
*	datacite: add helper script to create new test case	Martin Czygan	2020-01-02	1	-0/+14
\|
*	datacite: address raw_name index form comment	Martin Czygan	2020-01-02	21	-112/+171
\| \| \| \| \| \| \| \| \|	> The convention for display_name and raw_name is to be how the name would normally be printed, not in index form (surname comma given_name). So we might need to un-encode names like "Tricart, Pierre". Use an additional `index_form_to_display_name` function to convert index form to display form, heuristically.
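A minimal sketch of what such a heuristic might look like. Only the function name `index_form_to_display_name` comes from the commit message; the body below is an assumption for illustration and is not the code from this commit. It swaps the parts only when there is exactly one comma and leaves everything else untouched.

```python
def index_form_to_display_name(name: str) -> str:
    """Heuristically turn "Surname, Given" into "Given Surname".

    Hypothetical body: names without exactly one comma are returned
    unchanged (apart from surrounding whitespace), since they may be
    organizations or already be in display form.
    """
    name = name.strip()
    if name.count(",") != 1:
        return name
    surname, given = (part.strip() for part in name.split(","))
    # Drop empty parts so that "Tricart," becomes "Tricart".
    return " ".join(part for part in (given, surname) if part)


print(index_form_to_display_name("Tricart, Pierre"))
```

Names with several commas, such as lists of institutions, are deliberately left alone here; a real importer would likely need more cases.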
*	datacite: add two more skipable tokens	Martin Czygan	2020-01-02	1	-1/+1
\|
*	datacite: add conversion fixtures	Martin Czygan	2020-01-02	50	-1/+3949
\|	The `test_datacite_conversions` function will compare an input (datacite) document to an expected output (release entity as JSON). This way, it should not be too hard to add more cases by adding an input and an expected output, and by increasing the counter in the range loop within the test.
\|
\|	To view input and result side by side with vim, change into the test directory and run:
\|
\|	    tests/files/datacite $ ./caseview.sh 18
*	datacite: names can be 'Unav', too	Martin Czygan	2020-01-02	1	-1/+4
\|
*	datacite: avoid more None values	Martin Czygan	2020-01-01	1	-4/+4
\|
*	datacite: address 'Unpublished' publisher	Martin Czygan	2019-12-31	1	-9/+10
\|
*	datacite: ensure name schema is defined	Martin Czygan	2019-12-31	1	-1/+2
\|
*	datacite: fix typo	Martin Czygan	2019-12-31	1	-1/+1
\|
*	datacite: isascii was added in 3.7, only	Martin Czygan	2019-12-31	1	-1/+7
\|
*	datacite: skip non-ascii doi for now	Martin Czygan	2019-12-31	1	-0/+4
\| \| \| \| \| \|	Example of a non-ASCII DOI: https://doi.org/10.13125/américacrítica/3017
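Since `str.isascii` only exists from Python 3.7 on, an ASCII check that also runs on older interpreters could be sketched as follows. The helper name `is_ascii` is hypothetical and the commits may have solved this differently.

```python
def is_ascii(value: str) -> bool:
    """Return True if every character in value is ASCII."""
    try:
        # Available on Python 3.7 and later.
        return value.isascii()
    except AttributeError:
        # Fallback for older interpreters.
        return all(ord(char) < 128 for char in value)


print(is_ascii("10.13125/américacrítica/3017"))
```

An importer could use such a check to skip records like the one above instead of failing on them.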
*	datacite: clean doi	Martin Czygan	2019-12-31	1	-1/+13
\| \| \| \| \| \| \|	Address an issue with an EN DASH in a DOI: > "external identifier doesn't match required pattern for a DOI (expected, eg, '10.1234/aksjdfh'): 10.25513/1812-3996.2017.1.34–42"
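One plausible normalization for this case is to replace the EN DASH (U+2013) with an ASCII hyphen before validation. The commit's actual cleanup is not shown in this log, so the helper below, including its name `replace_en_dash`, is an assumption.

```python
EN_DASH = "\u2013"


def replace_en_dash(doi: str) -> str:
    """Strip whitespace and replace EN DASH characters with ASCII hyphens."""
    return doi.strip().replace(EN_DASH, "-")


print(replace_en_dash("10.25513/1812-3996.2017.1.34\u201342"))
```

Whether the rewritten DOI still resolves depends on how the registrant recorded it, so a conservative importer might prefer to log and skip such records.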
*	datacite: update docs	Martin Czygan	2019-12-31	1	-9/+9
\|
*	datacite: perform additional checks on contrib	Martin Czygan	2019-12-30	1	-3/+9
\|
*	datacite: check for empty title after clean	Martin Czygan	2019-12-29	1	-2/+5
\|
*	datacite: update docs with observed values	Martin Czygan	2019-12-29	1	-1/+3
\|
*	datacite: page number misses are too common	Martin Czygan	2019-12-28	1	-1/+2
\| \| \| \| \| \|	Should be logged at debug level, not info. Examples: E675, n/a, 15D.2.1, 15D.2.1, A.1E.1, A.1E.1, ...
*	datacite: suppress debug-like language lookup miss message	Martin Czygan	2019-12-28	1	-1/+3
\|
*	datacite: adjust tests	Martin Czygan	2019-12-28	1	-2/+1
\|
*	datacite: treat untyped names as people	Martin Czygan	2019-12-28	1	-1/+1
\|
*	datacite: include container_name top level key in extra	Martin Czygan	2019-12-28	1	-7/+21
\|
*	datacite: use clean on field values	Martin Czygan	2019-12-28	1	-2/+28
\|
*	datacite: include doi in error messages	Martin Czygan	2019-12-28	1	-8/+8
\|
*	remove langcodes dependency	Martin Czygan	2019-12-28	2	-15/+0
\|
*	datacite: limit abstract length	Martin Czygan	2019-12-28	1	-0/+6
\|
*	datacite: use iso 639-1 codes	Martin Czygan	2019-12-28	1	-7/+4
\|
*	datacite: use specific auth var	Martin Czygan	2019-12-28	1	-1/+1
\|
*	datacite: add missing --extid-map-file flag	Martin Czygan	2019-12-28	1	-0/+4
\|
*	address first round of MR14 comments	Martin Czygan	2019-12-28	4	-150/+503
\|	* add missing langdetect
\|	* use entity_to_dict for json debug output
\|	* factor out code for fields in function and add table driven tests
\|	* update citeproc types
\|	* add author as default role
\|	* add raw_affiliation
\|	* include relations from datacite
\|	* remove url (covered by doi already)
\|
\|	Using yapf for python formatting.
*	datacite: move common date patterns out of the loop	Martin Czygan	2019-12-28	1	-3/+4
\| \| \| \|	Additionally, try the unspecific (%Y) pattern last.
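The idea of defining the patterns once and trying the bare-year pattern last can be sketched as below. The pattern list and the name `parse_date_sketch` are assumptions; the importer's real list is not shown in this log.

```python
import datetime

# Defined once at module level, ordered from most to least specific.
# The list itself is illustrative, not the importer's actual one.
DATE_PATTERNS = (
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
)


def parse_date_sketch(value):
    """Return (date, matched_pattern), or (None, None) if nothing matches."""
    for pattern in DATE_PATTERNS:
        try:
            parsed = datetime.datetime.strptime(value, pattern)
        except ValueError:
            continue
        return parsed.date(), pattern
    return None, None


print(parse_date_sketch("1978-06-03"))
print(parse_date_sketch("1986"))
```

Returning the matched pattern lets a caller tell a full date apart from a year-only value, for which `strptime` fills in January 1.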
*	improve datacite field mapping and import	Martin Czygan	2019-12-28	5	-59/+245
\|	Current version successfully imported a random sample of 100000 records (0.5%) from datacite.
\|
\|	The --debug (write JSON to stdout) and --insert-log-file (log batch before committing to db) flags are temporarily added to help debugging.
\|
\|	Add a few unit tests.
\|
\|	Some edge cases:
\|
\|	a) Existing keys without a value require a slightly awkward:
\|
\|	```
\|	titles = attributes.get('titles', []) or []
\|	```
\|
\|	b) There can be 0, 1, or more (first one wins) titles.
\|
\|	c) Date handling is probably not ideal. Datacite has a potentially fine-grained list of dates.
\|
\|	The test case (tests/files/datacite_sample.jsonl) refers to https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main descriptor) 1986. The datacite record contains: 2017 (publicationYear, probably the year of record creation with reference system), 1978-06-03 (collected, e.g. experimental sample), 1986 ("Accepted"). The online version of the resource lists one more date (2019-06-05 10:14:43 by WIEWS update).
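Edge cases (a) and (b) can be illustrated with a short sketch. It assumes each entry in `titles` is a dict with a `title` key, and the helper name `first_title` is hypothetical.

```python
def first_title(attributes):
    """Return the first non-empty title, or None if there is none."""
    # .get() alone is not enough: the key may exist with the value None,
    # in which case the default is ignored. "or []" covers that case.
    titles = attributes.get("titles", []) or []
    for entry in titles:
        title = (entry.get("title") or "").strip()
        if title:
            return title  # first one wins
    return None


print(first_title({"titles": None}))
print(first_title({"titles": [{"title": "A"}, {"title": "B"}]}))
```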
*	datacite: add missing mappings and notes	Martin Czygan	2019-12-28	1	-266/+175
\|
*	datacite: basic field mappings	Martin Czygan	2019-12-28	1	-41/+181
\|	Currently using two external libraries:
\|
\|	* dateparser
\|	* langcodes
\|
\|	Note: This commit includes lots of WIP docs and field stats in comments, which should be removed.
*	datacite: importer skeleton	Martin Czygan	2019-12-28	4	-0/+514
\|	* contributors, title, date, publisher, container, license
\|
\|	Field and value analysis via https://github.com/miku/indigo.
*	bulk edit updates	Bryan Newbold	2019-12-26	1	-3/+4
\|
*	orcid: skip non-person ORCID records	Bryan Newbold	2019-12-26	1	-0/+4
\|
*	Merge branch 'martin-datacite-daily-harvest' into 'master'	Martin Czygan	2019-12-26	3	-5/+73
\|\ \| \| \| \| \| \| \| \|	Datacite daily harvest. See merge request webgroup/fatcat!6
\| *	datacite: fix harvest test	Martin Czygan	2019-12-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Produced messages should match: jq '.data\|length' tests/files/datacite_api.json
\| *	datacite: add simple test and fixture for datacite api interaction	Martin Czygan	2019-12-27	2	-0/+46
\| \|
\| *	datacite: extend range search query	Martin Czygan	2019-12-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The bracket syntax is inclusive. See also: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#_ranges
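In Elasticsearch query string syntax, square brackets include both endpoints and curly brackets exclude them. A one-day window that covers the whole day could be built as in this sketch, where the field name `updated` and the helper name are assumptions rather than values taken from the harvester.

```python
import datetime


def daily_range_query(day, field="updated"):
    """Build an inclusive range clause covering one full UTC day."""
    start = day.strftime("%Y-%m-%dT00:00:00.000Z")
    end = day.strftime("%Y-%m-%dT23:59:59.999Z")
    # Square brackets make both endpoints inclusive.
    return "{}:[{} TO {}]".format(field, start, end)


print(daily_range_query(datetime.date(2019, 12, 27)))
```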
\| *	avoid usage of short links	Martin Czygan	2019-12-27	1	-2/+2
\| \|
\| *	Datacite API v2 throws 400, we cannot recover from, currently.	Martin Czygan	2019-12-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a first iteration, just mark the daily batch complete and continue. The occasional HTTP 400 issue has been reported as https://github.com/datacite/datacite/issues/897. A possible improvement would be to shrink the window, so losses will be smaller.
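The "mark the batch complete and continue" behavior can be sketched independently of any HTTP client. Everything here is hypothetical: `BadRequest` stands in for whatever error an HTTP 400 raises, and `fetch_day` for the function that harvests one day.

```python
class BadRequest(Exception):
    """Stand-in for an unrecoverable HTTP 400 response."""


def harvest_days(days, fetch_day):
    """Harvest each day in turn; a 400 skips the day but still completes it."""
    completed, skipped = [], []
    for day in days:
        try:
            fetch_day(day)
        except BadRequest:
            # Records for this day are lost; remember it for later review.
            skipped.append(day)
        # The day is marked complete either way, so the harvester moves on.
        completed.append(day)
    return completed, skipped


def fake_fetch(day):
    # Simulated API: one specific day always fails.
    if day == "2019-12-26":
        raise BadRequest(day)


print(harvest_days(["2019-12-25", "2019-12-26", "2019-12-27"], fake_fetch))
```

Keeping a list of skipped days would make it possible to retry them later with a smaller window, as the commit message suggests.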