fatcat - [no description]

	Commit message (Collapse)	Author	Age	Files	Lines
*	datacite: resolve formatting issues in tests	Martin Czygan	2020-07-10	1	-2/+5
\|\
\| *	lint (flake8) python test files	Bryan Newbold	2020-07-01	1	-20/+22
\| \|
* \|	wip: contrib, GH59	Martin Czygan	2020-07-10	1	-229/+361
\| \|
* \|	datacite: address duplicated contributor issue	Martin Czygan	2020-07-07	1	-1/+1
\|/	Use string comparison.
\|	* https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
\|	* https://api.datacite.org/dois/10.25940/roper-31098406
*	datacite: improve license mapping	Martin Czygan	2020-06-30	1	-0/+14
\|	via "missed potential license", refs #58
*	datacite: hard cast possible date value to string	Martin Czygan	2020-06-29	1	-0/+1
\|
*	datacite: fix type error	Martin Czygan	2020-04-22	1	-1/+1
\|	Up to now, we expected the description to be a string or a list. Add handling for int as well.
\|	First appeared: Apr 22 19:58:39.
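The type handling described in this commit (a description may be a string, a list, or an int) could be normalized roughly as follows. This is a minimal sketch, not the importer's actual code; `description_to_text` is a hypothetical helper name.

```python
from typing import Any, Optional


def description_to_text(value: Any) -> Optional[str]:
    """Coerce a description value to text (hypothetical helper).

    Accepts a string, a list of values, or a scalar such as an int.
    Returns None if nothing printable remains.
    """
    if value is None:
        return None
    if isinstance(value, list):
        # Cast each item, so a list containing ints is handled as well.
        parts = [str(item).strip() for item in value if item is not None]
        text = "\n".join(part for part in parts if part)
    else:
        # Strings pass through unchanged; ints and other scalars are cast.
        text = str(value).strip()
    return text or None
```

Casting with `str()` rather than rejecting unexpected types keeps the import running on odd records, at the cost of occasionally storing a bare number as a description.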
*	datacite: fix a raw name constraint violation	Martin Czygan	2020-04-20	1	-1/+1
\|	It was possible that contribs got added which had no raw name. One example would be a name consisting of whitespace only. This fix adds a final check for this case.
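A final check of the kind this commit describes might look like the sketch below. The function name `drop_nameless_contribs` and the plain-dict representation of a contrib are assumptions made for illustration; only the `raw_name` field comes from the commit message.

```python
from typing import Dict, List


def drop_nameless_contribs(contribs: List[Dict]) -> List[Dict]:
    """Keep only contribs whose raw_name has visible characters."""
    kept = []
    for contrib in contribs:
        raw_name = contrib.get("raw_name")
        # A missing, None, empty, or whitespace-only name is dropped.
        if isinstance(raw_name, str) and raw_name.strip():
            kept.append(contrib)
    return kept
```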
*	datacite: improve date handling and minor tweak	Martin Czygan	2020-01-30	1	-2/+1
\|	Records from https://www.micropublication.org/ did not have a date in FC, although raw data contained date strings. They were not using the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case.
\|	During this test (#30) a processing gap for names became clear (author may have "given_name" and "surname", but no "name"). This bug has been fixed, too.
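The two fallbacks mentioned in this commit can be sketched as below. This is an illustration under stated assumptions, not the importer's code: `fallback_year` and `contributor_name` are hypothetical helper names, and only the field names ("published", "publicationYear", "name", "given_name", "surname") are taken from the commit message.

```python
import re
from typing import Optional


def fallback_year(attributes: dict) -> Optional[int]:
    """Take a year from the coarser date fields, in order of preference."""
    for key in ("published", "publicationYear"):
        value = attributes.get(key)
        if value is None:
            continue
        # Cast first: the value may be an int such as 2017.
        match = re.match(r"(\d{4})(?!\d)", str(value).strip())
        if match:
            return int(match.group(1))
    return None


def contributor_name(contrib: dict) -> Optional[str]:
    """Use "name" if present, else join "given_name" and "surname"."""
    name = contrib.get("name")
    if isinstance(name, str) and name.strip():
        return name.strip()
    parts = [contrib.get("given_name"), contrib.get("surname")]
    joined = " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())
    return joined or None
```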
*	datacite: add entry to license slug map	Martin Czygan	2020-01-09	1	-0/+1
\|
*	datacite: ignore known unknown values in resourceType*	Martin Czygan	2020-01-09	1	-1/+1
\|
*	datacite: abstracts may be strings or list of strings	Martin Czygan	2020-01-09	1	-1/+1
\|
*	datacite: improve license_slug handling	Martin Czygan	2020-01-09	1	-1/+30
\|
*	datacite: factor out contributor handling	Martin Czygan	2020-01-08	1	-2/+2
\|	Use values from:
\|	* attributes.creators[]
\|	* attributes.contributors[]
*	datacite: mark additional files as stub	Martin Czygan	2020-01-08	1	-1/+1
\|
*	datacite: indicate mismatched file in test	Martin Czygan	2020-01-06	1	-1/+1
\|
*	datacite: use normal.clean_doi	Martin Czygan	2020-01-03	1	-4/+0
\|
*	datacite: parse_datacite_dates returns month	Martin Czygan	2020-01-03	1	-7/+16
\|	As [...] we will soon add support for the release_month field in the release schema.
*	datacite: prepare release_month (stub)	Martin Czygan	2020-01-03	1	-14/+14
\|
*	datacite: add another test case	Martin Czygan	2020-01-02	1	-1/+1
\|
*	datacite: address raw_name index form comment	Martin Czygan	2020-01-02	1	-1/+17
\|	> The convention for display_name and raw_name is to be how the name would normally be printed, not in index form (surname comma given_name). So we might need to un-encode names like "Tricart, Pierre".
\|	Use an additional `index_form_to_display_name` function to convert index form to display form, heuristically.
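One possible heuristic for the conversion named in this commit is sketched below. The function name comes from the commit message, but this body is an assumption and may differ from the actual implementation: it only rewrites names with exactly one comma and leaves everything else untouched.

```python
def index_form_to_display_name(s: str) -> str:
    """Heuristically turn "Surname, Given" into "Given Surname"."""
    if s.count(",") != 1:
        # No comma, or several commas: too ambiguous, leave as-is.
        return s
    surname, given = (part.strip() for part in s.split(","))
    if not surname or not given:
        # A dangling comma gives nothing to reorder.
        return s
    return "{} {}".format(given, surname)
```

For example, `index_form_to_display_name("Tricart, Pierre")` gives `"Pierre Tricart"`, while a name that is already in display form is returned unchanged.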
*	datacite: add conversion fixtures	Martin Czygan	2020-01-02	1	-1/+25
\|	The `test_datacite_conversions` function will compare an input (datacite) document to an expected output (release entity as JSON). This way, it should not be too hard to add more cases by adding: input, output - and by increasing the counter in the range loop within the test.
\|	To view input and result side by side with vim, change into the test directory and run:
\|	    tests/files/datacite $ ./caseview.sh 18
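The fixture-driven pattern described in this commit (numbered input and expected-output files, compared in a range loop) might be sketched as follows. The file naming scheme, the `check_conversions` name, and the `convert` callable are all assumptions for illustration; the real test and fixture names may differ.

```python
import json
import os
from typing import Callable


def check_conversions(directory: str, convert: Callable[[dict], dict], count: int) -> None:
    """Compare convert(input) to the expected output for cases 0..count-1."""
    for i in range(count):
        # Hypothetical naming scheme for input and expected-output fixtures.
        src = os.path.join(directory, "datacite_doc_{:02d}.json".format(i))
        dst = os.path.join(directory, "datacite_result_{:02d}.json".format(i))
        with open(src) as f:
            doc = json.load(f)
        with open(dst) as f:
            expected = json.load(f)
        result = convert(doc)
        assert result == expected, "case {}: output differs from {}".format(i, dst)
```

With this layout, adding a case means dropping in one more input/output pair and raising `count` by one.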
*	datacite: adjust tests	Martin Czygan	2019-12-28	1	-2/+1
\|
*	address first round of MR14 comments	Martin Czygan	2019-12-28	1	-2/+176
\|	* add missing langdetect
\|	* use entity_to_dict for json debug output
\|	* factor out code for fields in function and add table driven tests
\|	* update citeproc types
\|	* add author as default role
\|	* add raw_affiliation
\|	* include relations from datacite
\|	* remove url (covered by doi already)
\|	Using yapf for Python formatting.
*	improve datacite field mapping and import	Martin Czygan	2019-12-28	1	-17/+91
\|	The current version successfully imported a random sample of 100000 records (0.5%) from datacite.
\|	The --debug (write JSON to stdout) and --insert-log-file (log batch before committing to db) flags are temporarily added to help debugging. Add a few unit tests.
\|	Some edge cases:
\|	a) Existing keys without a value require a slightly awkward: ``` titles = attributes.get('titles', []) or [] ```
\|	b) There can be 0, 1, or more titles (first one wins).
\|	c) Date handling is probably not ideal. Datacite has a potentially fine-grained list of dates. The test case (tests/files/datacite_sample.jsonl) refers to https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main descriptor) 1986. The datacite record contains: 2017 (publicationYear, probably the year of record creation with reference system), 1978-06-03 (collected, e.g. experimental sample), 1986 ("Accepted"). The online version of the resource lists one more date (2019-06-05 10:14:43 by WIEWS update).
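Edge cases a) and b) can be illustrated with a short sketch. It assumes each entry in `titles` is a dict with a "title" key; `first_title` is a hypothetical helper, not a function from the importer.

```python
from typing import Optional


def first_title(attributes: dict) -> Optional[str]:
    """Return the first title, or None if there is none."""
    # dict.get only falls back to [] when the key is missing. A key that
    # is present with a None value still returns None, hence the `or []`.
    titles = attributes.get("titles", []) or []
    if not titles:
        return None
    # With several titles, the first one wins.
    return titles[0].get("title")
```

Without the trailing `or []`, a record such as `{"titles": None}` would hand `None` to any code that iterates over or indexes the titles.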
*	datacite: importer skeleton	Martin Czygan	2019-12-28	1	-0/+25
	* contributors, title, date, publisher, container, license
	Field and value analysis via https://github.com/miku/indigo.