aboutsummaryrefslogtreecommitdiffstats
path: root/python/tests/files/datacite
Commit message (Collapse)AuthorAgeFilesLines
* datacite: more careful title string access; fixes sentry #88350Martin Czygan2021-06-112-0/+95
| | | | | Caused by a partial "title entry without title" coming *first* (e.g. just holding, e.g. a language, like: {'lang': 'da'}
* datacite: a missing surname should be None, not the empty stringMartin Czygan2021-04-022-2/+0
| | | | refs sentry #77700
* datacite importer: update test cases for 'Additional file' as component, not ↵Bryan Newbold2020-08-115-5/+5
| | | | stub
* datacite import: figshare-specific hacksBryan Newbold2020-08-111-0/+1
|
* datacite: adjust testsMartin Czygan2020-07-104-10/+6
|
* wip: contrib, GH59Martin Czygan2020-07-105-3/+105
|
* datacite: address duplicated contributor issueMartin Czygan2020-07-074-10/+93
| | | | | | | Use string comparison. * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs * https://api.datacite.org/dois/10.25940/roper-31098406
* datacite: fix type errorMartin Czygan2020-04-222-0/+76
| | | | | | | Up to now, we expected the description to be a string or list. Add handling for int as well. First appeared: Apr 22 19:58:39.
* datacite: fix a raw name constraint violationMartin Czygan2020-04-202-0/+77
| | | | | | | It was possible that contribs got added which had no raw name. One example would be a name consisting of whitespace only. This fix adds a final check for this case.
* datacite: add exception for https://www.micropublication.org/Martin Czygan2020-01-311-1/+2
|
* datacite: improve date handling and minor tweakMartin Czygan2020-01-302-0/+110
| | | | | | | | | | | | | Records from https://www.micropublication.org/ did not have a date in FC, although raw data contained date strings - they were not using the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case. During this test (#30) a processing gap for names became clear (author may have "given_name" and "surname", but no "name"). This bug has been fixed, too.
* do not normalize "en dash" in DOIMartin Czygan2020-01-171-1/+1
| | | | | | | | | Technically, [...] DOI names may incorporate any printable characters from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1). For mostly QA reasons, we currently treat a DOI with an "en dash" as invalid.
* datacite: ignore known unknown values in resourceType*Martin Czygan2020-01-092-0/+94
|
* datacite: abstracts may be strings or list of stringsMartin Czygan2020-01-094-0/+186
|
* datacite: improve license_slug handlingMartin Czygan2020-01-092-1/+3
|
* datacite: add 'Unknown' to blacklistMartin Czygan2020-01-091-7/+1
|
* datacite: get rid of schemaVersionMartin Czygan2020-01-0917-32/+14
|
* datacite: reformat test cases and use jq . --sort-keysMartin Czygan2020-01-0854-2299/+2301
|
* datacite: factor out contributor handlingMartin Czygan2020-01-084-0/+105
| | | | | | | Use values from: * attributes.creators[] * attributes.contributors[]
* datacite: adjust tests for release_monthMartin Czygan2020-01-0812-12/+12
|
* datacite: mark additional files as stubMartin Czygan2020-01-082-0/+72
|
* datacite: CCDC are entries, mostlyMartin Czygan2020-01-081-1/+1
|
* datacite: adding datacite-specific extra metadataMartin Czygan2020-01-0730-1468/+1570
| | | | | | | | | | | | | * attributes.metadataVersion * attributes.schemaVersion * attributes.version (source dependent values, follows suggestions in https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26, but values vary) Furthermore: * attributes.types.resourceTypeGeneral * attributes.types.resourceType
* datacite: month field should be top-levelMartin Czygan2020-01-0611-14/+14
|
* datacite: include month in extraMartin Czygan2020-01-0611-11/+13
| | | | | > include release_month as a top-level extra field [...] to auto-populate the schema field from that
* datacite: clean abstracts, use unknown value tokensMartin Czygan2020-01-063-3/+3
| | | | | | | | Datacite defines placeholders for unknown values: * https://support.datacite.org/docs/schema-values-unknown-information-v43 Clean abstracts.
* datacite: always include "datacite" key in extraMartin Czygan2020-01-0414-26/+26
| | | | | | > always include extra values for the respective DOI registrars (datacite, crossref, jalc), even if they are empty ({}), to be used as a flag so we know which DOI registrar supplied the metadata.
* datacite: remove --lang-detect flagMartin Czygan2020-01-035-10/+15
| | | | Estimated time for a single call is in the order of 50ms.
* datacite: add another test caseMartin Czygan2020-01-022-0/+70
|
* datacite: open case for editing after creationMartin Czygan2020-01-021-0/+2
|
* datacite: add helper script to create new test caseMartin Czygan2020-01-021-0/+14
|
* datacite: address raw_name index form commentMartin Czygan2020-01-0219-111/+111
| | | | | | | | | > The convention for display_name and raw_name is to be how the name would normally be printed, not in index form (surname comma given_name). So we might need to un-encode names like "Tricart, Pierre". Use an additional `index_form_to_display_name` function to convert index from to display form, heuristically.
* datacite: add conversion fixturesMartin Czygan2020-01-0249-0/+3924
The `test_datacite_conversions` function will compare an input (datacite) document to an expected output (release entity as JSON). This way, it should not be too hard to add more cases by adding: input, output - and by increasing the counter in the range loop within the test. To view input and result side by side with vim, change into the test directory and run: tests/files/datacite $ ./caseview.sh 18