| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This was used during initial bulk imports, but is no longer used and
could create serious metadata problems if used accidentially.
In retrospect, it also made metadata provenance less transparent, and
may have done more harm than good overall.
|
| |
|
| |
|
|
|
|
| |
'==' vs 'is'; 'not a in b' vs 'a not in b'; etc
|
|
|
|
|
| |
Do not add abstracts where `clean` results in the empty string - this
violates a constraint: `either abstract_sha1 or content is required`
|
|
|
|
|
| |
Caused by a partial "title entry without title" coming *first* (e.g. just
holding, e.g. a language, like: {'lang': 'da'}
|
|
|
|
|
|
|
|
|
| |
seemingly from zenodo:
* https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi
* https://doi.org/10.5281/zenodo.4041777
About 3400 records with "FULL MOVIE" in title, currently.
|
|
|
|
|
| |
Includes a tiny tweak to the datacite import sample file to test this
code path.
|
|\ |
|
| | |
|
| | |
|
|/
|
|
|
|
|
| |
Use string comparison.
* https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
* https://api.datacite.org/dois/10.25940/roper-31098406
|
|
|
|
| |
via "missed potential license", refs #58
|
| |
|
|
|
|
|
|
|
| |
Up to now, we expected the description to be a string or list. Add
handling for int as well.
First appeared: Apr 22 19:58:39.
|
|
|
|
|
|
|
| |
It was possible that contribs got added which had no raw name. One
example would be a name consisting of whitespace only.
This fix adds a final check for this case.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Records from https://www.micropublication.org/ did not have a date in
FC, although raw data contained date strings - they were not using the
finer-grained "attributes.date" but "attributes.published" and/or
"attributes.publicationYear".
Support for those fields has been added, including a test case.
During this test (#30) a processing gap for names became clear (author
may have "given_name" and "surname", but no "name"). This bug has been
fixed, too.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
Use values from:
* attributes.creators[]
* attributes.contributors[]
|
| |
|
| |
|
| |
|
|
|
|
| |
As [...] we will soon add support for release_month field in the release schema.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
> The convention for display_name and raw_name is to be how the name
would normally be printed, not in index form (surname comma given_name).
So we might need to un-encode names like "Tricart, Pierre".
Use an additional `index_form_to_display_name` function to convert index
from to display form, heuristically.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `test_datacite_conversions` function will compare an input
(datacite) document to an expected output (release entity as JSON). This
way, it should not be too hard to add more cases by adding: input,
output - and by increasing the counter in the range loop within the
test.
To view input and result side by side with vim, change into the test
directory and run:
tests/files/datacite $ ./caseview.sh 18
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* add missing langdetect
* use entity_to_dict for json debug output
* factor out code for fields in function and add table driven tests
* update citeproc types
* add author as default role
* add raw_affiliation
* include relations from datacite
* remove url (covered by doi already)
Using yapf for python formatting.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current version succeeded to import a random sample of 100000 records
(0.5%) from datacite.
The --debug (write JSON to stdout) and --insert-log-file (log batch
before committing to db) flags are temporary added to help debugging.
Add few unit tests.
Some edge cases:
a) Existing keys without value requires a slightly awkward:
```
titles = attributes.get('titles', []) or []
```
b) There can be 0, 1, or more (first one wins) titles.
c) Date handling is probably not ideal. Datacite has a potentiall fine
grained list of dates.
The test case (tests/files/datacite_sample.jsonl) refers to
https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main
descriptor) 1986. The datacite record contains: 2017 (publicationYear,
probably the year of record creation with reference system), 1978-06-03
(collected, e.g. experimental sample), 1986 ("Accepted"). The online
version of the resource knows even one more date (2019-06-05 10:14:43 by
WIEWS update).
|
|
* contributors, title, date, publisher, container, license
Field and value analysis via https://github.com/miku/indigo.
|