Estimated time for a single call is on the order of 50 ms.
|
> The convention for display_name and raw_name is to store the name as it
would normally be printed, not in index form (surname comma given_name).
So we might need to un-encode names like "Tricart, Pierre".
Use an additional `index_form_to_display_name` function to convert names
from index form to display form, heuristically.
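A minimal sketch of such a heuristic, assuming only the simplest case is handled: a name with exactly one comma is read as `surname, given_name` and swapped, and everything else is returned unchanged. This is a hypothetical illustration, not the actual `index_form_to_display_name` implementation.

```python
def index_form_to_display_name(name):
    """Heuristically convert 'Surname, Given' (index form) to 'Given Surname'.

    Hypothetical sketch: only names with exactly one comma are converted;
    names without a comma, with several commas, or with an empty part are
    returned unchanged.
    """
    if name.count(",") != 1:
        return name
    surname, given = (part.strip() for part in name.split(","))
    if not surname or not given:
        return name
    return f"{given} {surname}"
```

With this rule, "Tricart, Pierre" becomes "Pierre Tricart". A name like "Smith, Jr." would be swapped incorrectly, which is why the conversion can only be heuristic.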
|
The `test_datacite_conversions` function compares an input (datacite)
document to an expected output (release entity as JSON). This way, it
should not be too hard to add more cases: add an input and an output,
and increase the counter in the range loop within the test.
To view input and result side by side with vim, change into the test
directory and run:
tests/files/datacite $ ./caseview.sh 18
|
Example of a non-ASCII DOI:
* https://doi.org/10.13125/américacrítica/3017
|
Address an issue with a DOI containing an EN DASH (U+2013):
> "external identifier doesn't match required pattern for a DOI (expected,
eg, '10.1234/aksjdfh'): 10.25513/1812-3996.2017.1.34–42"
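To make such rejections easier to diagnose, a check can name the non-ASCII characters a DOI contains. The sketch below assumes a simplified, ASCII-only pattern (`10.`, 3 to 6 digits, `/`, printable ASCII); the pattern actually enforced by the API may differ. `unicodedata.name` reports U+2013 as `EN DASH`, which pinpoints the problem in the DOI above, and also flags the accented characters in the non-ASCII DOI example.

```python
import re
import unicodedata

# Simplified, ASCII-only approximation of a DOI pattern (an assumption,
# not the exact pattern used by the API).
DOI_PATTERN = re.compile(r"10\.\d{3,6}/[\x21-\x7e]+")


def check_doi(doi):
    """Return None if doi matches the pattern, otherwise a short reason string."""
    if DOI_PATTERN.fullmatch(doi):
        return None
    # Name every non-ASCII character, e.g. U+2013 -> "EN DASH".
    names = [unicodedata.name(c, "UNKNOWN") for c in doi if ord(c) > 127]
    if names:
        return "non-ascii: " + ", ".join(names)
    return "pattern mismatch"
```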
|
This should be logged at debug level, not info.
Examples: E675, n/a, 15D.2.1, 15D.2.1, A.1E.1, A.1E.1, ...
|
* add missing langdetect
* use entity_to_dict for json debug output
* factor out code for fields in function and add table driven tests
* update citeproc types
* add author as default role
* add raw_affiliation
* include relations from datacite
* remove url (covered by doi already)
Use yapf for Python formatting.
|
| |
| |
| |
| | |
Additionally, try the unspecific (%Y) pattern last.
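A minimal sketch of trying date patterns in order, assuming plain `datetime.strptime` and a hypothetical pattern list (the actual list is not shown in the commit): the most specific pattern is tried first and the bare year last.

```python
import datetime

# Most specific pattern first; the unspecific year-only pattern (%Y) last.
DATE_PATTERNS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_date(value):
    """Return (date, matched pattern) for the first pattern that fits.

    The matched pattern tells the caller how precise the value is; e.g. a
    bare year should probably only set a year field. Returns (None, None)
    if no pattern matches.
    """
    for pattern in DATE_PATTERNS:
        try:
            return datetime.datetime.strptime(value, pattern).date(), pattern
        except ValueError:
            continue
    return None, None
```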
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current version successfully imported a random sample of 100000
records (0.5%) from datacite.
The --debug (write JSON to stdout) and --insert-log-file (log batch
before committing to db) flags are temporarily added to help debugging.
Add a few unit tests.
Some edge cases:
a) Keys that exist but have a null value require a slightly awkward idiom:
```
titles = attributes.get('titles', []) or []
```
b) There can be 0, 1, or more (first one wins) titles.
c) Date handling is probably not ideal. Datacite has a potentially
fine-grained list of dates.
The test case (tests/files/datacite_sample.jsonl) refers to
https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has the date (main
descriptor) 1986. The datacite record contains: 2017 (publicationYear,
probably the year the record was created in the reference system),
1978-06-03 (collected, e.g. an experimental sample), and 1986 ("Accepted").
The online version of the resource lists yet another date (2019-06-05
10:14:43, by WIEWS update).
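Points (a) to (c) could be handled along these lines. The sketch assumes the DataCite JSON shape with `titles` as a list of `{"title": ...}` objects, `dates` as a list of `{"date": ..., "dateType": ...}` objects, and a `publicationYear` attribute. The `DATE_TYPE_PREFERENCE` order is hypothetical, since choosing the right date is exactly the open question in (c).

```python
# Hypothetical preference order; which date should win is the open question
# from (c), so this only illustrates one possible policy.
DATE_TYPE_PREFERENCE = ("Issued", "Accepted", "Available", "Created")


def first_title(attributes):
    """Return the first non-empty title, or None (cases a and b)."""
    # A key that is present with a null value makes .get() return None,
    # hence the trailing "or []".
    titles = attributes.get("titles", []) or []
    for entry in titles:
        title = (entry or {}).get("title")
        if title:
            return title  # first one wins
    return None


def pick_date(attributes):
    """Pick one date string using DATE_TYPE_PREFERENCE, else publicationYear."""
    by_type = {}
    for entry in attributes.get("dates", []) or []:
        if entry and entry.get("date") and entry.get("dateType"):
            # keep the first date seen for each dateType
            by_type.setdefault(entry["dateType"], entry["date"])
    for date_type in DATE_TYPE_PREFERENCE:
        if date_type in by_type:
            return by_type[date_type]
    year = attributes.get("publicationYear")
    return str(year) if year else None
```

With the dates from the test case (collected 1978-06-03, accepted 1986, publicationYear 2017), this policy returns "1986", which matches the main descriptor of the online record.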
|
Currently using two external libraries:
* dateparser
* langcodes
Note: This commit includes lots of WIP docs and field statistics in
comments, which should be removed.
|
| |
| |
| |
| |
| |
| | |
* contributors, title, date, publisher, container, license
Field and value analysis via https://github.com/miku/indigo.
|
pytest had been pinned to the 4.x series to work around a test import
package mangling problem with citeproc_styles. Now that pytest.ini
explicitly lists test files, this no longer seems to be a problem, and
pytest can be updated to the most recent version.
Also re-locked Pipfile.lock with updated dependencies (only minor
changes).
|
The purpose of this change is to avoid test errors when pytest tries to
recursively rewrite assertion statements in all dependent packages.
pytest does this to add pretty printing of failed assertions, which is
nice, but probably shouldn't be done in all dependency libraries.
This fixes test problems with both CSL (citeproc_styles) and dateparser
(when dateparser is actually imported in code, which currently does not
happen on master).
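A rough model of why an explicit list of test files helps: for modules not passed on the command line, pytest only rewrites assertions in files whose names match its `python_files` patterns (conftest files and plugins aside). The sketch below models just that name check with `fnmatch`; it is a simplification rather than pytest's actual code, and the example paths and patterns are assumptions.

```python
from fnmatch import fnmatch
from pathlib import PurePath


def would_be_rewritten(path, python_files):
    """Simplified model: does the module's file name match a python_files pattern?"""
    name = PurePath(path).name
    return any(fnmatch(name, pattern) for pattern in python_files)
```

A catch-all pattern like `*.py` also matches modules inside dependencies such as dateparser, while a narrow pattern like `test_*.py` only matches the test modules. Passing `--assert=plain` disables rewriting entirely, at the cost of the detailed assertion messages.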
|
The number of produced messages should match the output of:
jq '.data|length' tests/files/datacite_api.json
|
The bracket syntax is inclusive. See also:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#_ranges
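For daily windows this matters: with square brackets on both ends, consecutive days written as `[day TO next_day]` overlap at midnight. The query string syntax also allows combining brackets, `[` for an inclusive lower bound and `}` for an exclusive upper bound. The sketch below builds such a half-open daily range; the field name `updated` and the timestamp format are assumptions about the datacite query, not taken from the original code.

```python
import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed UTC timestamp format


def daily_range_query(day, field="updated"):
    """Return a query_string range covering exactly one UTC day.

    '[' includes the start (midnight); '}' excludes the next midnight, so
    ranges for consecutive days neither overlap nor leave gaps.
    The default field name is a hypothetical placeholder.
    """
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return (
        f"{field}:[{start.strftime(TIMESTAMP_FORMAT)} "
        f"TO {end.strftime(TIMESTAMP_FORMAT)}}}"
    )
```

The all-inclusive alternative, ending each day at 23:59:59, would miss any timestamps with fractional seconds after that point.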
|
As a first iteration, just mark the daily batch complete and continue.
The occasional HTTP 400 issue has been reported as
https://github.com/datacite/datacite/issues/897.
A possible improvement would be to shrink the window, so losses will be
smaller.
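Shrinking the window could look roughly like this: split the day into smaller half-open windows and, when one fails, record it and move on, so only that window's records are lost. `BatchError` and the `fetch` callable are hypothetical stand-ins for the real error (such as an HTTP 400) and the real harvest request.

```python
import datetime


class BatchError(Exception):
    """Hypothetical error raised by a fetch function, e.g. on HTTP 400."""


def split_window(start, end, step):
    """Yield consecutive half-open (lo, hi) windows covering [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def harvest(start, end, step, fetch):
    """Call fetch(lo, hi) per sub-window; skip failed windows and return them."""
    failed = []
    for lo, hi in split_window(start, end, step):
        try:
            fetch(lo, hi)
        except BatchError:
            failed.append((lo, hi))  # mark this window as lost, keep going
    return failed
```

Returning the failed windows also leaves room to retry them later instead of only marking the batch complete.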
|
Update the request parameters for datacite API v2. Works fine, but there are
occasional HTTP 400 responses when using the cursor API (daily updates
can exceed the 10000 record limit for search queries).
The HTTP 400 issue is not solved yet, but reported to datacite as
https://github.com/datacite/datacite/issues/897.
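A cursor-based harvest loop might look like the sketch below. It assumes a JSON:API style response with records in `data` and the next page URL in `links.next`, and that the first request sets `page[cursor]=1`; these details should be checked against the DataCite API documentation. The `fetch_json` callable is a hypothetical stand-in for the actual HTTP client, which keeps the loop easy to test.

```python
from urllib.parse import urlencode


def first_page_url(base_url, query, page_size=1000):
    """Build the first cursor request (parameter names are assumptions)."""
    params = {"query": query, "page[size]": page_size, "page[cursor]": 1}
    return f"{base_url}?{urlencode(params)}"


def iterate_records(fetch_json, url):
    """Yield records page by page, following links.next until it is absent."""
    while url:
        doc = fetch_json(url)
        yield from (doc.get("data") or [])
        url = (doc.get("links") or {}).get("next")
```

Unlike plain search paging, which is capped at 10000 records, following the cursor lets a busy day be harvested completely; an occasional HTTP 400 would surface as an exception raised by `fetch_json`.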