| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
Should be log level debug, not info.
Examples: E675, n/a, 15D.2.1, 15D.2.1, A.1E.1, A.1E.1, ...
|
* add missing langdetect
* use entity_to_dict for json debug output
* factor out code for fields in function and add table driven tests
* update citeproc types
* add author as default role
* add raw_affiliation
* include relations from datacite
* remove url (covered by doi already)
Using yapf for python formatting.
|
Additionally, try the unspecific (%Y) pattern last.
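A minimal sketch of the ordering idea, assuming a plain `strptime`-based parser. The pattern list and the `parse_date` helper are hypothetical and not taken from the importer; only the rule "bare year (%Y) last" comes from the note above.

```python
import datetime

# Hypothetical pattern list, ordered from most to least specific.
# The bare year pattern (%Y) is deliberately tried last.
DATE_PATTERNS = (
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
)


def parse_date(value, patterns=DATE_PATTERNS):
    """Return (date, pattern) for the first pattern that matches,
    or (None, None) if no pattern matches."""
    for pattern in patterns:
        try:
            parsed = datetime.datetime.strptime(value, pattern)
        except ValueError:
            # This pattern does not fit; fall through to the next one.
            continue
        return parsed.date(), pattern
    return None, None
```

Returning the matched pattern alongside the date lets a caller tell a full date apart from a year-only value, which `strptime` would otherwise silently pad to January 1st.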
|
The current version successfully imported a random sample of 100000 records
(0.5%) from datacite.
The --debug (write JSON to stdout) and --insert-log-file (log batch
before committing to db) flags are temporarily added to help debugging.
Add a few unit tests.
Some edge cases:
a) Existing keys without a value require a slightly awkward idiom:
```
titles = attributes.get('titles', []) or []
```
b) There can be 0, 1, or more (first one wins) titles.
c) Date handling is probably not ideal. Datacite has a potentially
fine-grained list of dates.
The test case (tests/files/datacite_sample.jsonl) refers to
https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main
descriptor) 1986. The datacite record contains: 2017 (publicationYear,
probably the year of record creation with reference system), 1978-06-03
(collected, e.g. experimental sample), 1986 ("Accepted"). The online
version of the resource shows yet another date (2019-06-05 10:14:43 by
WIEWS update).
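Edge cases a) and b) can be sketched together. This assumes each entry in `titles` is a dict with a `title` key; the `first_title` helper is hypothetical, and skipping empty entries is an added assumption beyond "first one wins".

```python
def first_title(attributes):
    """Return the first usable title from a datacite-style attributes dict,
    or None if there is none."""
    # The key may exist with an explicit null value, in which case
    # .get() returns None rather than the default; "or []" covers that.
    titles = attributes.get('titles', []) or []
    for entry in titles:
        title = (entry or {}).get('title')
        if title:
            # First one wins; later titles are ignored.
            return title
    return None
```

For example, `first_title({'titles': None})` and `first_title({})` both return `None` instead of raising a `TypeError`.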
|
Currently using two external libraries:
* dateparser
* langcodes
Note: This commit includes lots of WIP docs and field stats in comments,
which should be removed.
|
* contributors, title, date, publisher, container, license
Field and value analysis via https://github.com/miku/indigo.
|
pytest had been pinned to the 4.x series to work around a test import
package mangling problem with citeproc_styles. Now that pytest.ini
explicitly lists test files, this seems to no longer be a problem and
pytest can be updated to the most recent version.
Also re-locked Pipfile.lock with updated dependencies (only minor
changes).
|
The purpose of this change is to avoid test errors when pytest tries to
recursively rewrite assertion statements in all dependent packages. The
reason pytest does this is to add pretty printing, which is nice, but
probably shouldn't be done in all dependency libraries.
This fixes test problems with both CSL (citeproc_styles) and dateparser
(when actually imported in code, which currently on master does not
happen).
|
remove duplicate fields in entity release
See merge request webgroup/fatcat!11
|
Datacite daily harvest
See merge request webgroup/fatcat!6
|
Produced messages should match:
jq '.data|length' tests/files/datacite_api.json
|
The bracket syntax is inclusive. See also:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#_ranges
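Because both bounds of a square-bracket range are included, a one-day window can be written with explicit start and end timestamps for that day. The sketch below only builds the query string; the `updated` field name, the timestamp format, and the helper name are assumptions for illustration.

```python
import datetime


def inclusive_day_range(field, day):
    """Build a query-string range clause covering one full day.

    Square brackets make both bounds inclusive, so the end bound is set
    to the last millisecond of the same day to avoid overlapping with
    the following day's window.
    """
    start = "{}T00:00:00.000Z".format(day.isoformat())
    end = "{}T23:59:59.999Z".format(day.isoformat())
    return "{}:[{} TO {}]".format(field, start, end)
```

Calling `inclusive_day_range("updated", datetime.date(2019, 12, 1))` yields `updated:[2019-12-01T00:00:00.000Z TO 2019-12-01T23:59:59.999Z]`.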
|
As a first iteration, just mark the daily batch complete and continue.
The occasional HTTP 400 issue has been reported as
https://github.com/datacite/datacite/issues/897.
A possible improvement would be to shrink the window, so losses will be
smaller.
|
Parameter update for datacite API v2. Works fine, but there are
occasional HTTP 400 responses when using the cursor API (daily updates
can exceed the 10000 record limit for search queries).
The HTTP 400 issue is not solved yet, but reported to datacite as
https://github.com/datacite/datacite/issues/897.
|
Replace emdash with regular dash.
Replace double slash after partner ID with single slash. This conversion
seems to be done by crossref automatically on lookup. I tried several
examples, using doi.org resolver and Crossref API lookup.
Note that there are a number of fatcat entities with '//' in the DOI.
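The two replacements could look roughly like this. It is a sketch, not the actual cleanup code: it assumes "partner ID" refers to the DOI prefix (for example `10.1234`), and the `clean_doi` name is hypothetical.

```python
import re

# Matches a DOI prefix immediately followed by a double slash.
# Assumption: "partner ID" means the prefix, e.g. "10.1234".
PREFIX_DOUBLE_SLASH = re.compile(r"^(10\.\d+)//")


def clean_doi(raw):
    """Apply the two normalizations described above to a DOI string."""
    # Replace em dash (U+2014) with a regular dash.
    doi = raw.replace("\u2014", "-")
    # Collapse the double slash after the prefix into a single slash;
    # a '//' later in the suffix is left untouched.
    doi = PREFIX_DOUBLE_SLASH.sub(r"\1/", doi)
    return doi
```

Only the slash pair directly after the prefix is rewritten, since that is the one conversion observed in the lookups described above.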
|