Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
| * | datacite: names can be 'Unav', too | Martin Czygan | 2020-01-02 | 1 | -1/+4 | |
| | | ||||||
| * | datacite: avoid more None values | Martin Czygan | 2020-01-01 | 1 | -4/+4 | |
| | | ||||||
| * | datacite: address 'Unpublished' publisher | Martin Czygan | 2019-12-31 | 1 | -9/+10 | |
| | | ||||||
| * | datacite: ensure name schema is defined | Martin Czygan | 2019-12-31 | 1 | -1/+2 | |
| | | ||||||
| * | datacite: fix typo | Martin Czygan | 2019-12-31 | 1 | -1/+1 | |
| | | ||||||
| * | datacite: isascii was added in 3.7, only | Martin Czygan | 2019-12-31 | 1 | -1/+7 | |
| | | ||||||
| * | datacite: skip non-ascii doi for now | Martin Czygan | 2019-12-31 | 1 | -0/+4 | |
| | Example of a non-ascii doi: https://doi.org/10.13125/américacrítica/3017 | |||||
| * | datacite: clean doi | Martin Czygan | 2019-12-31 | 1 | -1/+13 | |
| | Address issue with EN DASH in DOI: "external identifier doesn't match required pattern for a DOI (expected, eg, '10.1234/aksjdfh'): 10.25513/1812-3996.2017.1.34–42" | |||||
| * | datacite: update docs | Martin Czygan | 2019-12-31 | 1 | -9/+9 | |
| | | ||||||
| * | datacite: perform additional checks on contrib | Martin Czygan | 2019-12-30 | 1 | -3/+9 | |
| | | ||||||
| * | datacite: check for empty title after clean | Martin Czygan | 2019-12-29 | 1 | -2/+5 | |
| | | ||||||
| * | datacite: update docs with observed values | Martin Czygan | 2019-12-29 | 1 | -1/+3 | |
| | | ||||||
| * | datacite: page number misses are too common | Martin Czygan | 2019-12-28 | 1 | -1/+2 | |
| | Should be logged at debug level, not info. Examples: E675, n/a, 15D.2.1, 15D.2.1, A.1E.1, A.1E.1, ... | |||||
| * | datacite: suppress debug-like language lookup miss message | Martin Czygan | 2019-12-28 | 1 | -1/+3 | |
| | | ||||||
| * | datacite: adjust tests | Martin Czygan | 2019-12-28 | 1 | -2/+1 | |
| | | ||||||
| * | datacite: treat untyped names as people | Martin Czygan | 2019-12-28 | 1 | -1/+1 | |
| | | ||||||
| * | datacite: include container_name top level key in extra | Martin Czygan | 2019-12-28 | 1 | -7/+21 | |
| | | ||||||
| * | datacite: use clean on field values | Martin Czygan | 2019-12-28 | 1 | -2/+28 | |
| | | ||||||
| * | datacite: include doi in error messages | Martin Czygan | 2019-12-28 | 1 | -8/+8 | |
| | | ||||||
| * | remove langcodes dependency | Martin Czygan | 2019-12-28 | 2 | -15/+0 | |
| | | ||||||
| * | datacite: limit abstract length | Martin Czygan | 2019-12-28 | 1 | -0/+6 | |
| | | ||||||
| * | datacite: use iso 639-1 codes | Martin Czygan | 2019-12-28 | 1 | -7/+4 | |
| | | ||||||
| * | datacite: use specific auth var | Martin Czygan | 2019-12-28 | 1 | -1/+1 | |
| | | ||||||
| * | datacite: add missing --extid-map-file flag | Martin Czygan | 2019-12-28 | 1 | -0/+4 | |
| | | ||||||
| * | address first round of MR14 comments | Martin Czygan | 2019-12-28 | 4 | -150/+503 | |
| | add missing langdetect; use entity_to_dict for json debug output; factor out code for fields in function and add table driven tests; update citeproc types; add author as default role; add raw_affiliation; include relations from datacite; remove url (covered by doi already). Using yapf for python formatting. | |||||
| * | datacite: move common date patterns out of the loop | Martin Czygan | 2019-12-28 | 1 | -3/+4 | |
| | Additionally, try the unspecific (%Y) pattern last. | |||||
| * | improve datacite field mapping and import | Martin Czygan | 2019-12-28 | 5 | -59/+245 | |
| | The current version successfully imported a random sample of 100000 records (0.5%) from datacite. The --debug (write JSON to stdout) and --insert-log-file (log batch before committing to db) flags are temporarily added to help debugging. Add a few unit tests. Some edge cases: a) Existing keys without a value require a slightly awkward idiom: `titles = attributes.get('titles', []) or []` b) There can be 0, 1, or more titles (first one wins). c) Date handling is probably not ideal. Datacite has a potentially fine-grained list of dates. The test case (tests/files/datacite_sample.jsonl) refers to https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main descriptor) 1986. The datacite record contains: 2017 (publicationYear, probably the year of record creation with reference system), 1978-06-03 (collected, e.g. experimental sample), 1986 ("Accepted"). The online version of the resource knows even one more date (2019-06-05 10:14:43 by WIEWS update). | |||||
| * | datacite: add missing mappings and notes | Martin Czygan | 2019-12-28 | 1 | -266/+175 | |
| | | ||||||
| * | datacite: basic field mappings | Martin Czygan | 2019-12-28 | 1 | -41/+181 | |
| | Currently using two external libraries: dateparser and langcodes. Note: This commit includes lots of WIP docs and field stats in comments, which should be removed. | |||||
| * | datacite: importer skeleton | Martin Czygan | 2019-12-28 | 4 | -0/+514 | |
| | contributors, title, date, publisher, container, license. Field and value analysis via https://github.com/miku/indigo. | |||||
* | | 2019-01-07 status update | Bryan Newbold | 2020-01-07 | 2 | -0/+36 | |
| | | ||||||
* | | chocula bulk edit note | Bryan Newbold | 2020-01-07 | 2 | -0/+15 | |
| | | ||||||
* | | importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 6 | -3/+15 | |
| | | ||||||
* | | proposals: standardize a bit | Bryan Newbold | 2020-01-03 | 9 | -3/+34 | |
| | | ||||||
* | | notes on search query parsing (WIP) | Bryan Newbold | 2020-01-03 | 1 | -0/+22 | |
| | | ||||||
* | | fatcat identifiers proposal (WIP) | Bryan Newbold | 2020-01-03 | 1 | -0/+25 | |
| | | ||||||
* | | proposal: python3.7 upgrade | Bryan Newbold | 2020-01-03 | 1 | -0/+101 | |
| | | ||||||
* | | pipenv: update pytest to 5.x; remove langcodes | Bryan Newbold | 2020-01-03 | 2 | -108/+85 | |
| | pytest has been pinned to the 4.x series to work around a test import package mangling problem with citeproc_styles. Now that pytest.ini explicitly lists test files, this seems to no longer be a problem and pytest can be updated to the most recent version. Also re-locked Pipfile.lock with updated dependencies (only minor changes). | |||||
* | | pytest: explicitly indicate all in-scope test files | Bryan Newbold | 2020-01-03 | 1 | -3/+1 | |
| | The purpose of this change is to test errors when pytest tries to recursively update assertion statements in all dependent packages. The reason pytest does this is to add pretty printing, which is nice, but probably shouldn't be done in all dependency libraries. This fixes test problems with both CSL (citeproc_styles) and dateparser (when actually imported in code, which currently on master does not happen). | |||||
* | | scholix schema links/proposal | Bryan Newbold | 2020-01-03 | 1 | -0/+3 | |
| | | ||||||
* | | update bulk edit CHANGELOG and orcid notes | Bryan Newbold | 2019-12-31 | 2 | -13/+49 | |
| | | ||||||
* | | Merge branch 'martin-guide-entity-release-fix' into 'master' | bnewbold | 2019-12-31 | 1 | -5/+5 | |
|\ \ | remove duplicate fields in entity release. See merge request webgroup/fatcat!11 | |||||
| * | document year and date of withdrawn release | Martin Czygan | 2019-12-17 | 1 | -1/+5 | |
| | | ||||||
| * | remove duplicate fields in entity release | Martin Czygan | 2019-12-17 | 1 | -4/+0 | |
| | | ||||||
* | | bulk edit updates | Bryan Newbold | 2019-12-26 | 1 | -3/+4 | |
| | | ||||||
* | | orcid: skip non-person ORCID records | Bryan Newbold | 2019-12-26 | 1 | -0/+4 | |
| | | ||||||
* | | Merge branch 'martin-datacite-daily-harvest' into 'master' | Martin Czygan | 2019-12-26 | 3 | -5/+73 | |
|\ \ | Datacite daily harvest. See merge request webgroup/fatcat!6 | |||||
| * | | datacite: fix harvest test | Martin Czygan | 2019-12-27 | 1 | -1/+1 | |
| | | Produced messages should match: `jq '.data\|length' tests/files/datacite_api.json` | |||||
| * | | datacite: add simple test and fixture for datacite api interaction | Martin Czygan | 2019-12-27 | 2 | -0/+46 | |
| | | | ||||||
| * | | datacite: extend range search query | Martin Czygan | 2019-12-27 | 1 | -1/+1 | |
| | | The bracket syntax is inclusive. See also: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#_ranges | |||||