Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | datacite: add comment about potential date parsing bug | Bryan Newbold | 2021-11-03 | 1 | -0/+1 |
| | |||||
* | datacite importer: dateparser.date.DateDataParser() | Bryan Newbold | 2021-11-03 | 1 | -1/+1 |
| | Perhaps this was a change when upgrading 'dateparser'? | ||||
* | more involved type wrangling and fixes for importers | Bryan Newbold | 2021-11-03 | 1 | -2/+3 |
| | |||||
* | typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -8/+10 |
| | These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes. | ||||
* | typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -30/+59 |
| | This commit just adds the type annotations; it doesn't fix code to make type checking pass. | ||||
* | fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -380/+444 |
| | |||||
* | python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -1/+1 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc. | ||||
* | datacite: skip empty abstracts | Martin Czygan | 2021-10-01 | 1 | -1/+4 |
| | Do not add abstracts where `clean` results in the empty string, as this violates a constraint: `either abstract_sha1 or content is required` | ||||
* | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 1 | -1/+1 |
| | Caused by a partial "title entry without title" coming *first* (e.g. one holding only a language, like {'lang': 'da'}). | ||||
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 1 | -2/+1 |
| | refs sentry #77700 | ||||
* | crossref+datacite: remove confusing early update bail | Bryan Newbold | 2020-11-20 | 1 | -2/+0 |
| | Easy to miss that we skip updates *twice*, and with this early bailout we were not updating counts correctly. | ||||
* | refactor: white/black -> allow/block | Bryan Newbold | 2020-11-05 | 1 | -4/+4 |
| | |||||
* | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+19 |
| | About 3400 records, seemingly from zenodo, currently have "FULL MOVIE" in the title. Examples: https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi and https://doi.org/10.5281/zenodo.4041777 | ||||
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 1 | -1/+1 |
| | Includes a tiny tweak to the datacite import sample file to test this code path. | ||||
* | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -3/+3 |
| | |||||
* | datacite import: refactor release_type detection into static method | Bryan Newbold | 2020-08-11 | 1 | -14/+51 |
| | |||||
* | datacite import: refactor publisher-specific hacks into static method | Bryan Newbold | 2020-08-11 | 1 | -15/+29 |
| | Also tweak title/publisher detection to use DOI prefixes. | ||||
* | remove isascii() work around definition in importers/datacite.py | Bryan Newbold | 2020-07-23 | 1 | -7/+1 |
| | We are on python3.7 now, so this isn't needed. | ||||
* | simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 1 | -7/+7 |
| | These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements. | ||||
* | Merge branch 'martin-datacite-duplicated-author-gh-59' into 'master' | bnewbold | 2020-07-11 | 1 | -6/+60 |
|\ | datacite: address duplicated contributor issue. See merge request webgroup/fatcat!65 | ||||
| * | datacite: resolve formatting issues in tests | Martin Czygan | 2020-07-10 | 1 | -2/+1 |
| |\ | |||||
| * | | datacite: there should be no index gaps | Martin Czygan | 2020-07-10 | 1 | -2/+8 |
| | | | |||||
| * | | datacite: document contributor types | Martin Czygan | 2020-07-10 | 1 | -0/+25 |
| | | | |||||
| * | | wip: contrib, GH59 | Martin Czygan | 2020-07-10 | 1 | -16/+22 |
| | | | |||||
| * | | datacite: address duplicated contributor issue | Martin Czygan | 2020-07-07 | 1 | -0/+16 |
| | | | Use string comparison. See https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs and https://api.datacite.org/dois/10.25940/roper-31098406 | ||||
* | | | datacite: mitigate sentry #44035 | Martin Czygan | 2020-07-10 | 1 | -0/+4 |
| |/ |/| | According to sentry, running `c.get('nameIdentifiers', []) or []` on a c with value `{'affiliation': [], 'familyName': 'Guidon', 'givenName': 'Manuel', 'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683', 'nameIdentifierScheme': 'ORCID', 'schemeUri': 'https://orcid.org'}, 'nameType': 'Personal'}` results in a string, which I cannot reproduce. The document in question also seems fine: https://api.datacite.org/dois/10.26275/kuw1-fdls | ||||
* | | datacite: fix attribute error | Martin Czygan | 2020-07-07 | 1 | -1/+1 |
| | | refs: #44035 | ||||
* | | lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -2/+0 |
|/ | |||||
* | add new license mappings | Bryan Newbold | 2020-06-30 | 1 | -0/+14 |
| | |||||
* | datacite: improve license mapping | Martin Czygan | 2020-06-30 | 1 | -9/+15 |
| | via "missed potential license", refs #58 | ||||
* | datacite: hard cast possible date value to string | Martin Czygan | 2020-06-29 | 1 | -1/+1 |
| | |||||
* | datacite: fix type error | Martin Czygan | 2020-04-22 | 1 | -1/+3 |
| | Up to now, we expected the description to be a string or list. Add handling for int as well. First appeared: Apr 22 19:58:39. | ||||
* | datacite: fix a raw name constraint violation | Martin Czygan | 2020-04-20 | 1 | -0/+8 |
| | It was possible that contribs got added which had no raw name. One example would be a name consisting of whitespace only. This fix adds a final check for this case. | ||||
* | Merge pull request #53 from EdwardBetts/spelling | bnewbold | 2020-03-27 | 1 | -4/+4 |
|\ | Correct spelling mistakes | ||||
| * | Correct spelling mistakes | Edward Betts | 2020-03-27 | 1 | -4/+4 |
| | | |||||
* | | datacite: nameIdentifier corner case | Bryan Newbold | 2020-03-26 | 1 | -1/+2 |
| | | Works around a bug in production: `AttributeError: 'NoneType' object has no attribute 'replace'` (datacite.py:724). NOTE: there are no tests for this code path. | ||||
* | | datacite: add year sanity restrictions | bnewbold | 2020-03-23 | 1 | -0/+7 |
|/ | We can do a clean-up task, but first need to prevent creation of new bad metadata. Example of entities with bogus years: https://fatcat.wiki/release/search?q=doi_registrar%3Adatacite+year%3A%3E2100 | ||||
* | datacite: prevent none | Martin Czygan | 2020-01-31 | 1 | -1/+1 |
| | |||||
* | datacite: name shall not be None | Martin Czygan | 2020-01-31 | 1 | -1/+1 |
| | |||||
* | datacite: add exception for https://www.micropublication.org/ | Martin Czygan | 2020-01-31 | 1 | -0/+5 |
| | |||||
* | datacite: do not skip records w/o date | Martin Czygan | 2020-01-31 | 1 | -2/+1 |
| | |||||
* | datacite: improve docstring | Martin Czygan | 2020-01-31 | 1 | -4/+4 |
| | |||||
* | datacite: improve date handling and minor tweak | Martin Czygan | 2020-01-30 | 1 | -19/+42 |
| | Records from https://www.micropublication.org/ did not have a date in FC, although raw data contained date strings - they were not using the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case. During this test (#30) a processing gap for names became clear (author may have "given_name" and "surname", but no "name"). This bug has been fixed, too. | ||||
* | datacite: skip records without a doi | Martin Czygan | 2020-01-13 | 1 | -0/+4 |
| | |||||
* | datacite: add entry to license slug map | Martin Czygan | 2020-01-09 | 1 | -0/+1 |
| | |||||
* | datacite: ignore known unknown values in resourceType* | Martin Czygan | 2020-01-09 | 1 | -2/+2 |
| | |||||
* | datacite: abstracts may be strings or list of strings | Martin Czygan | 2020-01-09 | 1 | -2/+15 |
| | |||||
* | datacite: improve license_slug handling | Martin Czygan | 2020-01-09 | 1 | -60/+101 |
| | |||||
* | datacite: add 'Unknown' to blacklist | Martin Czygan | 2020-01-09 | 1 | -1/+5 |
| |
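
Several of the commits above harden the importer against irregular DataCite input: `nameIdentifiers` arriving as a single dict rather than a list, descriptions that are strings, lists, or ints, abstracts that are empty after cleaning, and implausible years. The following is a minimal sketch of that kind of defensive normalization, not the code from `importers/datacite.py`. The helper names (`as_identifier_list`, `clean_abstracts`, `sane_year`) and the year bounds are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Optional


def as_identifier_list(contributor: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return 'nameIdentifiers' as a list of dicts (hypothetical helper).

    If the field holds a single dict, iterating over it directly would
    yield its string keys, so it is wrapped in a list first.
    """
    value = contributor.get("nameIdentifiers") or []
    if isinstance(value, dict):
        value = [value]
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


def clean_abstracts(description: Any) -> List[str]:
    """Normalize a description that may be a str, a list, or an int.

    Entries that are empty after stripping whitespace are skipped, so no
    abstract with empty content is produced.
    """
    if description is None:
        return []
    items = description if isinstance(description, list) else [description]
    cleaned = []
    for item in items:
        text = str(item).strip()
        if text:
            cleaned.append(text)
    return cleaned


def sane_year(year: Optional[int], min_year: int = 1000,
              max_year: int = 2100) -> Optional[int]:
    """Return the year only if it falls within assumed plausible bounds.

    The bounds are illustrative assumptions, not the importer's values.
    """
    if year is None or not (min_year <= year <= max_year):
        return None
    return year
```

With these helpers, the contributor record quoted in the "mitigate sentry #44035" commit would yield a one-element list of identifier dicts instead of a sequence of key strings, and a whitespace-only description would yield no abstracts.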