| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Many of these 'subject' objects have the equivalent of several lines of
text, with complex URLs that don't compress well. I think it is fine we
have included these thus far instead of parsing more deeply, but going
forward I don't think this nested 'extra' metadata is worth the database
space.
|
|
|
|
|
|
|
|
| |
This was used during initial bulk imports, but is no longer used and
could create serious metadata problems if used accidentally.
In retrospect, it also made metadata provenance less transparent, and
may have done more harm than good overall.
|
| |
|
| |
|
|
|
|
| |
Perhaps this was a change when upgrading 'dateparser'?
|
| |
|
|
|
|
|
|
|
| |
These mostly add new variable names so that existing variables aren't
overwritten with a new type; delay coercing '{}' or '[]' to 'None' until
the last minute; add is-not-None checks to conditional clauses; and
make similar small changes.
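
A minimal sketch of the three patterns named above, not the actual diff. The helper `parse_extra` and the `subjects` field are hypothetical and only serve as an illustration.

```python
from typing import Any, Dict, List, Optional


def parse_extra(raw: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper illustrating the type-checking friendly patterns."""
    extra: Dict[str, Any] = {}

    # New variable name instead of re-binding `raw` to a value of another type.
    raw_subjects = raw.get("subjects") if raw is not None else None

    subjects: List[str] = []
    # Explicit is-not-None check rather than relying on truthiness alone.
    if raw_subjects is not None:
        subjects = [str(s) for s in raw_subjects]
    if subjects:
        extra["subjects"] = subjects

    # Coerce the empty container to None only at the very end.
    return extra or None
```

Keeping `extra` a plain dict until the final line means every intermediate statement sees a single, non-optional type.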
|
|
|
|
|
| |
This commit only adds the type annotations; it does not fix the code to
make type checking pass.
|
| |
|
| |
|
|
|
|
| |
'==' vs 'is'; 'not a in b' vs 'a not in b'; etc
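
A small illustration of the two idioms mentioned, assuming the '==' vs 'is' case refers to comparisons against `None`. The function `describe` is hypothetical.

```python
def describe(value, known):
    # Identity check for the None singleton (rather than `value == None`).
    if value is None:
        return "missing"
    # `a not in b` rather than `not a in b`; both evaluate identically,
    # but the first form reads unambiguously.
    if value not in known:
        return "unknown"
    return "known"
```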
|
|
|
|
|
| |
Do not add abstracts where `clean` results in the empty string - this
violates a constraint: `either abstract_sha1 or content is required`
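
A sketch of the guard described above. `clean_text` is a hypothetical stand-in for `clean` (here it only collapses whitespace), and `collect_abstracts` and the `description` key are assumed names, not the importer's actual code.

```python
from typing import Any, Dict, List, Optional


def clean_text(raw: Optional[str]) -> str:
    """Hypothetical stand-in for `clean`: collapses runs of whitespace."""
    return " ".join((raw or "").split())


def collect_abstracts(descriptions: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    abstracts = []
    for desc in descriptions:
        text = clean_text(desc.get("description"))
        if not text:
            # An empty abstract would violate the
            # "either abstract_sha1 or content is required" constraint.
            continue
        abstracts.append({"content": text})
    return abstracts
```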
|
|
|
|
|
| |
Caused by a partial "title entry without title" coming *first* (e.g. one
holding just a language, like: {'lang': 'da'})
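
One way to guard against this, sketched under the assumption that titles arrive as a list of dicts. `pick_title` is a hypothetical helper, not the importer's function.

```python
from typing import Any, Dict, List, Optional


def pick_title(titles: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first non-empty title, skipping entries without one."""
    for entry in titles:
        title = entry.get("title")
        if title:
            return title
    return None
```

With an input such as `[{'lang': 'da'}, {'title': 'En titel', 'lang': 'da'}]`, the partial first entry is skipped instead of being treated as the title.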
|
|
|
|
| |
refs sentry #77700
|
|
|
|
|
| |
Easy to miss that we skip updates *twice*, and with this early bailout
we were not updating counts correctly.
|
| |
|
|
|
|
|
|
|
|
|
| |
seemingly from zenodo:
* https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi
* https://doi.org/10.5281/zenodo.4041777
About 3400 records with "FULL MOVIE" in title, currently.
|
|
|
|
|
| |
Includes a tiny tweak to the datacite import sample file to test this
code path.
|
| |
|
| |
|
|
|
|
| |
Also tweak title/publisher detection to use DOI prefixes
|
|
|
|
| |
We are on Python 3.7 now, so this isn't needed.
|
|
|
|
|
|
| |
These should not have any behavior changes, though a number of exception
catches are now more general, and there may be long-tail exceptions
getting thrown in these statements.
|
|\
| |
| |
| |
| | |
datacite: address duplicated contributor issue
See merge request webgroup/fatcat!65
|
| |\ |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use string comparison.
* https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
* https://api.datacite.org/dois/10.25940/roper-31098406
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
According to sentry, running `c.get('nameIdentifiers', []) or []` on a c with value:
```
{'affiliation': [],
'familyName': 'Guidon',
'givenName': 'Manuel',
'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683',
'nameIdentifierScheme': 'ORCID',
'schemeUri': 'https://orcid.org'},
'nameType': 'Personal'}
```
results in a string, which I cannot reproduce. The document in question at:
https://api.datacite.org/dois/10.26275/kuw1-fdls seems fine, too.
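
A defensive sketch, assuming `nameIdentifiers` can arrive as a single dict (as in the sample above) rather than a list of dicts. Iterating over a dict yields its string keys, which may be one way a string shows up downstream; that is a guess, not a confirmed cause. `name_identifiers` is a hypothetical helper.

```python
from typing import Any, Dict, List


def name_identifiers(contributor: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Always return a list of identifier dicts, whatever shape came in."""
    value = contributor.get("nameIdentifiers") or []
    if isinstance(value, dict):
        # A single identifier object: wrap it so callers can always iterate.
        value = [value]
    # Drop anything that is not a dict (for example stray strings).
    return [v for v in value if isinstance(v, dict)]
```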
|
| |
| |
| |
| | |
refs: #44035
|
|/ |
|
| |
|
|
|
|
| |
via "missed potential license", refs #58
|
| |
|
|
|
|
|
|
|
| |
Up to now, we expected the description to be a string or list. Add
handling for int as well.
First appeared: Apr 22 19:58:39.
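
A sketch of accepting a string, a list, or an int for the description value. `description_texts` is a hypothetical helper, and the choice to convert ints with `str()` is an assumption.

```python
from typing import Any, List


def description_texts(desc: Any) -> List[str]:
    """Normalize a description value to a list of non-empty strings."""
    if desc is None:
        return []
    items = desc if isinstance(desc, list) else [desc]
    texts = []
    for item in items:
        # Accept strings and ints; ignore any other type.
        if isinstance(item, (str, int)):
            text = str(item).strip()
            if text:
                texts.append(text)
    return texts
```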
|
|
|
|
|
|
|
| |
It was possible that contribs got added which had no raw name. One
example would be a name consisting of whitespace only.
This fix adds a final check for this case.
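
The final check could look roughly like this, assuming contribs are dicts with a `raw_name` key. Both the dict shape and `drop_nameless` are hypothetical.

```python
from typing import Any, Dict, List


def drop_nameless(contribs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only contribs whose raw name has non-whitespace content."""
    return [c for c in contribs if (c.get("raw_name") or "").strip()]
```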
|
|\
| |
| | |
Correct spelling mistakes
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Works around a bug in production:
AttributeError: 'NoneType' object has no attribute 'replace'
(datacite.py:724)
NOTE: there are no tests for this code path
|
|/
|
|
|
|
|
|
|
| |
Example of entities with bogus years:
https://fatcat.wiki/release/search?q=doi_registrar%3Adatacite+year%3A%3E2100
We can do a clean-up task, but first need to prevent creation of new bad
metadata.
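
A sketch of a plausibility check for release years. The bounds `MIN_YEAR` and `MAX_YEARS_AHEAD` are hypothetical, since the cut-offs actually used are not stated here, and `sane_year` is an assumed name.

```python
import datetime
from typing import Optional

# Hypothetical bounds, chosen only for illustration.
MIN_YEAR = 1000
MAX_YEARS_AHEAD = 5


def sane_year(year: Optional[int], today: Optional[datetime.date] = None) -> Optional[int]:
    """Return the year if it looks plausible, otherwise None."""
    if year is None:
        return None
    today = today or datetime.date.today()
    if MIN_YEAR <= year <= today.year + MAX_YEARS_AHEAD:
        return year
    return None
```

Passing `today` explicitly keeps the check deterministic in tests.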
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Records from https://www.micropublication.org/ did not have a date in
FC, although raw data contained date strings - they were not using the
finer-grained "attributes.date" but "attributes.published" and/or
"attributes.publicationYear".
Support for those fields has been added, including a test case.
During this test (#30) a processing gap for names became clear (author
may have "given_name" and "surname", but no "name"). This bug has been
fixed, too.
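
A sketch of the two fallbacks described above. It assumes `published` is a string starting with a four-digit year and that a missing name can be built as "given_name surname"; `fallback_year` and `raw_name` are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def fallback_year(attributes: Dict[str, Any]) -> Optional[int]:
    """Use `published`, then `publicationYear`, when no finer date exists."""
    published = attributes.get("published")
    if isinstance(published, str) and len(published) >= 4 and published[:4].isdigit():
        return int(published[:4])
    year = attributes.get("publicationYear")
    if isinstance(year, int):
        return year
    if isinstance(year, str) and year.isdigit():
        return int(year)
    return None


def raw_name(author: Dict[str, Any]) -> Optional[str]:
    """Prefer `name`; otherwise join `given_name` and `surname`."""
    name = author.get("name")
    if name:
        return name
    parts = [author.get("given_name"), author.get("surname")]
    joined = " ".join(p for p in parts if p)
    return joined or None
```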
|
| |
|
| |
|
| |
|