| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
|  | 
We are on Python 3.7 now, so this isn't needed.
 | 
| | 
| 
| 
| 
| 
|  | 
These should not have any behavior changes, though a number of exception
catches are now more general, and there may be long-tail exceptions
getting thrown in these statements.
 | 
| |  | 
 | 
| | 
| 
| 
| 
| 
|  | 
The pytest fixture syntax interacts weirdly with flake8 tests, so ignore
the "redefinition" and "unused variable" errors more carefully for .py
files under ./tests/
 | 
| |\  
| | 
| | 
| | 
| |  | 
datacite: address duplicated contributor issue
See merge request webgroup/fatcat!65
 | 
| | |\   | 
 | 
| | | |  | 
 | 
| | | |  | 
 | 
| | | |  | 
 | 
| | | |  | 
 | 
| | | |  | 
 | 
| | | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | |  | 
Use string comparison.
* https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
* https://api.datacite.org/dois/10.25940/roper-31098406
 | 
| |\ \ \  
| |_|/  
|/| |   
| | |   
| | |    | 
datacite: mitigate sentry #44035
See merge request webgroup/fatcat!66
 | 
| | | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | | 
| | |  | 
According to Sentry, running `c.get('nameIdentifiers', []) or []` on a `c` with the value:
```
{'affiliation': [],
 'familyName': 'Guidon',
 'givenName': 'Manuel',
 'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683',
                     'nameIdentifierScheme': 'ORCID',
                     'schemeUri': 'https://orcid.org'},
 'nameType': 'Personal'}
```
results in a string, which I cannot reproduce. The document in question at:
https://api.datacite.org/dois/10.26275/kuw1-fdls seems fine, too.
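
A minimal sketch of one way to guard against this, not necessarily the mitigation that was merged. The helper name `normalize_name_identifiers` is hypothetical. It assumes the field may arrive as a list of dicts, a single dict (as in the record above), or some unexpected type such as a string, and always returns a list of dicts:

```python
def normalize_name_identifiers(contributor):
    """Return the contributor's nameIdentifiers as a list of dicts.

    Hypothetical helper: tolerates a single dict, a list, a missing key,
    None, or an unexpected scalar such as a string.
    """
    value = contributor.get('nameIdentifiers', []) or []
    if isinstance(value, dict):
        # A single identifier object, as in the Sentry report above.
        return [value]
    if isinstance(value, list):
        # Keep only well-formed entries; skip stray strings or None.
        return [item for item in value if isinstance(item, dict)]
    # Anything else (e.g. a bare string) is treated as "no identifiers".
    return []
```

Callers can then iterate over the result without checking its type first.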
 | 
| |\ \ \  
| | | | 
| | | | 
| | | | 
| | | |  | 
arxiv: address 503, "Retry after specified interval" error
See merge request webgroup/fatcat!64
 | 
| | | | |  | 
 | 
| | |/ /  
|/| |    | 
 | 
| |/ /  
| |   
| |   
| |    | 
refs: #44035
 | 
| | |  | 
 | 
| | |  | 
 | 
| | |  | 
 | 
| | |  | 
 | 
| | |  | 
 | 
| | |  | 
 | 
| | |  | 
 | 
| |/   | 
 | 
| |  | 
 | 
| | 
| 
| 
|  | 
via "missed potential license", refs #58
 | 
| |  | 
 | 
| |  | 
 | 
| |  | 
 | 
| |\  
| | 
| | 
| | 
| |  | 
better download button links
See merge request webgroup/fatcat!57
 | 
| | | 
| | 
| | 
| |  | 
Similar to recent change for release download pages.
 | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| |  | 
This will increase index size (URLs are often long in our corpus, and we
have many file entities), but seems worth it.
Initially added `ia_url` as a second field, guaranteed to always be an
*.archive.org URL, but `best_url` defaults to that anyway, so it didn't
seem worthwhile.
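
As an illustration only, the "default to an archive.org URL" idea could look like the sketch below. The function name `pick_best_url` and the fallback to the first URL are assumptions; the actual selection logic in the indexing code may differ:

```python
from urllib.parse import urlparse


def pick_best_url(urls):
    """Pick one URL from a list, preferring *.archive.org hosts.

    Hypothetical helper: returns the first archive.org URL if any,
    otherwise the first URL in the list, otherwise None.
    """
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == "archive.org" or host.endswith(".archive.org"):
            return url
    return urls[0] if urls else None
```

With a preference like this, a separate `ia_url` field would mostly duplicate `best_url`, which matches the reasoning above.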
 | 
| | | 
| | 
| | 
| | 
| | 
| | 
| |  | 
I thought this was the existing behavior, but it looks like we were just
taking the first link from the first file.
In the future we may refactor this out even further.
 | 
| |/   | 
 | 
| |\  
| | 
| | 
| | 
| | 
| |  | 
Manually resolved conflicts:
    python/fatcat_tools/harvest/doi_registrars.py
 | 
| | | 
| | 
| | 
| | 
| | 
| | 
| | 
| | 
| |  | 
In the past, harvesting datacite occasionally resulted in HTTP 400 errors.
Meanwhile, various API bugs have been fixed (most recently:
https://github.com/datacite/lupo/pull/537,
https://github.com/datacite/datacite/issues/1038). The downside of ignoring
this error was that state lives in Kafka, which has limited support for
deletion of arbitrary messages from a topic.
 | 
| |\ \  
| | | 
| | | 
| | | 
| | |  | 
harvest: log the failed url
See merge request webgroup/fatcat!55
 | 
| | |/   | 
 | 
| |/   | 
 | 
| |\  
| | 
| | 
| | 
| |  | 
verify release_stage in ingest importer
See merge request webgroup/fatcat!52
 | 
| | |  | 
 | 
| | |  | 
 | 
| |/  
|   
|   
|   
|   
|   
|   
|   
|    | 
"span" short for "timespan" to harvest; there may be a better name to
use.
Motivation for this is to work around a pylint error that .next() was
not callable. This might be a bug with pylint, but .next() is also a
very generic name.
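
A rough sketch of what such a rename might look like. The class `HarvestStateSketch` and the method name `next_span` are hypothetical stand-ins, and daily granularity is an assumption; this is not the project's actual harvest state code:

```python
import datetime


class HarvestStateSketch:
    """Hypothetical tracker of which daily timespans remain to be harvested."""

    def __init__(self, start_date, end_date):
        # Collect every date in the inclusive range as outstanding work.
        self.to_process = set()
        day = start_date
        while day <= end_date:
            self.to_process.add(day)
            day += datetime.timedelta(days=1)

    def next_span(self):
        """Return the earliest outstanding date, or None when finished."""
        if self.to_process:
            return min(self.to_process)
        return None

    def complete(self, day):
        """Mark a date as harvested; unknown dates are ignored."""
        self.to_process.discard(day)
```

A more specific method name than `.next()` also avoids any confusion with Python's iterator protocol.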
 | 
| |  | 
 | 
| |  | 
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Gitlab CI is showing lint errors like:
    =================================== FAILURES ===================================
    _______________________ [pylint] tests/harvest_state.py ________________________
    E: 19,11: hs.next is not callable (not-callable)
    E: 33,11: hs.next is not callable (not-callable)
    E: 19,11: hs.next is not callable (not-callable)
    [...]
This is confusing, as we use pipenv with a lock, so I should see the
exact same errors locally.
This commit is a hack to try and fix this and unbreak builds until we
can debug further.
 | 
| | 
| 
| 
| 
|  | 
It seems to be an inadvertently upgraded version of pylint saying that
these lines are not-callable.
 | 
| |  | 
 |