| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| | |
verify release_stage in ingest importer
See merge request webgroup/fatcat!52
|
| | |
|
|/
|
|
|
|
|
|
|
| |
"span" short for "timespan" to harvest; there may be a better name to
use.
Motivation for this is to work around a pylint erorr that .next() was
not callable. This might be a bug with pylint, but .next() is also a
very generic name.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gitlab CI is showing lint errors like:
=================================== FAILURES ===================================
6316 _______________________ [pylint] tests/harvest_state.py ________________________
6317 E: 19,11: hs.next is not callable (not-callable)
6318 E: 33,11: hs.next is not callable (not-callable)
6319 E: 19,11: hs.next is not callable (not-callable)
[...]
this is confusing as we use pipenv with a lock, so I should see the
exact same errors locally.
This commit is a hack to try and fix this and unbreak builds until we
can debug further.
|
|\ |
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Up to now, we expected the description to be a string or list. Add
handling for int as well.
First appeared: Apr 22 19:58:39.
|
|/
|
|
|
|
|
| |
It was possible that contribs got added which had no raw name. One
example would be a name consisting of whitespace only.
This fix adds a final check for this case.
|
|
|
|
|
|
| |
This goes against what the API docs recommend, but we are currently far
behind on updates and need to catch up. Other than what the docs say,
this seems to be consistent with the behavior we want.
|
|\
| |
| | |
Correct spelling mistakes
|
| | |
|
|\ \
| | |
| | |
| | |
| | | |
catch ApiValueError in some generic API calls
See merge request webgroup/fatcat!35
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The motivation for this change is to handle bogus revision IDs in URLs,
which were causing 500 errors not 400 errors. Eg:
https://qa.fatcat.wiki/file/rev/5d5d5162-b676-4f0a-968f-e19dadeaf96e%2B2019-11-27%2B13:49:51%2B0%2B6
I have no idea where these URLs are actually coming from, but they
should be 4xx not 5xx.
Investigating made me realize there is a whole category of ApiValueError
exceptions we were not catching and should have been.
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This tries to show the citeproc (bibtext, MLA, CSL-JSON) options for
more releases, and not show the links when they would break.
The primary motivation here is to work around two exceptions being
thrown in prod every day (according to sentry):
KeyError: 'role'
ValueError: CLS requries some surname (family name)
I'm guessing these are mostly coming from crawlers following the
citeproc links on release landing pages.
|
| |
| |
| |
| |
| |
| |
| | |
This resolves a situation noticed in prod where we were only
importing/updating a single reference per article.
Includes a regression test.
|
|\ \
| |/
|/|
| |
| | |
pubmed and arxiv harvest preparations
See merge request webgroup/fatcat!28
|
| |
| |
| |
| |
| | |
* fetch_date will fail on missing mapping
* adjust tests (test will require access to pubmed ftp)
|
| |
| |
| |
| |
| | |
* regenerate map in continuous mode
* add tests
|
|\ \ |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Includes a trivial test and transform, but not any workers or doc
updates.
|
| | | |
|
| |/
|/| |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Records from https://www.micropublication.org/ did not have a date in
FC, although raw data contained date strings - they were not using the
finer-grained "attributes.date" but "attributes.published" and/or
"attributes.publicationYear".
Support for those fields has been added, including a test case.
During this test (#30) a processing gap for names became clear (author
may have "given_name" and "surname", but no "name"). This bug has been
fixed, too.
|
|
|
|
|
|
|
|
|
| |
Technically, [...] DOI names may incorporate any printable characters
from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the
character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1).
For mostly QA reasons, we currently treat a DOI with an "en dash" as
invalid.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
Use values from:
* attributes.creators[]
* attributes.contributors[]
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* attributes.metadataVersion
* attributes.schemaVersion
* attributes.version (source dependent values, follows suggestions in
https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26,
but values vary)
Furthermore:
* attributes.types.resourceTypeGeneral
* attributes.types.resourceType
|
| |
|
|
|
|
|
| |
> include release_month as a top-level extra field [...] to
auto-populate the schema field from that
|
| |
|
|
|
|
|
|
|
|
| |
Datacite defines placeholders for unknown values:
* https://support.datacite.org/docs/schema-values-unknown-information-v43
Clean abstracts.
|
|
|
|
|
|
| |
> always include extra values for the respective DOI registrars
(datacite, crossref, jalc), even if they are empty ({}), to be used as a
flag so we know which DOI registrar supplied the metadata.
|
| |
|