| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
This goes against what the API docs recommend, but we are currently far
behind on updates and need to catch up. Other than what the docs say,
this seems to be consistent with the behavior we want.
|
|\
| |
| | |
Correct spelling mistakes
|
| | |
|
|\ \
| | |
| | |
| | |
| | | |
catch ApiValueError in some generic API calls
See merge request webgroup/fatcat!35
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The motivation for this change is to handle bogus revision IDs in URLs,
which were causing 500 errors not 400 errors. Eg:
https://qa.fatcat.wiki/file/rev/5d5d5162-b676-4f0a-968f-e19dadeaf96e%2B2019-11-27%2B13:49:51%2B0%2B6
I have no idea where these URLs are actually coming from, but they
should be 4xx not 5xx.
Investigating made me realize there is a whole category of ApiValueError
exceptions we were not catching and should have been.
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This tries to show the citeproc (bibtext, MLA, CSL-JSON) options for
more releases, and not show the links when they would break.
The primary motivation here is to work around two exceptions being
thrown in prod every day (according to sentry):
KeyError: 'role'
ValueError: CLS requries some surname (family name)
I'm guessing these are mostly coming from crawlers following the
citeproc links on release landing pages.
|
| |
| |
| |
| |
| |
| |
| | |
This resolves a situation noticed in prod where we were only
importing/updating a single reference per article.
Includes a regression test.
|
|\ \
| |/
|/|
| |
| | |
pubmed and arxiv harvest preparations
See merge request webgroup/fatcat!28
|
| |
| |
| |
| |
| | |
* fetch_date will fail on missing mapping
* adjust tests (test will require access to pubmed ftp)
|
| |
| |
| |
| |
| | |
* regenerate map in continuous mode
* add tests
|
|\ \ |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Includes a trivial test and transform, but not any workers or doc
updates.
|
| | | |
|
| |/
|/| |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Records from https://www.micropublication.org/ did not have a date in
FC, although raw data contained date strings - they were not using the
finer-grained "attributes.date" but "attributes.published" and/or
"attributes.publicationYear".
Support for those fields has been added, including a test case.
During this test (#30) a processing gap for names became clear (author
may have "given_name" and "surname", but no "name"). This bug has been
fixed, too.
|
|
|
|
|
|
|
|
|
| |
Technically, [...] DOI names may incorporate any printable characters
from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the
character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1).
For mostly QA reasons, we currently treat a DOI with an "en dash" as
invalid.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
Use values from:
* attributes.creators[]
* attributes.contributors[]
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* attributes.metadataVersion
* attributes.schemaVersion
* attributes.version (source dependent values, follows suggestions in
https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26,
but values vary)
Furthermore:
* attributes.types.resourceTypeGeneral
* attributes.types.resourceType
|
| |
|
|
|
|
|
| |
> include release_month as a top-level extra field [...] to
auto-populate the schema field from that
|
| |
|
|
|
|
|
|
|
|
| |
Datacite defines placeholders for unknown values:
* https://support.datacite.org/docs/schema-values-unknown-information-v43
Clean abstracts.
|
|
|
|
|
|
| |
> always include extra values for the respective DOI registrars
(datacite, crossref, jalc), even if they are empty ({}), to be used as a
flag so we know which DOI registrar supplied the metadata.
|
| |
|
|
|
|
| |
As [...] we will soon add support for release_month field in the release schema.
|
| |
|
|
|
|
| |
Estimated time for a single call is in the order of 50ms.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
> The convention for display_name and raw_name is to be how the name
would normally be printed, not in index form (surname comma given_name).
So we might need to un-encode names like "Tricart, Pierre".
Use an additional `index_form_to_display_name` function to convert index
from to display form, heuristically.
|