| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Based on fatcat-scholar refactoring.
This doesn't include refactoring of stats, aggregates, or histograms
yet, just the direct queries.
Don't have any test coverage yet; intend to try elasticmock or figure
out how to ingest mock JSON results directly.
|
|\
| |
| |
| |
| | |
more lint fixes
See merge request webgroup/fatcat!69
|
| | |
|
| |
| |
| |
| |
| |
| | |
The pytest fixture syntax interacts weirdly with flake8 tests, so ignore
the "redefinition" and "unused variable" errors more carefully for .py
files under ./tests/
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Frequently when looking at preservation coverage of journals, the
current year shows as "un-preserved" when in fact there is robust KBART
(keepers, eg CLOCKSS/Portico) coverage. This is partially because we
don't update containers with KBART year spans very frequently (which is
on us), and partially because KBART reports are often a bit out of date
(eg, they don't show coverage for the current year). For that matter, they
probably take a few months to update the previous year as well, but that
is a larger time span to fudge over.
This patch means we will count Portico/LOCKSS/etc coverage for "last
year" as coverage of publications dated "this year". Note that
for this to be effective/correct, it is assumed that we will update
containers with coverage year spans at least once a year, and that we
will re-index all releases at least once a year.
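The year fudge described above can be sketched roughly as follows. This is a
minimal illustration, not the actual patch: the function name
`covered_by_kbart`, the `(start, end)` tuple layout for year spans, and the
`this_year` parameter are all assumptions made for the example.

```python
import datetime
from typing import List, Optional, Tuple


def covered_by_kbart(
    release_year: int,
    year_spans: List[Tuple[int, int]],
    this_year: Optional[int] = None,
) -> bool:
    """Hypothetical helper: is a release year inside any KBART year span?

    A span that ends "last year" is also treated as covering releases
    dated "this year", since keeper reports tend to lag behind.
    """
    if this_year is None:
        this_year = datetime.date.today().year
    for start, end in year_spans:
        # Plain case: the release year falls inside the reported span.
        if start <= release_year <= end:
            return True
        # Fudge: a span ending last year also covers the current year.
        if release_year == this_year and start <= end == this_year - 1:
            return True
    return False
```

Because the fudge is tied to the current year, a result computed today goes
stale next year, which is why the message above assumes yearly container
updates and re-indexing.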
|
|\ |
|
| | |
|
| | |
|
| | |
|
| | |
|
|/
|
|
|
|
|
| |
Use string comparison.
* https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
* https://api.datacite.org/dois/10.25940/roper-31098406
|
|
|
|
| |
via "missed potential license", refs #58
|
| |
|
| |
|
|\
| |
| |
| |
| | |
verify release_stage in ingest importer
See merge request webgroup/fatcat!52
|
| | |
|
|/
|
|
|
|
|
|
|
| |
"span" short for "timespan" to harvest; there may be a better name to
use.
Motivation for this is to work around a pylint error that .next() was
not callable. This might be a bug with pylint, but .next() is also a
very generic name.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gitlab CI is showing lint errors like:
=================================== FAILURES ===================================
_______________________ [pylint] tests/harvest_state.py ________________________
E: 19,11: hs.next is not callable (not-callable)
E: 33,11: hs.next is not callable (not-callable)
E: 19,11: hs.next is not callable (not-callable)
[...]
This is confusing, as we use pipenv with a lock, so I should see the
exact same errors locally.
This commit is a hack to try and fix this and unbreak builds until we
can debug further.
|
|\ |
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Up to now, we expected the description to be a string or list. Add
handling for int as well.
First appeared: Apr 22 19:58:39.
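One way to accept a description given as a string, a list, or an int is to
normalize everything to a list of strings first. The sketch below is an
assumption about the general shape of such handling, not the importer's real
code; `description_texts` is a hypothetical name.

```python
from typing import Any, List


def description_texts(value: Any) -> List[str]:
    """Hypothetical helper: normalize a description field to clean strings."""
    if value is None:
        return []
    # A bare string or int is treated like a one-element list.
    if isinstance(value, (str, int)):
        value = [value]
    if not isinstance(value, list):
        return []
    texts = []
    for item in value:
        if isinstance(item, (str, int)):
            text = str(item).strip()
            if text:
                texts.append(text)
    return texts
```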
|
|/
|
|
|
|
|
| |
It was possible that contribs got added which had no raw name. One
example would be a name consisting of whitespace only.
This fix adds a final check for this case.
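A final check of this kind might look like the sketch below. It assumes
contribs are plain dicts with a `raw_name` key; both that shape and the
function name `drop_nameless_contribs` are illustrative assumptions.

```python
from typing import Any, Dict, List


def drop_nameless_contribs(contribs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical final pass: keep only contribs with a usable raw name."""
    kept = []
    for contrib in contribs:
        raw_name = contrib.get("raw_name")
        # Whitespace-only names strip down to "" and are dropped here.
        if isinstance(raw_name, str) and raw_name.strip():
            kept.append(contrib)
    return kept
```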
|
|
|
|
|
|
| |
This goes against what the API docs recommend, but we are currently far
behind on updates and need to catch up. Other than what the docs say,
this seems to be consistent with the behavior we want.
|
|\
| |
| | |
Correct spelling mistakes
|
| | |
|
|\ \
| | |
| | |
| | |
| | | |
catch ApiValueError in some generic API calls
See merge request webgroup/fatcat!35
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The motivation for this change is to handle bogus revision IDs in URLs,
which were causing 500 errors instead of 400 errors. Eg:
https://qa.fatcat.wiki/file/rev/5d5d5162-b676-4f0a-968f-e19dadeaf96e%2B2019-11-27%2B13:49:51%2B0%2B6
I have no idea where these URLs are actually coming from, but they
should be 4xx not 5xx.
Investigating made me realize there is a whole category of ApiValueError
exceptions we were not catching and should have been.
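The general pattern is to catch the client-side validation error around the
API call and turn it into a 4xx response. The sketch below is self-contained
and therefore uses stand-ins: the `ApiValueError` class here only mimics the
generated client's exception, and `fetch_revision` and `lookup_revision` are
hypothetical names, not functions from the web code.

```python
import uuid
from typing import Any, Dict, Tuple


class ApiValueError(ValueError):
    """Stand-in for the generated API client's exception of the same name."""


def fetch_revision(rev_id: str) -> Dict[str, Any]:
    """Hypothetical API call that validates the revision ID before use."""
    try:
        parsed = uuid.UUID(rev_id)
    except ValueError as exc:
        raise ApiValueError(f"invalid revision id: {rev_id!r}") from exc
    return {"revision": str(parsed)}


def lookup_revision(rev_id: str) -> Tuple[Dict[str, Any], int]:
    """Return (body, status), mapping bad input to 400 rather than 500."""
    try:
        return fetch_revision(rev_id), 200
    except ApiValueError as exc:
        return {"error": "bad-request", "message": str(exc)}, 400
```

With this shape, a revision ID carrying a trailing timestamp, like the one in
the URL above, yields a 400 body instead of an unhandled exception.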
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This tries to show the citeproc (BibTeX, MLA, CSL-JSON) options for
more releases, and not show the links when they would break.
The primary motivation here is to work around two exceptions being
thrown in prod every day (according to sentry):
KeyError: 'role'
ValueError: CLS requries some surname (family name)
I'm guessing these are mostly coming from crawlers following the
citeproc links on release landing pages.
|
| |
| |
| |
| |
| |
| |
| | |
This resolves a situation noticed in prod where we were only
importing/updating a single reference per article.
Includes a regression test.
|
|\ \
| |/
|/|
| |
| | |
pubmed and arxiv harvest preparations
See merge request webgroup/fatcat!28
|
| |
| |
| |
| |
| | |
* fetch_date will fail on missing mapping
* adjust tests (test will require access to pubmed ftp)
|
| |
| |
| |
| |
| | |
* regenerate map in continuous mode
* add tests
|
|\ \ |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | | |
Includes a trivial test and transform, but not any workers or doc
updates.
|
| | | |
|
| |/
|/| |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Records from https://www.micropublication.org/ did not have a date in
FC, although raw data contained date strings - they were not using the
finer-grained "attributes.date" but "attributes.published" and/or
"attributes.publicationYear".
Support for those fields has been added, including a test case.
During this test (#30) a processing gap for names became clear (author
may have "given_name" and "surname", but no "name"). This bug has been
fixed, too.
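Both fixes amount to fallbacks, which can be sketched as below. The field
names come from the message above; the helper names `fallback_year` and
`display_name`, and the assumption that the date fields start with a
four-digit year, are made up for this example.

```python
import re
from typing import Any, Dict, Optional


def fallback_year(attributes: Dict[str, Any]) -> Optional[int]:
    """Hypothetical fallback: take a year from published or publicationYear."""
    for key in ("published", "publicationYear"):
        value = attributes.get(key)
        if value is None:
            continue
        # Assumes the value starts with a four-digit year, eg "2019" or 2019.
        match = re.match(r"\s*(\d{4})", str(value))
        if match:
            return int(match.group(1))
    return None


def display_name(author: Dict[str, Any]) -> Optional[str]:
    """Hypothetical fallback: build a name from given_name and surname."""
    name = author.get("name")
    if isinstance(name, str) and name.strip():
        return name.strip()
    parts = [author.get("given_name"), author.get("surname")]
    joined = " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())
    return joined or None
```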
|
|
|
|
|
|
|
|
|
| |
Technically, [...] DOI names may incorporate any printable characters
from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the
character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1).
For mostly QA reasons, we currently treat a DOI with an "en dash" as
invalid.
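As an illustration of the rule only, a check like the following rejects any
DOI containing an en dash (U+2013) while leaving the ordinary ASCII hyphen
alone. The function name `doi_passes_qa` is hypothetical and the real
cleaning code likely does more than this.

```python
EN_DASH = "\u2013"  # the "en dash" character, distinct from ASCII "-"


def doi_passes_qa(doi: str) -> bool:
    """Hypothetical QA check: treat DOIs containing an en dash as invalid."""
    return EN_DASH not in doi
```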
|
| |
|
| |
|
| |
|
| |
|
| |
|