aboutsummaryrefslogtreecommitdiffstats
path: root/python/tests
Commit message (Collapse)AuthorAgeFilesLines
...
* refactor release and container searchBryan Newbold2020-07-241-5/+2
| | | | | | | | | | Based on fatcat-scholar refactoring. This doesn't include refactoring of stats, aggregates, or histograms yet, just the direct queries. Don't have any test coverage yet; intend to try elasticmock or figuring out how to ingest mock JSON results directly.
* Merge branch 'bnewbold-more-lint-fixes' into 'master'Martin Czygan2020-07-245-6/+5
|\ | | | | | | | | more lint fixes See merge request webgroup/fatcat!69
| * fix actual typo in tests (caught by lint)Bryan Newbold2020-07-231-2/+2
| |
| * simple lint (flake8) fixes in testsBryan Newbold2020-07-234-4/+3
| | | | | | | | | | | | The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/
* | make in_kbart transform inclusive of last yearBryan Newbold2020-07-231-0/+46
|/ | | | | | | | | | | | | | | | | Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, eg CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of day (eg, doesn't show coverage for the current year. For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over. This patch means we will count Portico/LOCKSS/etc coverage for "last year" to count as coverage of publications dated "this year". Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year.
* datacite: resolve formatting issues in testsMartin Czygan2020-07-1045-150/+54
|\
| * lint (flake8) python test filesBryan Newbold2020-07-0145-168/+71
| |
* | datacite: adjust testsMartin Czygan2020-07-104-10/+6
| |
* | wip: contrib, GH59Martin Czygan2020-07-101-229/+361
| |
* | wip: contrib, GH59Martin Czygan2020-07-105-3/+105
| |
* | datacite: address duplicated contributor issueMartin Czygan2020-07-075-11/+94
|/ | | | | | | Use string comparison. * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs * https://api.datacite.org/dois/10.25940/roper-31098406
* datacite: improve license mappingMartin Czygan2020-06-301-0/+14
| | | | via "missed potential license", refs #58
* datacite: hard cast possible date value to stringMartin Czygan2020-06-291-0/+1
|
* datacite: fix test docsMartin Czygan2020-05-291-3/+3
|
* Merge branch 'bnewbold-ingest-stage' into 'master'Martin Czygan2020-05-282-7/+41
|\ | | | | | | | | verify release_stage in ingest importer See merge request webgroup/fatcat!52
| * regression test for release_stage mismatch with ingest requestBryan Newbold2020-05-262-7/+41
| |
* | rename HarvestState.next() to HarvestState.next_span()Bryan Newbold2020-05-261-2/+2
|/ | | | | | | | | "span" short for "timespan" to harvest; there may be a better name to use. Motivation for this is to work around a pylint erorr that .next() was not callable. This might be a bug with pylint, but .next() is also a very generic name.
* HACK: try to squelch pylint in CIBryan Newbold2020-05-261-2/+2
| | | | | | | | | | | | | | | | | Gitlab CI is showing lint errors like: =================================== FAILURES =================================== 6316 _______________________ [pylint] tests/harvest_state.py ________________________ 6317 E: 19,11: hs.next is not callable (not-callable) 6318 E: 33,11: hs.next is not callable (not-callable) 6319 E: 19,11: hs.next is not callable (not-callable) [...] this is confusing as we use pipenv with a lock, so I should see the exact same errors locally. This commit is a hack to try and fix this and unbreak builds until we can debug further.
* Merge remote-tracking branch 'github/master'Bryan Newbold2020-05-221-5/+5
|\
| * Indentity is not the same this as equality in PythonChristian Clauss2020-05-141-5/+5
| |
* | datacite: fix type errorMartin Czygan2020-04-223-1/+77
| | | | | | | | | | | | | | Up to now, we expected the description to be a string or list. Add handling for int as well. First appeared: Apr 22 19:58:39.
* | datacite: fix a raw name constraint violationMartin Czygan2020-04-203-1/+78
|/ | | | | | | It was possible that contribs got added which had no raw name. One example would be a name consisting of whitespace only. This fix adds a final check for this case.
* crossref: switch from index-date to update-dateBryan Newbold2020-03-301-1/+1
| | | | | | This goes against what the API docs recommend, but we are currently far behind on updates and need to catch up. Other than what the docs say, this seems to be consistent with the behavior we want.
* Merge pull request #53 from EdwardBetts/spellingbnewbold2020-03-272-3/+3
|\ | | | | Correct spelling mistakes
| * Correct spelling mistakesEdward Betts2020-03-272-3/+3
| |
* | Merge branch 'bnewbold-400-bad-revisions' into 'master'Martin Czygan2020-03-261-0/+2
|\ \ | | | | | | | | | | | | catch ApiValueError in some generic API calls See merge request webgroup/fatcat!35
| * | catch ApiValueError in some generic API callsBryan Newbold2020-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The motivation for this change is to handle bogus revision IDs in URLs, which were causing 500 errors not 400 errors. Eg: https://qa.fatcat.wiki/file/rev/5d5d5162-b676-4f0a-968f-e19dadeaf96e%2B2019-11-27%2B13:49:51%2B0%2B6 I have no idea where these URLs are actually coming from, but they should be 4xx not 5xx. Investigating made me realize there is a whole category of ApiValueError exceptions we were not catching and should have been.
* | | improve citeproc/CSL web interfaceBryan Newbold2020-03-252-13/+53
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | This tries to show the citeproc (bibtext, MLA, CSL-JSON) options for more releases, and not show the links when they would break. The primary motivation here is to work around two exceptions being thrown in prod every day (according to sentry): KeyError: 'role' ValueError: CLS requries some surname (family name) I'm guessing these are mostly coming from crawlers following the citeproc links on release landing pages.
* | pubmed: handle multiple ReferenceListBryan Newbold2020-03-202-0/+218
| | | | | | | | | | | | | | This resolves a situation noticed in prod where we were only importing/updating a single reference per article. Includes a regression test.
* | Merge branch 'martin-kafka-bs4-import' into 'master'Martin Czygan2020-03-103-0/+80
|\ \ | |/ |/| | | | | pubmed and arxiv harvest preparations See merge request webgroup/fatcat!28
| * pubmed: move mapping generation out of fetch_dateMartin Czygan2020-03-101-0/+2
| | | | | | | | | | * fetch_date will fail on missing mapping * adjust tests (test will require access to pubmed ftp)
| * more pubmed adjustmentsMartin Czygan2020-02-223-0/+78
| | | | | | | | | | * regenerate map in continuous mode * add tests
* | Merge branch 'bnewbold-elastic-v03b'Bryan Newbold2020-02-264-4/+61
|\ \
| * | ES updates: fix tests to accept archive.org in host/domainBryan Newbold2020-02-141-2/+3
| | |
| * | ES releases: host/domain fixesBryan Newbold2020-01-311-0/+3
| | |
| * | implement host+domain parsing for file ES transformBryan Newbold2020-01-301-4/+3
| | |
| * | fix ES file schema plural field namesBryan Newbold2020-01-291-1/+1
| | |
| * | actually implement changelog transformBryan Newbold2020-01-291-1/+23
| | |
| * | fix some transform bugs, add some testsBryan Newbold2020-01-294-5/+16
| | |
| * | first implementation of ES file schemaBryan Newbold2020-01-291-2/+23
| | | | | | | | | | | | | | | Includes a trivial test and transform, but not any workers or doc updates.
* | | shadow import: more filtering of file_meta fieldsBryan Newbold2020-02-132-18/+18
| | |
* | | basic shadow importerBryan Newbold2020-02-132-0/+71
| |/ |/|
* | datacite: add exception for https://www.micropublication.org/Martin Czygan2020-01-311-1/+2
| |
* | datacite: improve date handling and minor tweakMartin Czygan2020-01-303-2/+111
|/ | | | | | | | | | | | | Records from https://www.micropublication.org/ did not have a date in FC, although raw data contained date strings - they were not using the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case. During this test (#30) a processing gap for names became clear (author may have "given_name" and "surname", but no "name"). This bug has been fixed, too.
* do not normalize "en dash" in DOIMartin Czygan2020-01-171-1/+1
| | | | | | | | | Technically, [...] DOI names may incorporate any printable characters from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1). For mostly QA reasons, we currently treat a DOI with an "en dash" as invalid.
* ingest: improve tests, support old ingest resultsBryan Newbold2020-01-153-1/+18
|
* datacite: add entry to license slug mapMartin Czygan2020-01-091-0/+1
|
* datacite: ignore known unknown values in resourceType*Martin Czygan2020-01-093-1/+95
|
* datacite: abstracts may be strings or list of stringsMartin Czygan2020-01-095-1/+187
|
* datacite: improve license_slug handlingMartin Czygan2020-01-093-2/+33
|