path: root/python/tests
Commit message  [Author, Date, Files changed, Lines -removed/+added]
* improve dblp release import  [Bryan Newbold, 2020-12-17, 1 file, -3/+4]
|
* very simple dblp container importer  [Bryan Newbold, 2020-12-17, 4 files, -5/+77]
|
* basic test coverage of dblp release importer  [Bryan Newbold, 2020-12-17, 4 files, -0/+503]
|
* add 'lxml' mode for large XML file import, and multi-tags  [Bryan Newbold, 2020-12-17, 1 file, -2/+2]
|
* fix sloppy is_preserved ES transform test failure  [Bryan Newbold, 2020-12-17, 1 file, -1/+1]
|
* Merge branch 'bnewbold-doaj-fuzzy' into 'master'  [bnewbold, 2020-12-18, 3 files, -2/+99]
|\      DOAJ import fuzzy match filter
| |
| |     See merge request webgroup/fatcat!92
| * update fuzzy helper to pass 'reason' through to import code  [Bryan Newbold, 2020-12-17, 1 file, -2/+2]
| |     The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases.
| * add fuzzy match filtering to DOAJ importer  [Bryan Newbold, 2020-12-16, 1 file, -2/+14]
| |     In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities.
| |
| |     In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity.
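The conservative default described in this commit (skip an entity at import time if fuzzy matching finds any candidate, even an AMBIGUOUS one) can be sketched as a small decision function. This is a minimal illustration under stated assumptions: `MatchStatus` and `should_skip_import` are hypothetical names, not fatcat's or fuzzycat's actual API, and DIFFERENT is assumed to be the status for a candidate that verification rejects.

```python
from enum import Enum
from typing import Iterable


class MatchStatus(Enum):
    # Hypothetical verification outcomes; names follow the commit message.
    EXACT = "exact"
    STRONG = "strong"
    WEAK = "weak"
    AMBIGUOUS = "ambiguous"
    DIFFERENT = "different"


def should_skip_import(candidate_statuses: Iterable[MatchStatus]) -> bool:
    """Conservative default: skip if any fuzzy candidate might be the same release.

    Every status except DIFFERENT, including AMBIGUOUS, counts as a possible
    match, so the importer never creates a likely duplicate.
    """
    return any(status is not MatchStatus.DIFFERENT for status in candidate_statuses)


print(should_skip_import([]))                       # False: no candidates, import it
print(should_skip_import([MatchStatus.AMBIGUOUS]))  # True: even ambiguous skips
print(should_skip_import([MatchStatus.DIFFERENT]))  # False: candidate was rejected
```

Relaxing the policy later, as the commit anticipates, would amount to shrinking the set of statuses that trigger a skip.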
| * add fuzzy matching helper to importer base class  [Bryan Newbold, 2020-12-16, 2 files, -0/+85]
| |     Using fuzzycat. Add basic test coverage.
* | improve release elasticsearch transform test coverage  [Bryan Newbold, 2020-12-16, 3 files, -11/+86]
|/
* DOAJ: remove accidentally committed 'skip' of a test  [Bryan Newbold, 2020-11-20, 1 file, -1/+0]
|
* doaj: fix update code path (getattr not __dict__)  [Bryan Newbold, 2020-11-20, 2 files, -11/+67]
|       Also add missing code coverage for update path (disabled by default).
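The "getattr not __dict__" fix reflects a general Python pitfall: an instance's `__dict__` only holds attributes stored directly on that instance, while `getattr()` performs full attribute lookup, including properties. The sketch below assumes a hypothetical `ReleaseModel` that keeps values in private attributes behind properties (a pattern common in code-generated API client models); it is an illustration, not the actual DOAJ importer code.

```python
class ReleaseModel:
    """Hypothetical model that keeps field values in private attributes."""

    def __init__(self, title=None):
        self._title = title  # this is what actually lands in __dict__

    @property
    def title(self):
        return self._title

    @title.setter
    def title(self, value):
        self._title = value


def field_changed_buggy(old, new, field):
    # Bug: the public name "title" is not a key in __dict__ (only "_title" is),
    # so both lookups return None and every field looks unchanged.
    return old.__dict__.get(field) != new.__dict__.get(field)


def field_changed(old, new, field):
    # Fix: getattr() performs normal attribute lookup, including properties.
    return getattr(old, field) != getattr(new, field)


a = ReleaseModel(title="Old title")
b = ReleaseModel(title="New title")
print(field_changed_buggy(a, b, "title"))  # False (wrong)
print(field_changed(a, b, "title"))        # True
```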
* implement remainder of DOAJ article importer  [Bryan Newbold, 2020-11-19, 1 file, -11/+6]
|
* initial implementation of DOAJ importer  [Bryan Newbold, 2020-11-19, 2 files, -0/+97]
|       Several things to finish implementing and polish.
* ingest: fix XML ingest test file  [Bryan Newbold, 2020-11-05, 1 file, -1/+1]
|
* ingest: progress on HTML ingest  [Bryan Newbold, 2020-11-05, 2 files, -2/+44]
|
* ingest: tests for basic XML ingest  [Bryan Newbold, 2020-11-05, 2 files, -0/+18]
|
* ingest: basic checks for ingest_type  [Bryan Newbold, 2020-11-05, 2 files, -1/+7]
|
* Merge branch 'bnewbold-202009-polish' into 'master'  [Martin Czygan, 2020-09-29, 2 files, -6/+6]
|\      fatcat.wiki 2020-09 polish
| |
| |     See merge request webgroup/fatcat!84
| * lint cleanups  [Bryan Newbold, 2020-09-17, 1 file, -2/+0]
| |
| * web: route constraints on fcids and UUIDs  [Bryan Newbold, 2020-09-17, 1 file, -4/+6]
| |     Instead of accepting any string for these parameters and throwing a 400 error if not the correct type, implement better route matching at the framework level and return more 404s. This resolves several outstanding Sentry exceptions.
| |
| |     The "flask-uuid" library was imported and seems to have been configured for this purpose previously, but I guess I never finished configuring it.
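The effect of moving ident validation into route matching can be sketched without a web framework. In Flask this kind of constraint is usually expressed as a custom URL converter; the stdlib-only sketch below just mimics the resulting behavior. It assumes fcids are 26-character lowercase base32 strings (consistent with the example ident `rzcpjwukobd4pj36ipla22cnoi` in the next commit), and `route_status` and `ENTITY_TYPES` are hypothetical names.

```python
import re

# Assumed fcid format: 26 characters of lowercase base32 (letters a-z, digits 2-7).
FCID_RE = re.compile(r"[a-z2-7]{26}")

# Assumed set of entity types that have ident routes.
ENTITY_TYPES = {"release", "container", "creator", "file", "fileset", "webcapture", "work"}


def route_status(entity_type: str, ident: str) -> int:
    """Hypothetical router mimicking framework-level route constraints.

    Malformed idents never match a route, so they yield 404 instead of
    reaching the view and triggering a 400 (or an unhandled exception).
    """
    if entity_type not in ENTITY_TYPES or not FCID_RE.fullmatch(ident):
        return 404
    return 200  # matched: dispatch to the view (which may still 404 on lookup)


print(route_status("release", "rzcpjwukobd4pj36ipla22cnoi"))  # 200
print(route_status("release", "not-a-valid-ident"))           # 404
```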
* | address spammy datacite titles  [Martin Czygan, 2020-09-23, 1 file, -0/+6]
|/      seemingly from zenodo:
|
|       * https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi
|       * https://doi.org/10.5281/zenodo.4041777
|
|       About 3400 records with "FULL MOVIE" in title, currently.
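A title-based spam check like the one this commit describes might look as follows. This is a hypothetical sketch: the commit only names "FULL MOVIE" as the pattern, so `SPAM_TITLE_MARKERS`, the helper name, and the case-insensitive comparison are all assumptions rather than the actual datacite importer logic.

```python
from typing import Optional

# Assumed marker list; the commit only cites "FULL MOVIE" as the spam pattern.
SPAM_TITLE_MARKERS = ("full movie",)


def is_spammy_title(title: Optional[str]) -> bool:
    """Hypothetical filter: flag titles containing a known spam marker.

    Comparison is case-insensitive (an assumption), so "Full Movie" also matches.
    """
    if not title:
        return False
    lowered = title.lower()
    return any(marker in lowered for marker in SPAM_TITLE_MARKERS)


print(is_spammy_title("Some Film (2020) FULL MOVIE Online"))  # True
print(is_spammy_title("A Study of Movie Audiences"))          # False
```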
* datacite: handle case of empty-string version  [Bryan Newbold, 2020-09-10, 2 files, -1/+2]
|       Includes a tiny tweak to the datacite import sample file to test this code path.
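One plausible way to handle an empty-string version is to normalize it to "no version". The commit does not say what the importer stores in that case, so treating empty and whitespace-only strings as absent is an assumption, and `clean_version` is a hypothetical helper name.

```python
from typing import Optional


def clean_version(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: normalize a DataCite 'version' value.

    Missing, empty, and whitespace-only strings all become None, so an
    empty-string version is never stored as a real value.
    """
    if raw is None:
        return None
    raw = raw.strip()
    return raw or None


print(clean_version(""))      # None
print(clean_version(" v2 "))  # v2
```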
* generic file entity clean-ups as part of file_meta importer  [Bryan Newbold, 2020-09-02, 1 file, -0/+99]
|
* fixes and test coverage for file_meta importer  [Bryan Newbold, 2020-08-21, 2 files, -0/+68]
|
* datacite importer: update test cases for 'Additional file' as component, not stub  [Bryan Newbold, 2020-08-11, 5 files, -5/+5]
|
* datacite import: figshare-specific hacks  [Bryan Newbold, 2020-08-11, 1 file, -0/+1]
|
* fix typo bug resulting in lost/bad ext_id web edits  [Bryan Newbold, 2020-07-31, 1 file, -0/+14]
|
* implement webface entity deletion  [Bryan Newbold, 2020-07-31, 1 file, -0/+57]
|
* fix search redirect codes in new tests  [Bryan Newbold, 2020-07-30, 1 file, -4/+4]
|
* wire up new TOML views  [Bryan Newbold, 2020-07-30, 2 files, -20/+62]
|
* basic toml transform helper  [Bryan Newbold, 2020-07-30, 1 file, -0/+22]
|
* simple search route increased coverage  [Bryan Newbold, 2020-07-30, 1 file, -0/+27]
|
* minor lint fixes  [Bryan Newbold, 2020-07-30, 1 file, -1/+0]
|
* coverage search: 'recent' endpoint test (minimal)  [Bryan Newbold, 2020-07-30, 1 file, -1/+32]
|
* expand test coverage of new preservation views  [Bryan Newbold, 2020-07-30, 1 file, -15/+122]
|
* refactor coverage tests/mocks  [Bryan Newbold, 2020-07-30, 5 files, -39/+80]
|
* coverage test mock fixes  [Bryan Newbold, 2020-07-30, 1 file, -14/+51]
|
* lint coverage changes (so far)  [Bryan Newbold, 2020-07-30, 2 files, -15/+3]
|
* include new-style preservation+release_type aggs in container stats  [Bryan Newbold, 2020-07-30, 1 file, -1/+12]
|
* add regression test for broken container coverage  [Bryan Newbold, 2020-07-30, 2 files, -57/+98]
|       also shuffle around search/coverage test files
* small lint fixes  [Bryan Newbold, 2020-07-24, 1 file, -1/+0]
|
* finish backend refactoring of search code  [Bryan Newbold, 2020-07-24, 1 file, -2/+77]
|
* update web_search tests to mock ES client  [Bryan Newbold, 2020-07-24, 2 files, -45/+47]
|       Instead of using 'responses' mock of 'requests' library. Tried using 'elasticmock' helper but it didn't work.
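Mocking the Elasticsearch client object, rather than the HTTP layer underneath it, means a test only has to supply canned response JSON. The sketch below uses the standard library's `unittest.mock`; `search_release_titles` and the `fatcat_release` index name are hypothetical stand-ins, not fatcat's actual search code.

```python
from unittest import mock


def search_release_titles(es_client, query: str) -> list:
    """Hypothetical search helper that returns release titles from ES hits."""
    resp = es_client.search(
        index="fatcat_release",  # assumed index name
        body={"query": {"query_string": {"query": query}}},
    )
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]


# Canned JSON shaped like an Elasticsearch search response.
FAKE_RESPONSE = {
    "hits": {
        "total": 1,
        "hits": [{"_source": {"title": "Example Paper"}}],
    }
}

# Mock the client object itself, so no HTTP layer is involved at all.
es = mock.MagicMock()
es.search.return_value = FAKE_RESPONSE

titles = search_release_titles(es, "example")
es.search.assert_called_once()
print(titles)  # ['Example Paper']
```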
* refactor release and container search  [Bryan Newbold, 2020-07-24, 1 file, -5/+2]
|       Based on fatcat-scholar refactoring. This doesn't include refactoring of stats, aggregates, or histograms yet, just the direct queries. Don't have any test coverage yet; intend to try elasticmock or to figure out how to ingest mock JSON results directly.
* Merge branch 'bnewbold-more-lint-fixes' into 'master'  [Martin Czygan, 2020-07-24, 5 files, -6/+5]
|\      more lint fixes
| |
| |     See merge request webgroup/fatcat!69
| * fix actual typo in tests (caught by lint)  [Bryan Newbold, 2020-07-23, 1 file, -2/+2]
| |
| * simple lint (flake8) fixes in tests  [Bryan Newbold, 2020-07-23, 4 files, -4/+3]
| |     The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/.
* | make in_kbart transform inclusive of last year  [Bryan Newbold, 2020-07-23, 1 file, -0/+46]
|/      Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, e.g., CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of date (e.g., they don't show coverage for the current year). For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over.
|
|       With this patch, Portico/LOCKSS/etc. coverage for "last year" counts as coverage of publications dated "this year".
|
|       Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year.
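The one-year grace period can be sketched as a simple span check. This assumes KBART coverage is available as inclusive (start_year, end_year) pairs; the `in_kbart` signature here is hypothetical, and fatcat's actual transform may store container coverage differently.

```python
from typing import List, Optional, Tuple


def in_kbart(year: Optional[int], kbart_spans: List[Tuple[int, int]]) -> bool:
    """Return True if `year` is covered by any (start, end) KBART year span.

    Spans are treated as inclusive, and each end year is extended by one, so
    coverage reported through "last year" also counts for releases dated
    "this year".
    """
    if year is None:
        return False
    return any(start <= year <= end + 1 for start, end in kbart_spans)


spans = [(2000, 2019)]
print(in_kbart(2019, spans))  # True
print(in_kbart(2020, spans))  # True: one-year grace period
print(in_kbart(2021, spans))  # False
```

As the commit notes, this only stays correct if container spans are refreshed and releases re-indexed at least once a year; otherwise the grace year drifts out of date too.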
* datacite: resolve formatting issues in tests  [Martin Czygan, 2020-07-10, 45 files, -150/+54]
|\