path: root/python/tests
Commit message  [Author, Age, Files, Lines]
* remove deprecated extid sqlite3 lookup table feature from importers  [Bryan Newbold, 2021-11-09, 5 files, -22/+6]
|   This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall.
* python tests: verify array sort order  [Bryan Newbold, 2021-11-05, 4 files, -20/+18]
|   In a couple of cases (e.g., filesets), tests had been made agnostic to sort order, because the sort order was not stable. In other cases, simply small cleanups and comment improvements.
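To illustrate the difference between an order-agnostic and an order-verifying test assertion, here is a minimal sketch. The `manifest_paths` helper and the sample data are hypothetical and are not taken from the fatcat test suite.

```python
def manifest_paths(manifest):
    """Return the `path` of each manifest entry, preserving list order."""
    return [entry["path"] for entry in manifest]


# Hypothetical data: the API returned entries in a different order than expected.
expected = ["a.txt", "b.txt"]
returned = [{"path": "b.txt"}, {"path": "a.txt"}]

# Order-agnostic check: passes even though the order differs.
assert sorted(manifest_paths(returned)) == sorted(expected)

# Order-verifying check: passes only once the order is what the test expects.
returned_sorted = sorted(returned, key=lambda entry: entry["path"])
assert manifest_paths(returned_sorted) == expected
```

The second form is stricter, so it only makes sense where the sort order is known to be stable.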
* typing: first batch of python bulk type annotations  [Bryan Newbold, 2021-11-03, 2 files, -4/+5]
|   While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was *relatively* simple, and resulted in few code changes other than function signature additions.
* fmt (black): tests/  [Bryan Newbold, 2021-11-02, 55 files, -1430/+1852]
|
* python: isort everything  [Bryan Newbold, 2021-11-02, 39 files, -73/+98]
|
* lint: simple, safe inline lint fixes  [Bryan Newbold, 2021-11-02, 17 files, -125/+125]
|   '==' vs 'is'; 'not a in b' vs 'a not in b'; etc
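The two lint patterns named above can be sketched as follows. The helper names are hypothetical, and the direction of the '==' vs 'is' fix (comparing against `None` by identity) is an assumption based on the usual lint rule, not something the commit states.

```python
def is_missing(value):
    # Assumed fix: compare against the None singleton with `is`,
    # rather than `value == None`.
    return value is None


def lacks_key(key, mapping):
    # `key not in mapping` is the idiomatic spelling of `not key in mapping`;
    # both evaluate to the same result.
    return key not in mapping
```

Both rewrites leave behavior unchanged for ordinary values, which fits the "simple, safe" label.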
* lint/fmt: remove all 'import *'  [Bryan Newbold, 2021-11-02, 1 file, -2/+2]
|
* hacks to work around new pylint false positives  [Bryan Newbold, 2021-11-02, 1 file, -9/+15]
|
* cleanup imports after fatcat_tools.transforms change  [Bryan Newbold, 2021-11-02, 4 files, -16/+33]
|
* temporary hack around filesets.manifest order instability  [Bryan Newbold, 2021-11-02, 1 file, -3/+4]
|   May need some change in fatcatd or schema? This isn't a new issue; that part of the schema has been around for a long time, and is just getting detected now with these tests.
* generic fileset importer class, with test coverage  [Bryan Newbold, 2021-10-14, 2 files, -0/+60]
|
* web: editor username /u/<username> helper  [Bryan Newbold, 2021-10-13, 1 file, -0/+8]
|
* python: additional test coverage for v0.4 changes  [Bryan Newbold, 2021-10-13, 2 files, -2/+19]
|
* python: test coverage of rust schema changes  [Bryan Newbold, 2021-10-13, 4 files, -2/+59]
|
* datacite: skip empty abstracts  [Martin Czygan, 2021-10-01, 3 files, -1/+91]
|   Do not add abstracts where `clean` results in the empty string - this violates a constraint: `either abstract_sha1 or content is required`
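A minimal sketch of the skip logic described above. The `clean` function here is a hypothetical stand-in that only collapses whitespace (the real helper is not shown in this log), and the input and output dictionary shapes are assumptions made for illustration.

```python
def clean(text):
    """Hypothetical stand-in for the importer's `clean`: collapse whitespace."""
    if text is None:
        return None
    return " ".join(text.split())


def collect_abstracts(descriptions):
    """Build abstract dicts, skipping entries that are empty after cleaning."""
    abstracts = []
    for desc in descriptions:
        text = clean(desc.get("description"))
        if not text:
            # An empty `content` would violate the constraint quoted above.
            continue
        abstracts.append({"content": text, "lang": desc.get("lang")})
    return abstracts
```

Filtering before the entity is built keeps the constraint violation from ever reaching the API.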
* trivial blank line lint  [Bryan Newbold, 2021-09-08, 1 file, -1/+0]
|
* refs: web UI tweaks for iterated CSL schema  [Bryan Newbold, 2021-08-03, 1 file, -3/+7]
|
* refs: start the most basic/minimal web refs test coverage ('integration' level)  [Bryan Newbold, 2021-07-27, 4 files, -0/+1094]
|
* tests: small citeproc style changes (to match Pipfile.lock update)  [Bryan Newbold, 2021-06-23, 2 files, -3/+4]
|
* datacite: more careful title string access; fixes sentry #88350  [Martin Czygan, 2021-06-11, 3 files, -1/+96]
|   Caused by a partial "title entry without title" coming *first* (e.g. one holding just a language, like: {'lang': 'da'}).
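The failure mode described above can be guarded against along these lines. This is only a sketch: `first_title` is a hypothetical helper, and the list-of-dicts shape is inferred from the `{'lang': 'da'}` example in the commit message.

```python
def first_title(titles):
    """Return the first non-empty title string, or None if there is none."""
    for entry in titles or []:
        # Use .get() so an entry such as {'lang': 'da'} does not raise KeyError.
        title = entry.get("title")
        if isinstance(title, str) and title.strip():
            return title.strip()
    return None
```

For example, `first_title([{"lang": "da"}, {"title": "Real title"}])` skips the partial entry and returns the title from the second one.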
* dblp tests: skip redundant seek(0)  [Bryan Newbold, 2021-06-03, 1 file, -6/+1]
|
* ingest: add per-container ingest type overrides  [Bryan Newbold, 2021-05-21, 1 file, -0/+6]
|
* fix arabesque sqlite3 examples to have 14-digit timestamps  [Bryan Newbold, 2021-05-21, 1 file, -0/+0]
|
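As an illustration of a 14-digit timestamp check, the sketch below assumes the `YYYYMMDDHHMMSS` layout used by web-archive style timestamps. The commit does not state the layout, and `parse_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def parse_timestamp(ts):
    """Parse a 14-digit timestamp (assumed layout: YYYYMMDDHHMMSS)."""
    if len(ts) != 14 or not ts.isdigit():
        raise ValueError(f"expected 14 digits, got {ts!r}")
    return datetime.strptime(ts, "%Y%m%d%H%M%S")
```

Shorter timestamps, such as a 12-digit value without seconds, are rejected rather than silently padded.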
* make dblp tests more robust  [Bryan Newbold, 2021-04-12, 1 file, -2/+11]
|   These were causing a lot of spurious errors in local development. Not sure these tweaks will entirely fix the problem.
* transform tool: container transform stats lookup support  [Bryan Newbold, 2021-04-06, 1 file, -0/+1]
|
* search container stats: changes to be called from index code path  [Bryan Newbold, 2021-04-06, 1 file, -0/+10]
|   E.g., allowing injection of more config values
* container search schema: preservation stats, new fields  [Bryan Newbold, 2021-04-06, 1 file, -5/+42]
|   Includes transform code updates and partial test coverage.
* datacite: a missing surname should be None, not the empty string  [Martin Czygan, 2021-04-02, 2 files, -2/+0]
|   refs sentry #77700
* improve dblp release import  [Bryan Newbold, 2020-12-17, 1 file, -3/+4]
|
* very simple dblp container importer  [Bryan Newbold, 2020-12-17, 4 files, -5/+77]
|
* basic test coverage of dblp release importer  [Bryan Newbold, 2020-12-17, 4 files, -0/+503]
|
* add 'lxml' mode for large XML file import, and multi-tags  [Bryan Newbold, 2020-12-17, 1 file, -2/+2]
|
* fix sloppy is_preserved ES transform test failure  [Bryan Newbold, 2020-12-17, 1 file, -1/+1]
|
* Merge branch 'bnewbold-doaj-fuzzy' into 'master'  [bnewbold, 2020-12-18, 3 files, -2/+99]
|\
| |   DOAJ import fuzzy match filter
| |
| |   See merge request webgroup/fatcat!92
| * update fuzzy helper to pass 'reason' through to import code  [Bryan Newbold, 2020-12-17, 1 file, -2/+2]
| |   The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases.
| * add fuzzy match filtering to DOAJ importer  [Bryan Newbold, 2020-12-16, 1 file, -2/+14]
| |   In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity.
| * add fuzzy matching helper to importer base class  [Bryan Newbold, 2020-12-16, 2 files, -0/+85]
| |   Using fuzzycat. Add basic test coverage.
* | improve release elasticsearch transform test coverage  [Bryan Newbold, 2020-12-16, 3 files, -11/+86]
|/
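The conservative default described in the "add fuzzy match filtering to DOAJ importer" commit above can be sketched roughly as follows. The `MatchStatus` enum and `should_skip` function are hypothetical stand-ins and do not reproduce the fuzzycat API. The EXACT, STRONG, WEAK and AMBIGUOUS names come from the commit message, while DIFFERENT is an assumed "no match" value.

```python
from enum import Enum


class MatchStatus(Enum):
    """Hypothetical match outcomes (not the fuzzycat types)."""
    EXACT = "exact"
    STRONG = "strong"
    WEAK = "weak"
    AMBIGUOUS = "ambiguous"
    DIFFERENT = "different"  # assumption: value meaning "not a match"


# Conservative default: any kind of fuzzy match, even an ambiguous one,
# causes the candidate to be skipped instead of imported.
SKIP_ON = {
    MatchStatus.EXACT,
    MatchStatus.STRONG,
    MatchStatus.WEAK,
    MatchStatus.AMBIGUOUS,
}


def should_skip(match_statuses):
    """Return True if any fuzzy-match result argues against importing."""
    return any(status in SKIP_ON for status in match_statuses)
```

Loosening the policy later, for example by ignoring AMBIGUOUS, would then only require removing a member from `SKIP_ON`.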
* DOAJ: remove accidentally committed 'skip' of a test  [Bryan Newbold, 2020-11-20, 1 file, -1/+0]
|
* doaj: fix update code path (getattr not __dict__)  [Bryan Newbold, 2020-11-20, 2 files, -11/+67]
|   Also add missing code coverage for update path (disabled by default).
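The commit title does not explain why `getattr` is preferred over `__dict__`. One plausible reason, stated here as an assumption, is that generated API client classes often store field values in underscore-prefixed attributes behind properties, so the public field names never appear in `__dict__`. The `Release` class and `changed_fields` helper below are hypothetical and exist only to show that difference.

```python
class Release:
    """Hypothetical entity: the value lives in `_title`, exposed via a property."""

    def __init__(self, title):
        self._title = title

    @property
    def title(self):
        return self._title


def changed_fields(existing, new, fields):
    """List the field names whose values differ between two entities."""
    # getattr() goes through the property; existing.__dict__["title"] would
    # raise KeyError because only "_title" is stored on the instance.
    return [name for name in fields if getattr(existing, name) != getattr(new, name)]
```

With this layout, `"title" in Release("a").__dict__` is false even though `Release("a").title` works, which is the kind of mismatch that breaks a `__dict__`-based comparison.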
* implement remainder of DOAJ article importer  [Bryan Newbold, 2020-11-19, 1 file, -11/+6]
|
* initial implementation of DOAJ importer  [Bryan Newbold, 2020-11-19, 2 files, -0/+97]
|   Several things to finish implementing and polish.
* ingest: fix XML ingest test file  [Bryan Newbold, 2020-11-05, 1 file, -1/+1]
|
* ingest: progress on HTML ingest  [Bryan Newbold, 2020-11-05, 2 files, -2/+44]
|
* ingest: tests for basic XML ingest  [Bryan Newbold, 2020-11-05, 2 files, -0/+18]
|
* ingest: basic checks for ingest_type  [Bryan Newbold, 2020-11-05, 2 files, -1/+7]
|
* Merge branch 'bnewbold-202009-polish' into 'master'  [Martin Czygan, 2020-09-29, 2 files, -6/+6]
|\
| |   fatcat.wiki 2020-09 polish
| |
| |   See merge request webgroup/fatcat!84
| * lint cleanups  [Bryan Newbold, 2020-09-17, 1 file, -2/+0]
| |
| * web: route constraints on fcids and UUIDs  [Bryan Newbold, 2020-09-17, 1 file, -4/+6]
| |   Instead of accepting any string for these parameters and throwing a 400 error if not the correct type, implement better route matching at the framework level and return more 404s. This resolves several outstanding sentry exceptions. The "flask-uuid" was imported and seems to have been configured for this purpose previously, but I guess I never finished configuring it.
* | address spammy datacite titles  [Martin Czygan, 2020-09-23, 1 file, -0/+6]
|/
|   seemingly from zenodo:
|   * https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi
|   * https://doi.org/10.5281/zenodo.4041777
|   About 3400 records with "FULL MOVIE" in title, currently.
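The idea behind the "route constraints on fcids and UUIDs" commit above, rejecting malformed identifiers at the routing layer with a 404 instead of failing later with a 400, can be sketched without any web framework. The 26-character lowercase base32 pattern is an assumption inferred from identifiers such as `rzcpjwukobd4pj36ipla22cnoi`, and both helper functions are hypothetical.

```python
import re
import uuid

# Assumption: identifiers are 26 characters from the lowercase base32 alphabet.
FCID_RE = re.compile(r"[a-z2-7]{26}")


def route_status(ident):
    """Return the HTTP status a constrained route would give for `ident`."""
    if FCID_RE.fullmatch(ident) is None:
        return 404  # no route matches, so the view code never runs
    return 200


def is_uuid(text):
    """Return True if `text` parses as a UUID."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True
```

In a framework such as Flask, the same effect would normally come from a URL converter on the route rather than from a hand-written check.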