path: root/tests/test_matching.py
Commit message (author, date; files changed, lines -removed/+added)
* apply first round of feedback on matching [HEAD, master] (Martin Czygan, 2021-12-21; 1 file, -3/+4)
|
* matching: cleanup test files (Martin Czygan, 2021-12-06; 1 file, -1/+1)
|
* complete FuzzyReleaseMatcher refactoring (Martin Czygan, 2021-12-06; 1 file, -84/+10)
|
|     We keep the name, since the API, "matcher.match(release)", is the same.
|
|     * simplified queries
|     * at most one query is performed against elasticsearch
|     * parallel release retrieval from the API
|     * optional support for release year windows
|
|     Test cases are expressed in YAML and will be auto-loaded from the
|     specified directory. The tests work against the current search endpoint,
|     which means the actual output may change on index updates; for the
|     moment, we think this setup is relatively simple and not too unstable.
|
|         about: title contrib, partial name
|         input: >
|           {
|             "contribs": [
|               {
|                 "raw_name": "Adams"
|               }
|             ],
|             "title": "digital libraries",
|             "ext_ids": {}
|           }
|         release_year_padding: 1
|         expected:
|           - 7rmvqtrb2jdyhcxxodihzzcugy
|           - a2u6ougtsjcbvczou6sazsulcm
|           - dy45vilej5diros6zmax46nm4e
|           - exuwhhayird4fdjmmsiqpponlq
|           - gqrj7jikezgcfpjfazhpf4e7c4
|           - mkmqt3453relbpuyktnmsg6hjq
|           - t2g5sl3dgzchtnq7dynxyzje44
|           - t4tvenhrvzamraxrvvxivxmvga
|           - wd3oeoi3bffknfbg2ymleqc4ja
|           - y63a6dhrfnb7bltlxfynydbojy
|
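The YAML test case in the commit message above could be exercised roughly as follows. This is a minimal sketch, not fuzzycat's actual test code: it assumes the YAML has already been parsed into a dict (parsing is omitted to stay within the standard library), that `input` holds a JSON-encoded release, and that a case passes when the returned release idents equal `expected` as a set. `run_case` and the `match` callable are hypothetical stand-ins for something like `FuzzyReleaseMatcher.match`.

```python
import json
from typing import Callable, Iterable, Set, Tuple

# One test case as it might look after parsing the YAML shown above.
# The "expected" list is shortened to two idents for brevity.
CASE = {
    "about": "title contrib, partial name",
    "input": '{"contribs": [{"raw_name": "Adams"}], "title": "digital libraries", "ext_ids": {}}',
    "release_year_padding": 1,
    "expected": ["7rmvqtrb2jdyhcxxodihzzcugy", "a2u6ougtsjcbvczou6sazsulcm"],
}

# Hypothetical matcher signature: (release dict, year padding) -> release idents.
Matcher = Callable[[dict, int], Iterable[str]]


def run_case(case: dict, match: Matcher) -> Tuple[Set[str], Set[str]]:
    """Run one case and return (missing, unexpected) release idents."""
    release = json.loads(case["input"])  # the folded YAML block holds JSON
    padding = case.get("release_year_padding", 0)
    got = set(match(release, padding))
    want = set(case["expected"])
    return want - got, got - want


if __name__ == "__main__":
    # A fake matcher that only finds the first expected release.
    missing, unexpected = run_case(CASE, lambda release, padding: CASE["expected"][:1])
    print("missing:", missing, "unexpected:", unexpected)
```

Returning both differences, rather than a single boolean, makes it easier to see whether an index update added or dropped results.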
* complete migration away from match_release_fuzzy (Martin Czygan, 2021-11-16; 1 file, -81/+1)
|
|     Instead, use `FuzzyReleaseMatcher.match`, which has approximately the
|     same behavior.
|
* turn "match_release_fuzzy" into a class (Martin Czygan, 2021-11-16; 1 file, -7/+116)
|
|     The goal of this refactoring was to make the matching process a bit more
|     configurable by using a class and a cascade of queries. For a limited test
|     set, `FuzzyReleaseMatcher.match` works the same as `match_release_fuzzy`.
|
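A "class and a cascade of queries" can be sketched as a matcher that runs queries from most to least specific and returns the hits of the first query that finds anything. This is a hypothetical illustration loosely modeled on the `matcher.match(release)` call shape; `CascadeMatcher`, `by_doi`, `by_title`, and the in-memory `INDEX` are invented for the example and are not fuzzycat's code or its elasticsearch queries.

```python
from typing import Callable, List, Optional, Sequence

# Toy in-memory "index" standing in for a search backend (hypothetical data).
INDEX = [
    {"ident": "r1", "title": "digital libraries", "doi": "10.1000/a"},
    {"ident": "r2", "title": "digital libraries", "doi": None},
    {"ident": "r3", "title": "fuzzy matching", "doi": "10.1000/c"},
]

Query = Callable[[dict], List[str]]


def by_doi(release: dict) -> List[str]:
    """Most specific query: exact DOI match."""
    doi: Optional[str] = release.get("ext_ids", {}).get("doi")
    return [doc["ident"] for doc in INDEX if doi and doc["doi"] == doi]


def by_title(release: dict) -> List[str]:
    """Less specific query: case-insensitive exact title match."""
    title = (release.get("title") or "").lower()
    return [doc["ident"] for doc in INDEX if title and doc["title"] == title]


class CascadeMatcher:
    """Run queries in order; return the hits of the first one that finds anything."""

    def __init__(self, queries: Sequence[Query]):
        self.queries = list(queries)

    def match(self, release: dict) -> List[str]:
        for query in self.queries:
            hits = query(release)
            if hits:
                return hits
        return []


matcher = CascadeMatcher([by_doi, by_title])
```

Putting the precise query first keeps a DOI hit from being diluted by broader title matches; the cost is an extra query per fallback step.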
* start larger refactoring: remove cluster (Martin Czygan, 2021-09-24; 1 file, -2/+4)
|
|     Background: verifying hundreds of millions of documents turned out to be
|     a bit slow. Anecdata: running clustering and verification over 1.8B
|     inputs took over 50h; by comparison, the Go port (skate) required about
|     2-4h for those operations. Also, with Go we do not need the extra GNU
|     parallel wrapping.
|
|     In any case, we aim for the fuzzycat refactoring to provide:
|
|     * better, more configurable verification and small-scale matching
|     * removal of batch clustering code (and improved refcat docs)
|     * a place for somewhat more generic, similarity-based utils
|
|     The most important piece in fuzzycat is a CSV file containing hand-picked
|     test examples for verification, and the code that is able to fulfill
|     that test suite. We want to make this part more robust.
|
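A CSV of hand-picked verification examples lends itself to a table-driven test: each row names two inputs and the expected verdict, and the suite reports every row where the verification code disagrees. This is a sketch under stated assumptions; the column names (`title_a`, `title_b`, `expected`), the `SAME`/`DIFFERENT` labels, and the title-only `verify` function are hypothetical, since the log does not show fuzzycat's actual CSV layout or verification logic.

```python
import csv
import io
import re

# Hypothetical CSV layout with toy rows; the real file and columns may differ.
CSV_DATA = """title_a,title_b,expected
Digital Libraries,digital libraries.,SAME
Digital Libraries,Fuzzy Matching,DIFFERENT
"""


def normalize(title: str) -> str:
    """Lowercase and drop everything except letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def verify(a: str, b: str) -> str:
    """Toy stand-in for a verification function: compare normalized titles."""
    return "SAME" if normalize(a) == normalize(b) else "DIFFERENT"


def run_suite(data: str) -> list:
    """Return (row, got) for every row whose verdict differs from the expected one."""
    failures = []
    for row in csv.DictReader(io.StringIO(data)):
        got = verify(row["title_a"], row["title_b"])
        if got != row["expected"]:
            failures.append((row, got))
    return failures
```

Keeping the examples in data rather than in code means new hand-picked cases can be added without touching the test logic.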
* tests: temporarily disable tests (Martin Czygan, 2021-09-21; 1 file, -12/+12)
|
|     We first want to move to elasticsearch-dsl and will reactivate and
|     extend the tests after the refactoring.
|
* matching: run an additional es query for fuzzy matching (Martin Czygan, 2021-09-21; 1 file, -2/+20)
|
* style: apply formatting (Martin Czygan, 2021-09-21; 1 file, -3/+12)
|
* lint: remove unused imports (Bryan Newbold, 2021-05-31; 1 file, -1/+0)
|
* cleanup merge artifact (Martin Czygan, 2021-04-15; 1 file, -1/+0)
|
* Merge branch 'bnewbold-dev-setup' (Martin Czygan, 2021-04-15; 1 file, -1/+8)
|\
| |   * bnewbold-dev-setup:
| |     dynaconf: switch to fuzzycat.config import across project
| |     upgrade to python3.8
| |     gitlab CI: try 'make deps' and 'make test'
| |     makefile: run common commands inside pipenv
| |     makefile: change 'deps' to be simple --dev --deploy
| |     make fmt
| |
| * dynaconf: switch to fuzzycat.config import across project (Bryan Newbold, 2021-04-13; 1 file, -2/+1)
| |
| |     This is the recommended way to use dynaconf.
| |
* | fix imports and formatting (Martin Czygan, 2021-04-14; 1 file, -5/+12)
| |
* | test: skip if configured search server is not reachable (Martin Czygan, 2021-04-14; 1 file, -0/+14)
| |
* | tests: run es tests against public search endpoint (Martin Czygan, 2021-04-14; 1 file, -8/+31)
|/
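Skipping tests when the configured search server is not reachable can be done with the standard library's `unittest.skipUnless`. This is a minimal sketch: `SEARCH_URL` and `is_reachable` are hypothetical (the project itself reads configuration through dynaconf via `fuzzycat.config`), and a server that answers with an HTTP error status still counts as reachable.

```python
import unittest
import urllib.error
import urllib.request

# Hypothetical setting; the real value would come from the project configuration.
SEARCH_URL = "http://localhost:9200"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (e.g. 401 or 404).
        err.close()
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a malformed URL.
        return False


@unittest.skipUnless(is_reachable(SEARCH_URL), "search server not reachable: " + SEARCH_URL)
class TestMatchingAgainstSearch(unittest.TestCase):
    def test_placeholder(self):
        # Real tests would query the search endpoint here.
        self.assertTrue(True)
```

The reachability check runs once at import time, so an unreachable server costs at most one timeout. With pytest, `pytest.mark.skipif` serves the same purpose.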
* inject configuration (Martin Czygan, 2020-12-23; 1 file, -1/+5)
|
* update reference (Martin Czygan, 2020-12-16; 1 file, -1/+1)
|
* add skip reason (Martin Czygan, 2020-12-16; 1 file, -1/+1)
|
* docs and release match command (Martin Czygan, 2020-12-16; 1 file, -4/+11)
|
* matching stub (Martin Czygan, 2020-12-15; 1 file, -0/+19)
|
* cleanup (Martin Czygan, 2020-10-21; 1 file, -4/+0)
|
* stub: command line (Martin Czygan, 2020-08-18; 1 file, -2/+1)
|
* tests: add stub (Martin Czygan, 2020-08-17; 1 file, -0/+5)