path: root/tests/test_utils.py
Commit message | Author | Age | Files | Lines
* start larger refactoring: remove cluster | Martin Czygan | 2021-09-24 | 1 | -8/+8

  background: verifying hundreds of millions of documents turned out to be a bit slow; anecdata: running clustering and verification over 1.8B inputs took over 50h; cf. the Go port (skate) required about 2-4h for those operations. Also: with Go we do not need the extra GNU parallel wrapping.

  In any case, we aim for fuzzycat refactoring to provide:

  * better, more configurable verification and small scale matching
  * removal of batch clustering code (and improve refcat docs)
  * a place for a bit more generic, similarity based utils

  The most important piece in fuzzycat is a CSV file containing hand picked test examples for verification - and the code that is able to fulfill that test suite. We want to make this part more robust.

* style: apply formatting | Martin Czygan | 2021-09-21 | 1 | -0/+1
* DOI clean/normalize helper; and use in verification etc | Bryan Newbold | 2021-07-01 | 1 | -1/+14
* verify: page count parsing and comparison improvements | Bryan Newbold | 2021-07-01 | 1 | -2/+7
* fix imports and formatting | Martin Czygan | 2021-04-14 | 1 | -3/+14
* address es hits.total change in ES7 | Martin Czygan | 2021-04-12 | 1 | -1/+10

  * https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html

* add compress kwarg to cluster | Martin Czygan | 2021-02-02 | 1 | -1/+15

  Will compress intermediate results with zstd (https://git.io/Jt00y9).

* single item verification | Martin Czygan | 2020-12-15 | 1 | -0/+2
* verify: move out some code to utils | Martin Czygan | 2020-12-14 | 1 | -1/+20
* add another case | Martin Czygan | 2020-12-01 | 1 | -3/+3
* move helpers to utils | Martin Czygan | 2020-11-25 | 1 | -2/+2
* move enums into common | Martin Czygan | 2020-11-25 | 1 | -1/+36
* add more test cases | Martin Czygan | 2020-11-25 | 1 | -1/+2
* apply formatting | Martin Czygan | 2020-11-25 | 1 | -0/+1
* extend tests | Martin Czygan | 2020-11-25 | 1 | -1/+6
* extend test coverage | Martin Czygan | 2020-11-25 | 1 | -0/+23
* cleanup | Martin Czygan | 2020-10-21 | 1 | -128/+0
* switch to yapf | Martin Czygan | 2020-08-12 | 1 | -4/+17
* add tests | Martin Czygan | 2020-08-12 | 1 | -0/+115