Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | start larger refactoring: remove cluster | Martin Czygan | 2021-09-24 | 1 | -8/+8 |
| | | | | background: verifying hundreds of millions of documents turned out to be a bit slow; anecdata: running clustering and verification over 1.8B inputs took over 50h; cf. the Go port (skate) required about 2-4h for those operations. Also: with Go we do not need the extra GNU parallel wrapping. In any case, we aim for fuzzycat refactoring to provide: * better, more configurable verification and small scale matching * removal of batch clustering code (and improve refcat docs) * a place for a bit more generic, similarity based utils. The most important piece in fuzzycat is a CSV file containing hand picked test examples for verification - and the code that is able to fulfill that test suite. We want to make this part more robust. | ||||
* | style: apply formatting | Martin Czygan | 2021-09-21 | 1 | -0/+1 |
| | |||||
* | DOI clean/normalize helper; and use in verification etc | Bryan Newbold | 2021-07-01 | 1 | -1/+14 |
| | |||||
* | verify: page count parsing and comparison improvements | Bryan Newbold | 2021-07-01 | 1 | -2/+7 |
| | |||||
* | fix imports and formatting | Martin Czygan | 2021-04-14 | 1 | -3/+14 |
| | |||||
* | address es hits.total change in ES7 | Martin Czygan | 2021-04-12 | 1 | -1/+10 |
| | | | | * https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html | ||||
* | add compress kwarg to cluster | Martin Czygan | 2021-02-02 | 1 | -1/+15 |
| | | | | Will compress intermediate results with zstd (https://git.io/Jt00y9). | ||||
* | single item verification | Martin Czygan | 2020-12-15 | 1 | -0/+2 |
| | |||||
* | verify: move out some code to utils | Martin Czygan | 2020-12-14 | 1 | -1/+20 |
| | |||||
* | add another case | Martin Czygan | 2020-12-01 | 1 | -3/+3 |
| | |||||
* | move helpers to utils | Martin Czygan | 2020-11-25 | 1 | -2/+2 |
| | |||||
* | move enums into common | Martin Czygan | 2020-11-25 | 1 | -1/+36 |
| | |||||
* | add more test cases | Martin Czygan | 2020-11-25 | 1 | -1/+2 |
| | |||||
* | apply formatting | Martin Czygan | 2020-11-25 | 1 | -0/+1 |
| | |||||
* | extend tests | Martin Czygan | 2020-11-25 | 1 | -1/+6 |
| | |||||
* | extend test coverage | Martin Czygan | 2020-11-25 | 1 | -0/+23 |
| | |||||
* | cleanup | Martin Czygan | 2020-10-21 | 1 | -128/+0 |
| | |||||
* | switch to yapf | Martin Czygan | 2020-08-12 | 1 | -4/+17 |
| | |||||
* | add tests | Martin Czygan | 2020-08-12 | 1 | -0/+115 |