Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | complete FuzzyReleaseMatcher refactoring | Martin Czygan | 2021-12-06 | 1 | -0/+1 |
| | We keep the name, since the API - "matcher.match(release)" - is the same. Changes: simplified queries; at most one query is performed against Elasticsearch; parallel release retrieval from the API; optional support for release year windows. Test cases are expressed in YAML and are auto-loaded from the specified directory. Tests run against the current search endpoint, which means the actual output may change on index updates; for the moment, we think this setup is relatively simple and not too unstable. Example test case: `about: title contrib, partial name`; `input: > { "contribs": [ { "raw_name": "Adams" } ], "title": "digital libraries", "ext_ids": {} }`; `release_year_padding: 1`; `expected:` 7rmvqtrb2jdyhcxxodihzzcugy, a2u6ougtsjcbvczou6sazsulcm, dy45vilej5diros6zmax46nm4e, exuwhhayird4fdjmmsiqpponlq, gqrj7jikezgcfpjfazhpf4e7c4, mkmqt3453relbpuyktnmsg6hjq, t2g5sl3dgzchtnq7dynxyzje44, t4tvenhrvzamraxrvvxivxmvga, wd3oeoi3bffknfbg2ymleqc4ja, y63a6dhrfnb7bltlxfynydbojy | ||||
* | turn "match_release_fuzzy" into a class | Martin Czygan | 2021-11-16 | 1 | -0/+87 |
| | The goal of this refactoring was to make the matching process a bit more configurable by using a class and a cascade of queries. For a limited test set, `FuzzyReleaseMatcher.match` works the same as `match_release_fuzzy`. | ||||
* | reorganize notes | Martin Czygan | 2021-09-21 | 6 | -2/+153 |
| | |||||
* | Merge branch 'master' of git.archive.org:webgroup/fuzzycat | Martin Czygan | 2021-07-09 | 1 | -0/+177 |
|\ | * 'master' of git.archive.org:webgroup/fuzzycat: simplify README for general audience; move some content to notes sandcrawler slugify: lower-case greek ambiguity (OCR) DOI clean/normalize helper; and use in verification etc verify: page count parsing and comparison improvements | ||||
| * | simplify README for general audience; move some content to notes | Bryan Newbold | 2021-07-01 | 1 | -0/+177 |
| | | |||||
* | | notes on matching metrics | Martin Czygan | 2021-07-08 | 1 | -0/+16 |
| | | |||||
* | | cleanup notes | Martin Czygan | 2021-07-08 | 2 | -13/+0 |
|/ | |||||
* | update diagram | Martin Czygan | 2020-12-24 | 3 | -5/+7 |
| | |||||
* | add case: article, erratum | Martin Czygan | 2020-12-23 | 1 | -0/+8 |
| | |||||
* | fix color | Martin Czygan | 2020-12-18 | 1 | -0/+0 |
| | |||||
* | update notes | Martin Czygan | 2020-12-18 | 2 | -1/+1 |
| | |||||
* | update README | Martin Czygan | 2020-12-18 | 2 | -0/+14 |
| | |||||
* | update notes | Martin Czygan | 2020-12-17 | 1 | -0/+14 |
| | |||||
* | wip: notes | Martin Czygan | 2020-12-17 | 2 | -4/+62 |
| | |||||
* | focus on notes, first | Martin Czygan | 2020-12-16 | 3 | -0/+19 |
| | |||||
* | add issue | Martin Czygan | 2020-12-16 | 1 | -0/+17 |
| | |||||
* | cleanup | Martin Czygan | 2020-12-15 | 1 | -0/+33 |
| | |||||
* | update stats | Martin Czygan | 2020-12-11 | 1 | -17/+22 |
| | |||||
* | update notes | Martin Czygan | 2020-12-10 | 1 | -0/+2 |
| | |||||
* | add item to blacklist | Martin Czygan | 2020-12-10 | 1 | -7/+9 |
| | |||||
* | complete list | Martin Czygan | 2020-12-10 | 1 | -1/+6 |
| | |||||
* | a case of different reviews, but quite ambiguous | Martin Czygan | 2020-12-10 | 1 | -0/+1 |
| | |||||
* | add ambiguous case | Martin Czygan | 2020-12-10 | 1 | -0/+10 |
| | |||||
* | add cases | Martin Czygan | 2020-12-10 | 1 | -0/+11 |
| | |||||
* | update cases | Martin Czygan | 2020-12-10 | 1 | -0/+23 |
| | |||||
* | update notes | Martin Czygan | 2020-12-10 | 1 | -25/+0 |
| | |||||
* | add cases | Martin Czygan | 2020-12-10 | 1 | -2/+0 |
| | |||||
* | add versioned doi pattern | Martin Czygan | 2020-12-10 | 1 | -0/+9 |
| | |||||
* | separate static data | Martin Czygan | 2020-12-10 | 1 | -0/+5 |
| | |||||
* | add a few more dummy cases | Martin Czygan | 2020-12-09 | 1 | -0/+13 |
| | |||||
* | update stats | Martin Czygan | 2020-12-09 | 1 | -12/+17 |
| | |||||
* | add subdoc case | Martin Czygan | 2020-12-09 | 1 | -0/+5 |
| | |||||
* | 138 cases | Martin Czygan | 2020-12-09 | 1 | -0/+5 |
| | |||||
* | add case | Martin Czygan | 2020-12-09 | 1 | -0/+12 |
| | |||||
* | update notes | Martin Czygan | 2020-12-09 | 1 | -0/+2 |
| | |||||
* | add another case from sample | Martin Czygan | 2020-12-09 | 1 | -0/+2 |
| | |||||
* | add two more cases | Martin Czygan | 2020-12-09 | 1 | -0/+3 |
| | |||||
* | update notes | Martin Czygan | 2020-12-09 | 1 | -1/+4 |
| | |||||
* | add example | Martin Czygan | 2020-12-09 | 1 | -1/+4 |
| | |||||
* | add another case | Martin Czygan | 2020-12-09 | 1 | -2/+4 |
| | |||||
* | another case | Martin Czygan | 2020-12-09 | 1 | -0/+7 |
| | |||||
* | add two cases | Martin Czygan | 2020-12-08 | 1 | -1/+7 |
| | |||||
* | add case | Martin Czygan | 2020-12-08 | 1 | -1/+10 |
| | |||||
* | add case | Martin Czygan | 2020-12-05 | 1 | -1/+9 |
| | |||||
* | update cases; ok.work_id | Martin Czygan | 2020-12-04 | 1 | -8/+5 |
| | |||||
* | case: ignore choice review | Martin Czygan | 2020-12-04 | 1 | -0/+50 |
| | |||||
* | add case | Martin Czygan | 2020-12-03 | 1 | -0/+1 |
| | |||||
* | update stats | Martin Czygan | 2020-12-03 | 1 | -16/+22 |
| | |||||
* | add cases | Martin Czygan | 2020-12-03 | 1 | -0/+6 |
| | |||||
* | add iop case | Martin Czygan | 2020-12-02 | 1 | -1/+10 |
| |