path: root/notes
Commit message  Author  Age  Files  Lines
* complete FuzzyReleaseMatcher refactoring  Martin Czygan  2021-12-06  1  -0/+1
|
|   We keep the name, since the API, "matcher.match(release)", is the same;
|   simplified queries; at most one query is performed against Elasticsearch;
|   parallel release retrieval from the API; optional support for release
|   year windows.
|
|   Test cases are expressed in YAML and will be auto-loaded from the
|   specified directory. Tests work against the current search endpoint,
|   which means the actual output may change on index updates; for the
|   moment, we think this setup is relatively simple and not too unstable.
|
|       about: title contrib, partial name
|       input: >
|         { "contribs": [ { "raw_name": "Adams" } ], "title": "digital libraries", "ext_ids": {} }
|       release_year_padding: 1
|       expected:
|         - 7rmvqtrb2jdyhcxxodihzzcugy
|         - a2u6ougtsjcbvczou6sazsulcm
|         - dy45vilej5diros6zmax46nm4e
|         - exuwhhayird4fdjmmsiqpponlq
|         - gqrj7jikezgcfpjfazhpf4e7c4
|         - mkmqt3453relbpuyktnmsg6hjq
|         - t2g5sl3dgzchtnq7dynxyzje44
|         - t4tvenhrvzamraxrvvxivxmvga
|         - wd3oeoi3bffknfbg2ymleqc4ja
|         - y63a6dhrfnb7bltlxfynydbojy
|
* turn "match_release_fuzzy" into a class  Martin Czygan  2021-11-16  1  -0/+87
|
|   The goal of this refactoring was to make the matching process a bit more
|   configurable by using a class and a cascade of queries.
|
|   For a limited test set, `FuzzyReleaseMatcher.match` works the same as
|   `match_release_fuzzy`.
|
* reorganize notes  Martin Czygan  2021-09-21  6  -2/+153
|
* Merge branch 'master' of git.archive.org:webgroup/fuzzycat  Martin Czygan  2021-07-09  1  -0/+177
|\
| |   * 'master' of git.archive.org:webgroup/fuzzycat:
| |     simplify README for general audience; move some content to notes
| |     sandcrawler slugify: lower-case greek ambiguity (OCR)
| |     DOI clean/normalize helper; and use in verification etc
| |     verify: page count parsing and comparison improvements
| |
| * simplify README for general audience; move some content to notes  Bryan Newbold  2021-07-01  1  -0/+177
| |
* | notes on matching metrics  Martin Czygan  2021-07-08  1  -0/+16
| |
* | cleanup notes  Martin Czygan  2021-07-08  2  -13/+0
|/
* update diagram  Martin Czygan  2020-12-24  3  -5/+7
|
* add case: article, erratum  Martin Czygan  2020-12-23  1  -0/+8
|
* fix color  Martin Czygan  2020-12-18  1  -0/+0
|
* update notes  Martin Czygan  2020-12-18  2  -1/+1
|
* update README  Martin Czygan  2020-12-18  2  -0/+14
|
* update notes  Martin Czygan  2020-12-17  1  -0/+14
|
* wip: notes  Martin Czygan  2020-12-17  2  -4/+62
|
* focus on notes, first  Martin Czygan  2020-12-16  3  -0/+19
|
* add issue  Martin Czygan  2020-12-16  1  -0/+17
|
* cleanup  Martin Czygan  2020-12-15  1  -0/+33
|
* update stats  Martin Czygan  2020-12-11  1  -17/+22
|
* update notes  Martin Czygan  2020-12-10  1  -0/+2
|
* add item to blacklist  Martin Czygan  2020-12-10  1  -7/+9
|
* complete list  Martin Czygan  2020-12-10  1  -1/+6
|
* a case of different reviews, but quite ambiguous  Martin Czygan  2020-12-10  1  -0/+1
|
* add ambiguous case  Martin Czygan  2020-12-10  1  -0/+10
|
* add cases  Martin Czygan  2020-12-10  1  -0/+11
|
* update cases  Martin Czygan  2020-12-10  1  -0/+23
|
* update notes  Martin Czygan  2020-12-10  1  -25/+0
|
* add cases  Martin Czygan  2020-12-10  1  -2/+0
|
* add versioned doi pattern  Martin Czygan  2020-12-10  1  -0/+9
|
* separate static data  Martin Czygan  2020-12-10  1  -0/+5
|
* add a few more dummy cases  Martin Czygan  2020-12-09  1  -0/+13
|
* update stats  Martin Czygan  2020-12-09  1  -12/+17
|
* add subdoc case  Martin Czygan  2020-12-09  1  -0/+5
|
* 138 cases  Martin Czygan  2020-12-09  1  -0/+5
|
* add case  Martin Czygan  2020-12-09  1  -0/+12
|
* update notes  Martin Czygan  2020-12-09  1  -0/+2
|
* add another case from sample  Martin Czygan  2020-12-09  1  -0/+2
|
* add two more cases  Martin Czygan  2020-12-09  1  -0/+3
|
* update notes  Martin Czygan  2020-12-09  1  -1/+4
|
* add example  Martin Czygan  2020-12-09  1  -1/+4
|
* add another case  Martin Czygan  2020-12-09  1  -2/+4
|
* another case  Martin Czygan  2020-12-09  1  -0/+7
|
* add two cases  Martin Czygan  2020-12-08  1  -1/+7
|
* add case  Martin Czygan  2020-12-08  1  -1/+10
|
* add case  Martin Czygan  2020-12-05  1  -1/+9
|
* update cases; ok.work_id  Martin Czygan  2020-12-04  1  -8/+5
|
* case: ignore choice review  Martin Czygan  2020-12-04  1  -0/+50
|
* add case  Martin Czygan  2020-12-03  1  -0/+1
|
* update stats  Martin Czygan  2020-12-03  1  -16/+22
|
* add cases  Martin Czygan  2020-12-03  1  -0/+6
|
* add iop case  Martin Czygan  2020-12-02  1  -1/+10
|