author | Martin Czygan <martin.czygan@gmail.com> | 2021-11-17 14:51:50 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-12-06 19:53:30 +0100 |
commit | dd6149140542585f2b0bfc3b334ec2b0a88b790e (patch) | |
tree | 6a11c228558cfbf73932bc828cda9be3735cfd78 /TODO.md | |
parent | d104f8d0ba8eef5563555de82be66bbf17f961db (diff) | |
complete FuzzyReleaseMatcher refactoring

We keep the name, since the API ("matcher.match(release)") is the
same. Changes: simplified queries; at most one query is performed against
Elasticsearch; parallel release retrieval from the API; optional support
for release year windows.
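A minimal sketch of two of the ideas above, assuming an Elasticsearch index with `title` and `year` fields (field names are assumptions, not taken from fuzzycat): a single bool query with an optional year window of `year ± padding`, and concurrent retrieval of full releases by ident. `build_query`, `fetch_releases` and the `fetch` callable are hypothetical names for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def build_query(title: str, year: Optional[int] = None, padding: int = 0) -> Dict:
    """Build one Elasticsearch bool query for a release title.

    The field names "title" and "year" are assumptions about the index schema.
    """
    query: Dict = {"bool": {"must": [{"match": {"title": title}}], "filter": []}}
    if year is not None:
        # Optional release year window: accept years in [year - padding, year + padding].
        query["bool"]["filter"].append(
            {"range": {"year": {"gte": year - padding, "lte": year + padding}}}
        )
    return query


def fetch_releases(
    idents: Sequence[str], fetch: Callable[[str], Dict], max_workers: int = 8
) -> List[Dict]:
    """Fetch full release entities concurrently.

    `fetch` is a hypothetical callable that retrieves one release by ident
    from the API; Executor.map returns results in the order of `idents`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, idents))
```

With this split, the single search request only has to yield idents, and the I/O-bound per-release API calls overlap in threads instead of running one after another.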

Test cases are expressed in YAML and are auto-loaded from the
specified directory; tests run against the current search endpoint,
which means the actual output may change on index updates. For the
moment, we think this setup is relatively simple and not too unstable.
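Auto-loading test cases from a directory could look roughly like the sketch below. It assumes one case per `*.yaml` file, with the keys shown in the example case (`about`, `input`, `release_year_padding`, `expected`); `MatchTestCase` and `load_cases` are hypothetical names. The parser defaults to PyYAML's `yaml.safe_load` but can be swapped out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, List, Optional


@dataclass
class MatchTestCase:
    about: str
    input: str  # JSON document, kept as a string as in the YAML file
    expected: List[str]  # release idents we expect the matcher to return
    release_year_padding: int = 0
    path: Optional[Path] = None


def load_cases(directory, parse: Optional[Callable[[str], Any]] = None) -> List[MatchTestCase]:
    """Load every *.yaml file in `directory` as one test case, in sorted order."""
    if parse is None:
        import yaml  # PyYAML; imported lazily so another parser can be injected

        parse = yaml.safe_load
    cases = []
    for path in sorted(Path(directory).glob("*.yaml")):
        doc = parse(path.read_text(encoding="utf-8"))
        cases.append(
            MatchTestCase(
                about=doc.get("about", path.stem),
                input=doc["input"],
                expected=doc.get("expected", []),
                release_year_padding=doc.get("release_year_padding", 0),
                path=path,
            )
        )
    return cases
```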
about: title contrib, partial name
input: >
  {
    "contribs": [
      {
        "raw_name": "Adams"
      }
    ],
    "title": "digital libraries",
    "ext_ids": {}
  }
release_year_padding: 1
expected:
  - 7rmvqtrb2jdyhcxxodihzzcugy
  - a2u6ougtsjcbvczou6sazsulcm
  - dy45vilej5diros6zmax46nm4e
  - exuwhhayird4fdjmmsiqpponlq
  - gqrj7jikezgcfpjfazhpf4e7c4
  - mkmqt3453relbpuyktnmsg6hjq
  - t2g5sl3dgzchtnq7dynxyzje44
  - t4tvenhrvzamraxrvvxivxmvga
  - wd3oeoi3bffknfbg2ymleqc4ja
  - y63a6dhrfnb7bltlxfynydbojy
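Because `input` is a folded block scalar (`>`), it reaches the test as a string that still holds valid JSON. Running one case might then be sketched as below, assuming a hypothetical `match` adapter that takes the decoded release and returns the idents of the matched releases. Comparing sets, and reporting missing and unexpected idents separately, makes drift from index updates easier to spot.

```python
import json
from typing import Callable, Iterable, List, Set, Tuple


def run_case(
    input_json: str, expected: List[str], match: Callable[[dict], Iterable[str]]
) -> Tuple[Set[str], Set[str]]:
    """Run one test case; return (missing, unexpected) release idents."""
    release = json.loads(input_json)  # the folded YAML string is plain JSON
    got = set(match(release))  # `match` is a hypothetical adapter returning idents
    want = set(expected)
    return want - got, got - want
```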
Diffstat (limited to 'TODO.md')
-rw-r--r-- | TODO.md | 5 |
1 files changed, 5 insertions, 0 deletions
@@ -1,5 +1,10 @@
 # TODO
 
+* [ ] match release with fewer requests (or do them in parallel)
+* [ ] de-clobber verify
+
+----
+
 * [ ] clustering should be broken up, e.g. into "map" and "sort"
 * [x] match release should be a class
 * [x] match release fuzzy should work not just with title