start larger refactoring: remove cluster

background: verifying hundreds of millions of documents turned out to be a bit slow; anecdata: running clustering and verification over 1.8B inputs took over 50h, whereas the Go port (skate) required about 2-4h for those operations. Also: with Go we do not need the extra GNU parallel wrapping.

In any case, we aim for the fuzzycat refactoring to provide:

* better, more configurable verification and small-scale matching
* removal of batch clustering code (and improved refcat docs)
* a place for somewhat more generic, similarity-based utils

The most important piece in fuzzycat is a CSV file containing hand-picked test examples for verification - and the code that is able to fulfill that test suite. We want to make this part more robust.
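
To illustrate the idea of a CSV file of hand-picked examples driving a verification test suite, here is a minimal sketch. It is not fuzzycat's actual code: the `verify` function, the column names (`a`, `b`, `expected`), the status labels and the sample rows are all assumptions made up for this example.

```python
import csv
import io


def verify(a: str, b: str) -> str:
    """Hypothetical stand-in for a verification function (not fuzzycat's).

    Compares two titles after lowercasing and collapsing whitespace.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    return "EXACT" if norm(a) == norm(b) else "DIFFERENT"


def run_cases(fileobj):
    """Yield (row, actual) for every case that deviates from its expected status."""
    for row in csv.DictReader(fileobj):
        actual = verify(row["a"], row["b"])
        if actual != row["expected"]:
            yield row, actual


# Illustrative data only; a real suite would read a curated CSV file from disk.
CASES = """a,b,expected
Deep Learning,deep  learning,EXACT
Deep Learning,Shallow Learning,DIFFERENT
"""

failures = list(run_cases(io.StringIO(CASES)))
print(f"{len(failures)} failing case(s)")
```

Keeping the cases in a data file rather than in code means new edge cases can be added without touching the test harness, which is one plausible way to make such a suite more robust.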
author: Martin Czygan <martin.czygan@gmail.com> 2021-09-24 13:58:51 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2021-09-24 13:58:51 +0200
commit: 478d7d06ad9e56145cb94f3461c355b1ba9eb491 (patch)
tree: fa467290e8c8df41a1e97a6de751d0f7e790c9de /fuzzycat/matching.py
parent: 86cc3191ce03042ef4a0c6c8a44f4094a140b802 (diff)
download: fuzzycat-478d7d06ad9e56145cb94f3461c355b1ba9eb491.tar.gz fuzzycat-478d7d06ad9e56145cb94f3461c355b1ba9eb491.zip
1 files changed, 0 insertions, 2 deletions
diff --git a/fuzzycat/matching.py b/fuzzycat/matching.py
index 310dfc2..bcda46d 100644
--- a/fuzzycat/matching.py
+++ b/fuzzycat/matching.py
@@ -73,7 +73,6 @@ def match_release_fuzzy(
         if r:
             return [r]
 
-
     if release.title is not None and release.contribs is not None:
         names = " ".join([c.raw_name for c in release.contribs])
         body = {
@@ -178,7 +177,6 @@ def match_release_fuzzy(
     if es_compat_hits_total(resp) > 0:
         return response_to_entity_list(resp, entity_type=ReleaseEntity, size=size, api=api)
 
-
     # TODO: perform more queries on other fields.
     return []