# Proposal: Fuzzycat Refactoring

* Goal: Refactor fuzzycat to make matching and verification more composable, configurable and testable.
* Status: wip

A better design:

* it has a correct scope (e.g. match X; verify Y)
* it has good defaults, but allows configuration
* it is clear how and where to extend functionality
* it is easy to add one new test for a case

## Matching

* fuzzy matching will be a cascade of queries, run until one of them returns a result
* the queries are ordered from exact to very fuzzy
* alternatively, we could use "ensemble matching", which takes the intersection of the results of a couple of queries
* ES queries cannot cover all cases; we need to add additional checks, e.g. author list comparison

Example:

    FuzzyReleaseMatcher
        match_release_id
        match_release_exact_title_exact_contrib
        match_release_...

        match_release_fuzzy (runs a cascade of queries)

Each function is testable on its own. The class keeps the ES client and other global config around. Its scope is clear: given a "release" (or maybe just a title string), generate a list of potentially related releases.

Other entities follow the same pattern.

    FuzzyContainerMatcher
        match_container_id
        match_container_issn
        match_container_abbreviation
        match_container_...

        match_container_fuzzy (runs a cascade of queries)

A helper object (not exactly an entity) for matching lists of authors. It allows matching by various means, e.g. exact, short names, partial lists, etc. It should account for case, order, etc.

    FuzzyContribsMatcher
        match_exact
        match_short_names
        match_partial_list
        match_fuzzy

For each method in each matcher class, we can construct a test case that exercises only that method. A new method can be added with ease and tested separately.

It is not yet clear how, but we could create some "profiles" that allow matching by a chosen set of methods, or rely on good defaults in the higher-level `_fuzzy(...)` method.

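
The cascade and the "profiles" idea above could look roughly like the following. This is a minimal sketch, not the fuzzycat implementation: the `search` callable stands in for the ES client, the field names `ident` and `title`, the simplified method names `match_release_exact_title` and `match_release_fuzzy_title`, and the `cascade` attribute are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an ES client call: takes a query body, returns hits.
SearchFn = Callable[[dict], List[dict]]


@dataclass
class FuzzyReleaseMatcher:
    """Sketch: keeps the search backend and config, one method per query."""

    search: SearchFn
    size: int = 10
    # A "profile" is just the ordered list of methods to try, exact to fuzzy.
    cascade: List[str] = field(default_factory=lambda: [
        "match_release_id",
        "match_release_exact_title",
        "match_release_fuzzy_title",
    ])

    def match_release_id(self, release: dict) -> List[dict]:
        ident = release.get("ident")  # assumed field name
        if not ident:
            return []
        return self.search({"query": {"term": {"ident": ident}}, "size": self.size})

    def match_release_exact_title(self, release: dict) -> List[dict]:
        title = release.get("title")  # assumed field name
        if not title:
            return []
        return self.search({"query": {"match_phrase": {"title": title}}, "size": self.size})

    def match_release_fuzzy_title(self, release: dict) -> List[dict]:
        title = release.get("title")
        if not title:
            return []
        query = {"match": {"title": {"query": title, "fuzziness": "AUTO"}}}
        return self.search({"query": query, "size": self.size})

    def match_release_fuzzy(self, release: dict) -> List[dict]:
        """Run the cascade and return the hits of the first non-empty query."""
        for name in self.cascade:
            hits = getattr(self, name)(release)
            if hits:
                return hits
        return []
```

Because the backend is injected, each method can be tested with a fake `search` function and no running ES instance. A narrower profile is a different `cascade` list, e.g. `FuzzyReleaseMatcher(search=fn, cascade=["match_release_id"])`.
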
NOTE: the matcher classes could use the verification code internally: generate a list of matches with an ES query, then use a configured verifier to generate verified matches. Comparison code should go only into the verification module.

## Verification (comparison)

Verification works similarly. For each entity we define a set of methods, each verifying a specific aspect.

    FuzzyReleaseVerifier
        verify_release_id
        verify_release_ext_id
        verify_release_title_exact_match
        verify_release_title_contrib_exact_match
        verify_release_...

        verify(a, b) -> (Status, Reason)

A large number of test cases already exist; they may need a bit more structure to relate cases to methods. The class can hold global configuration and maybe some cached computed properties, if that helps.

    FuzzyContainerVerifier
        verify_container_id
        ...
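
One possible shape for the verifier, as a minimal sketch under stated assumptions: the `Status` and `Reason` members shown here, the `checks` list, and the field names `ident` and `title` are placeholders chosen for illustration, not the values used in fuzzycat. Each `verify_release_*` method returns a verdict or `None` when it cannot decide, and `verify(a, b)` returns the first verdict found.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    # Placeholder members; the real set of statuses may differ.
    EXACT = "exact"
    STRONG = "strong"
    AMBIGUOUS = "ambiguous"


class Reason(Enum):
    # Placeholder members; the real set of reasons may differ.
    SAME_ID = "same_id"
    TITLE_MATCH = "title_match"
    UNDECIDED = "undecided"


Verdict = Tuple[Status, Reason]


class FuzzyReleaseVerifier:
    """Sketch: one method per aspect, verify() runs the configured checks."""

    def __init__(self, checks: Optional[List[str]] = None):
        # Configuration: which checks to run, and in which order.
        self.checks = checks or [
            "verify_release_id",
            "verify_release_title_exact_match",
        ]

    def verify_release_id(self, a: dict, b: dict) -> Optional[Verdict]:
        if a.get("ident") and a.get("ident") == b.get("ident"):
            return (Status.EXACT, Reason.SAME_ID)
        return None

    def verify_release_title_exact_match(self, a: dict, b: dict) -> Optional[Verdict]:
        if a.get("title") and a.get("title") == b.get("title"):
            return (Status.STRONG, Reason.TITLE_MATCH)
        return None

    def verify(self, a: dict, b: dict) -> Verdict:
        """Return the first decisive verdict, or a fallback if none applies."""
        for name in self.checks:
            verdict = getattr(self, name)(a, b)
            if verdict is not None:
                return verdict
        return (Status.AMBIGUOUS, Reason.UNDECIDED)
```

Returning `None` from an undecided check keeps each method testable in isolation and lets an existing test case be tied to the single method that is expected to produce its verdict.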