# Matching Metrics

## Precision/Recall

For fuzzy matching we want to understand precision and recall. Options for test datasets:

* manually curated (100s of examples), from which precision/recall could be determined directly
* autogenerated from slightly different sets of real-world metadata (e.g. crossref vs. doaj) converted to releases
* automatically distorted set of records: 1 original, plus N distorted (synthetic) copies (see the distortion sketch below)

## Overall numbers

* number of clusters per clustering method: "title", "lowercase", "nysiis", "sandcrawler", and a few more
  - contrastive comparison of these clusterings, e.g. how many more matches/non-matches we get with each method (see the comparison sketch below)
* take N docs that did not end up in any cluster and run verify; ideally 100% should come back as different/ambiguous (see the sampling sketch below)
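
A minimal sketch of the synthetic distortion idea: each original record is copied N times with a small, random perturbation, so that ground truth (which records belong together) is known by construction. Field names (`title`), the distortion operations, and the `synthetic_group_id` marker are illustrative assumptions, not the project's actual implementation.

```python
import random
import string


def distort_title(title: str) -> str:
    """Apply one small, random perturbation to a title string."""
    if len(title) < 2:
        return title
    i = random.randrange(len(title) - 1)
    op = random.choice(["swap", "drop", "insert", "case"])
    if op == "swap":
        # transpose two adjacent characters
        return title[:i] + title[i + 1] + title[i] + title[i + 2:]
    if op == "drop":
        # delete a single character
        return title[:i] + title[i + 1:]
    if op == "insert":
        # insert a random lowercase letter
        return title[:i] + random.choice(string.ascii_lowercase) + title[i:]
    # "case": flip the case of one character
    return title[:i] + title[i].swapcase() + title[i + 1:]


def synthetic_group(record: dict, n: int = 5) -> list:
    """Return the original record plus n distorted copies, tagged with a shared group id."""
    group = [dict(record, synthetic_group_id=record["title"], distorted=False)]
    for _ in range(n):
        copy = dict(record, synthetic_group_id=record["title"], distorted=True)
        copy["title"] = distort_title(copy["title"])
        group.append(copy)
    return group
```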
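
For the per-method cluster counts and the contrastive comparison, something along these lines could work; the key functions here are stand-ins (the real "nysiis" and "sandcrawler" keys live in the project code, e.g. nysiis via a phonetic library such as `jellyfish.nysiis`), and only the counting/diffing logic is the point.

```python
from collections import defaultdict
from itertools import combinations


# Illustrative key functions; replace with the project's actual key derivations.
KEY_FUNCS = {
    "title": lambda doc: doc.get("title", ""),
    "lowercase": lambda doc: doc.get("title", "").lower().strip(),
}


def cluster(docs, keyfunc):
    """Group docs by key; returns key -> list of docs."""
    clusters = defaultdict(list)
    for doc in docs:
        clusters[keyfunc(doc)].append(doc)
    return clusters


def matched_pairs(clusters):
    """Set of doc-index pairs that land in the same cluster."""
    pairs = set()
    for members in clusters.values():
        for a, b in combinations(sorted(m["_idx"] for m in members), 2):
            pairs.add((a, b))
    return pairs


def compare_methods(docs):
    """Print cluster counts per method and the pairs one method adds over another."""
    for i, doc in enumerate(docs):
        doc["_idx"] = i
    clustered = {name: cluster(docs, fn) for name, fn in KEY_FUNCS.items()}
    pairs = {name: matched_pairs(c) for name, c in clustered.items()}
    for name, c in clustered.items():
        print(f"{name}: {len(c)} clusters, {len(pairs[name])} matched pairs")
    # contrastive comparison: matches "lowercase" makes that "title" does not
    extra = pairs["lowercase"] - pairs["title"]
    print(f"lowercase adds {len(extra)} matched pairs over title")
```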
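
And a sketch of the sanity check on non-clustered docs: sample pairs from the set of documents that did not cluster together, run verify on each pair, and count the outcomes. Here `verify` is a placeholder for the project's verification step and is assumed to return a status string such as "different" or "ambiguous".

```python
import random
from collections import Counter


def sample_and_verify(non_clustered_docs, verify, n=1000, seed=42):
    """Draw n random pairs from non-clustered docs and tally verify outcomes.

    In a clean setup, (close to) 100% of these pairs should come back
    as different/ambiguous.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        a, b = rng.sample(non_clustered_docs, 2)
        counts[verify(a, b)] += 1
    ok = counts["different"] + counts["ambiguous"]
    print(f"{ok}/{n} pairs verified as different/ambiguous")
    return counts
```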