path: root/python/tests/files/journal_metadata.sample.json
author: Bryan Newbold <bnewbold@robocracy.org> 2020-12-16 19:56:01 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2020-12-16 20:16:09 -0800
commit: 6d5811693c36b9e73dedf0205c40f2aed63e2870
tree: 717de06d66ac009205a91cdeb511d113d61eac85 /python/tests/files/journal_metadata.sample.json
parent: 38328c25674fee7781a8d8601e1d47de04186f19
add fuzzy match filtering to DOAJ importer
In this default configuration, any entity with a fuzzy match (even an "ambiguous" one) is skipped at import time to prevent creating duplicates. This is deliberately conservative: it errs on the side of not creating new, possibly duplicate, entities. In the future, as we gain confidence in fuzzy matching and verification, we can start to ignore AMBIGUOUS matches, handle EXACT matches as the same release, and merge STRONG (and WEAK?) matches under the same work entity.
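
The skip policy described above can be sketched roughly as follows. This is an illustration only, not the importer's actual code: `MatchStatus`, `DEFAULT_SKIP`, and `should_skip_import` are hypothetical names, and the enum lists only the statuses named in this message.

```python
from enum import Enum
from typing import FrozenSet, Iterable


class MatchStatus(Enum):
    # Hypothetical enum; only the statuses named in the commit message.
    EXACT = "exact"
    STRONG = "strong"
    WEAK = "weak"
    AMBIGUOUS = "ambiguous"


# Assumed default: every status, including AMBIGUOUS, blocks the import.
DEFAULT_SKIP: FrozenSet[MatchStatus] = frozenset(MatchStatus)


def should_skip_import(
    match_statuses: Iterable[MatchStatus],
    skip_on: FrozenSet[MatchStatus] = DEFAULT_SKIP,
) -> bool:
    """Return True if any fuzzy match has a status that blocks the import."""
    return any(status in skip_on for status in match_statuses)


# Default behaviour: a single ambiguous match is enough to skip the entity.
print(should_skip_import([MatchStatus.AMBIGUOUS]))

# A possible later, less conservative policy: ignore AMBIGUOUS matches.
relaxed = DEFAULT_SKIP - {MatchStatus.AMBIGUOUS}
print(should_skip_import([MatchStatus.AMBIGUOUS], skip_on=relaxed))
```

Keeping the set of blocking statuses as a parameter would let the policy be relaxed later without changing the call sites; handling EXACT as the same release or merging STRONG matches under one work would need separate logic not shown here.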
Diffstat (limited to 'python/tests/files/journal_metadata.sample.json')
0 files changed, 0 insertions, 0 deletions