<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/tests, branch v0.3.3</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.3.3</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.3.3'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2020-12-18T07:03:08+00:00</updated>
<entry>
<title>improve dblp release import</title>
<updated>2020-12-18T07:03:08+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-17T09:56:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=9451b3063c2d446748db74027c40c13ee69c24fb'/>
<id>urn:sha1:9451b3063c2d446748db74027c40c13ee69c24fb</id>
<content type='text'>
</content>
</entry>
<entry>
<title>very simple dblp container importer</title>
<updated>2020-12-18T07:03:08+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-17T09:55:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=58ff361eb481bee9d2ef7249f48f94729d2a830d'/>
<id>urn:sha1:58ff361eb481bee9d2ef7249f48f94729d2a830d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>basic test coverage of dblp release importer</title>
<updated>2020-12-18T07:03:08+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-02T19:30:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=c66f9b2d98de88a98d3a1737d415bdab4e89027c'/>
<id>urn:sha1:c66f9b2d98de88a98d3a1737d415bdab4e89027c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add 'lxml' mode for large XML file import, and multi-tags</title>
<updated>2020-12-18T07:03:08+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-02T18:49:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=4e332e9037530ebc62836acfa78896dc76700c9c'/>
<id>urn:sha1:4e332e9037530ebc62836acfa78896dc76700c9c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fix sloppy is_preserved ES transform test failure</title>
<updated>2020-12-18T07:03:00+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-18T07:03:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=53b17b96c61ecfe14a30da523e993a9902d7f375'/>
<id>urn:sha1:53b17b96c61ecfe14a30da523e993a9902d7f375</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-doaj-fuzzy' into 'master'</title>
<updated>2020-12-18T02:13:47+00:00</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-12-18T02:13:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=443243e8cccba3e779b7c56d0cdb6dcd992a3100'/>
<id>urn:sha1:443243e8cccba3e779b7c56d0cdb6dcd992a3100</id>
<content type='text'>
DOAJ import fuzzy match filter

See merge request webgroup/fatcat!92</content>
</entry>
<entry>
<title>update fuzzy helper to pass 'reason' through to import code</title>
<updated>2020-12-18T00:01:06+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-18T00:01:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=5eeb7a9d61beb8cb40fd89bd91fcd9dd820035aa'/>
<id>urn:sha1:5eeb7a9d61beb8cb40fd89bd91fcd9dd820035aa</id>
<content type='text'>
The motivation for this change is to enable passing the 'reason' through
to edit extra metadata, in cases where we merge or cluster releases.
</content>
</entry>
<entry>
<title>add fuzzy match filtering to DOAJ importer</title>
<updated>2020-12-17T04:16:09+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-17T03:56:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6d5811693c36b9e73dedf0205c40f2aed63e2870'/>
<id>urn:sha1:6d5811693c36b9e73dedf0205c40f2aed63e2870</id>
<content type='text'>
In this default configuration, any entity with a fuzzy match (even
"ambiguous") is skipped at import time. This is deliberately
conservative, to avoid creating duplicate entities.

In the future, as we get more confidence in fuzzy match/verification, we
can start to ignore AMBIGUOUS, handle EXACT as same release, and merge
STRONG (and WEAK?) matches under the same work entity.
</content>
</entry>
<entry>
<title>add fuzzy matching helper to importer base class</title>
<updated>2020-12-17T04:16:09+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-17T03:54:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=38328c25674fee7781a8d8601e1d47de04186f19'/>
<id>urn:sha1:38328c25674fee7781a8d8601e1d47de04186f19</id>
<content type='text'>
Using fuzzycat. Add basic test coverage.
</content>
</entry>
<entry>
<title>improve release elasticsearch transform test coverage</title>
<updated>2020-12-16T22:33:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-12-16T22:33:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=ebcc86561dabf3974ca11151445e66c0df4431f1'/>
<id>urn:sha1:ebcc86561dabf3974ca11151445e66c0df4431f1</id>
<content type='text'>
</content>
</entry>
</feed>
