<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fuzzycat/notes, branch master</title>
<id>https://git.bnewbold.net/fuzzycat/atom?h=master</id>
<link rel='self' href='https://git.bnewbold.net/fuzzycat/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/'/>
<updated>2021-12-06T18:53:30+00:00</updated>
<entry>
<title>complete FuzzyReleaseMatcher refactoring</title>
<updated>2021-12-06T18:53:30+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-11-17T13:51:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=dd6149140542585f2b0bfc3b334ec2b0a88b790e'/>
<id>urn:sha1:dd6149140542585f2b0bfc3b334ec2b0a88b790e</id>
<content type='text'>
We keep the name, since the API - "matcher.match(release)" - is the
same. Changes: simplified queries; at most one query is performed
against Elasticsearch; parallel release retrieval from the API; optional
support for release year windows.

Test cases are expressed in YAML and are auto-loaded from the specified
directory. The tests run against the current search endpoint, which
means the actual output may change on index updates. For the moment, we
think this setup is relatively simple and not too unstable.

    about: title contrib, partial name
    input: &gt;
      {
        "contribs": [
          {
            "raw_name": "Adams"
          }
        ],
        "title": "digital libraries",
        "ext_ids": {}
      }
    release_year_padding: 1
    expected:
      - 7rmvqtrb2jdyhcxxodihzzcugy
      - a2u6ougtsjcbvczou6sazsulcm
      - dy45vilej5diros6zmax46nm4e
      - exuwhhayird4fdjmmsiqpponlq
      - gqrj7jikezgcfpjfazhpf4e7c4
      - mkmqt3453relbpuyktnmsg6hjq
      - t2g5sl3dgzchtnq7dynxyzje44
      - t4tvenhrvzamraxrvvxivxmvga
      - wd3oeoi3bffknfbg2ymleqc4ja
      - y63a6dhrfnb7bltlxfynydbojy
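
As a rough sketch (not the actual test harness in this repository), a
single case like the one above could be checked along these lines. The
names `check_case`, `match_fn` and `stub_matcher` are hypothetical; the
sketch assumes the case has already been parsed into a dict, that the
matcher result can be reduced to a list of release identifiers, and that
result order does not matter. The expected list is shortened here.

```python
import json


def check_case(case, match_fn):
    """Return True if match_fn yields exactly the expected identifiers.

    `case` is one test case as a dict, e.g. as produced by a YAML loader.
    `match_fn` is a hypothetical stand-in for the matcher: it takes the
    decoded release dict and the year padding (or None) and returns a
    list of release identifiers.
    """
    # The "input" field holds a JSON document as a string.
    release = json.loads(case["input"])
    padding = case.get("release_year_padding")
    found = match_fn(release, padding)
    # Assumption: compare as sets, so result order does not matter.
    return set(found) == set(case["expected"])


# Example case, with a shortened "expected" list.
case = {
    "about": "title contrib, partial name",
    "input": '{"contribs": [{"raw_name": "Adams"}], '
             '"title": "digital libraries", "ext_ids": {}}',
    "release_year_padding": 1,
    "expected": ["7rmvqtrb2jdyhcxxodihzzcugy", "a2u6ougtsjcbvczou6sazsulcm"],
}


def stub_matcher(release, padding):
    # Canned result; a real matcher would query the search endpoint.
    return ["a2u6ougtsjcbvczou6sazsulcm", "7rmvqtrb2jdyhcxxodihzzcugy"]


print(check_case(case, stub_matcher))
```

With a stub in place of the matcher, the check runs without a search
endpoint; against the live index, a failing case may only mean that the
index has changed.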
</content>
</entry>
<entry>
<title>turn "match_release_fuzzy" into a class</title>
<updated>2021-11-16T17:58:42+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-11-05T16:19:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=0c84af603894049dd8edd95da18d8990ab0516d1'/>
<id>urn:sha1:0c84af603894049dd8edd95da18d8990ab0516d1</id>
<content type='text'>
The goal of this refactoring was to make the matching process a bit more
configurable by using a class and a cascade of queries.

For a limited test set, `FuzzyReleaseMatcher.match` works the same as
`match_release_fuzzy`.
</content>
</entry>
<entry>
<title>reorganize notes</title>
<updated>2021-09-21T13:55:11+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-09-21T13:55:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=6a224c316869ba2651094ad47e1d92e102524f85'/>
<id>urn:sha1:6a224c316869ba2651094ad47e1d92e102524f85</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'master' of git.archive.org:webgroup/fuzzycat</title>
<updated>2021-07-09T11:26:35+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-09T11:26:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=002764b5b1f8f27bd8ae42d33b2a6f42a2a4b7a1'/>
<id>urn:sha1:002764b5b1f8f27bd8ae42d33b2a6f42a2a4b7a1</id>
<content type='text'>
* 'master' of git.archive.org:webgroup/fuzzycat:
  simplify README for general audience; move some content to notes
  sandcrawler slugify: lower-case greek ambiguity (OCR)
  DOI clean/normalize helper; and use in verification etc
  verify: page count parsing and comparison improvements
</content>
</entry>
<entry>
<title>notes on matching metrics</title>
<updated>2021-07-08T15:48:39+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-08T15:48:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=6a97a067aa967d681c112f7d2ea1e02e038189ee'/>
<id>urn:sha1:6a97a067aa967d681c112f7d2ea1e02e038189ee</id>
<content type='text'>
</content>
</entry>
<entry>
<title>cleanup notes</title>
<updated>2021-07-08T14:44:50+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-08T14:44:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=1d84fec0927a98e576a6525911252d334f0da48a'/>
<id>urn:sha1:1d84fec0927a98e576a6525911252d334f0da48a</id>
<content type='text'>
</content>
</entry>
<entry>
<title>simplify README for general audience; move some content to notes</title>
<updated>2021-07-02T01:06:47+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-07-02T01:06:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=623c5a841b351a21e82a9752d15154da3ab5a635'/>
<id>urn:sha1:623c5a841b351a21e82a9752d15154da3ab5a635</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update diagram</title>
<updated>2020-12-24T11:01:28+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-12-24T11:01:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=35a10a97efa58ac57d59ce6595555cd274ec6e80'/>
<id>urn:sha1:35a10a97efa58ac57d59ce6595555cd274ec6e80</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add case: article, erratum</title>
<updated>2020-12-23T15:18:55+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-12-23T15:18:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=b5bd7fcfa6877f44ef7a6d40b0fe779a9937fec9'/>
<id>urn:sha1:b5bd7fcfa6877f44ef7a6d40b0fe779a9937fec9</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fix color</title>
<updated>2020-12-18T02:07:28+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-12-18T02:07:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fuzzycat/commit/?id=9ce808e71367ad294e8f53480864cf7531e1df9f'/>
<id>urn:sha1:9ce808e71367ad294e8f53480864cf7531e1df9f</id>
<content type='text'>
</content>
</entry>
</feed>
