fuzzycat

	Commit message	Author	Age	Files	Lines
*	matching: cleanup test files	Martin Czygan	2021-12-06	1	-1/+1
\|
*	complete FuzzyReleaseMatcher refactoring	Martin Czygan	2021-12-06	1	-84/+10
\|	We keep the name, since the API - "matcher.match(release)" - is the same; simplified queries; at most one query is performed against elasticsearch; parallel release retrieval from the API; optional support for release year windows.
\|
\|	Test cases are expressed in YAML and will be auto-loaded from the specified directory; tests work against the current search endpoint, which means the actual output may change on index updates; for the moment, we think this setup is relatively simple and not too unstable.
\|
\|	    about: title contrib, partial name
\|	    input: >
\|	      {
\|	        "contribs": [
\|	          {
\|	            "raw_name": "Adams"
\|	          }
\|	        ],
\|	        "title": "digital libraries",
\|	        "ext_ids": {}
\|	      }
\|	    release_year_padding: 1
\|	    expected:
\|	      - 7rmvqtrb2jdyhcxxodihzzcugy
\|	      - a2u6ougtsjcbvczou6sazsulcm
\|	      - dy45vilej5diros6zmax46nm4e
\|	      - exuwhhayird4fdjmmsiqpponlq
\|	      - gqrj7jikezgcfpjfazhpf4e7c4
\|	      - mkmqt3453relbpuyktnmsg6hjq
\|	      - t2g5sl3dgzchtnq7dynxyzje44
\|	      - t4tvenhrvzamraxrvvxivxmvga
\|	      - wd3oeoi3bffknfbg2ymleqc4ja
\|	      - y63a6dhrfnb7bltlxfynydbojy
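To illustrate the test-case layout described in this commit message, here is a minimal sketch in Python. It is not the fuzzycat test loader: the in-memory `CASE` dict, `year_window`, `check_case` and `fake_match` are hypothetical names, the expected list is shortened to two of the ids above, and the reading of `release_year_padding` as an inclusive window of plus or minus N years is an assumption.

```python
import json

# Hypothetical in-memory form of the YAML case from the commit message;
# the expected list is shortened to two ids for brevity.
CASE = {
    "about": "title contrib, partial name",
    "input": '{"contribs": [{"raw_name": "Adams"}], '
             '"title": "digital libraries", "ext_ids": {}}',
    "release_year_padding": 1,
    "expected": [
        "7rmvqtrb2jdyhcxxodihzzcugy",
        "a2u6ougtsjcbvczou6sazsulcm",
    ],
}


def year_window(release_year, padding):
    """Assumed semantics: inclusive range of +/- `padding` years."""
    return (release_year - padding, release_year + padding)


def check_case(case, match_fn):
    """Parse the JSON input, call match_fn, compare ids ignoring order."""
    release = json.loads(case["input"])
    got = match_fn(release, case.get("release_year_padding"))
    return sorted(got) == sorted(case["expected"])


def fake_match(release, release_year_padding=None):
    """Stand-in for a real matcher; returns fixed ids, no search request."""
    return ["a2u6ougtsjcbvczou6sazsulcm", "7rmvqtrb2jdyhcxxodihzzcugy"]


if __name__ == "__main__":
    print(year_window(2020, CASE["release_year_padding"]))
    print(check_case(CASE, fake_match))
```

Comparing sorted id lists keeps the check independent of result ordering. It does not address the instability the message mentions, since a live index may still return different ids over time.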
*	complete migration away from match_release_fuzzy	Martin Czygan	2021-11-16	1	-81/+1
\|	Instead, use `FuzzyReleaseMatcher.match`, which has approximately the same behavior.
*	turn "match_release_fuzzy" into a class	Martin Czygan	2021-11-16	1	-7/+116
\|	The goal of this refactoring was to make the matching process a bit more configurable by using a class and a cascade of queries. For a limited test set, `FuzzyReleaseMatcher.match` works the same as `match_release_fuzzy`.
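The "class and a cascade of queries" idea can be sketched as follows. This is an illustration only, not the `FuzzyReleaseMatcher` code: `CascadeMatcher`, `exact_title` and `fuzzy_title` are hypothetical names, and the stages return canned values instead of querying a search index.

```python
class CascadeMatcher:
    """Try query stages in order; return the first non-empty result."""

    def __init__(self, stages):
        # Each stage is a callable taking a release dict, returning a list.
        self.stages = list(stages)

    def match(self, release):
        for stage in self.stages:
            hits = stage(release)
            if hits:
                return hits
        return []


def exact_title(release):
    """Hypothetical strict stage; here it never finds anything."""
    return []


def fuzzy_title(release):
    """Hypothetical looser stage; returns a placeholder id if a title exists."""
    return ["hypothetical-id-1"] if release.get("title") else []


if __name__ == "__main__":
    matcher = CascadeMatcher([exact_title, fuzzy_title])
    print(matcher.match({"title": "digital libraries"}))
```

Keeping the stages as a plain list makes the order and number of queries configurable without touching `match` itself.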
*	start larger refactoring: remove cluster	Martin Czygan	2021-09-24	1	-2/+4
\|	background: verifying hundreds of millions of documents turned out to be a bit slow; anecdata: running clustering and verification over 1.8B inputs took over 50h; cf. the Go port (skate) required about 2-4h for those operations. Also: with Go we do not need the extra GNU parallel wrapping.
\|
\|	In any case, we aim for fuzzycat refactoring to provide:
\|
\|	* better, more configurable verification and small scale matching
\|	* removal of batch clustering code (and improve refcat docs)
\|	* a place for a bit more generic, similarity based utils
\|
\|	The most important piece in fuzzycat is a CSV file containing hand picked test examples for verification - and the code that is able to fulfill that test suite. We want to make this part more robust.
*	tests: temporarily disable tests	Martin Czygan	2021-09-21	1	-12/+12
\|	We want to first move to elasticsearch dsl and will reactivate and extend the tests after refactoring.
*	matching: run an additional es query for fuzzy matching	Martin Czygan	2021-09-21	1	-2/+20
\|
*	style: apply formatting	Martin Czygan	2021-09-21	1	-3/+12
\|
*	lint: remove unused imports	Bryan Newbold	2021-05-31	1	-1/+0
\|
*	cleanup merge artifact	Martin Czygan	2021-04-15	1	-1/+0
\|
*	Merge branch 'bnewbold-dev-setup'	Martin Czygan	2021-04-15	1	-1/+8
\|\
\| \|	* bnewbold-dev-setup:
\| \|	  dynaconf: switch to fuzzycat.config import across project
\| \|	  upgrade to python3.8
\| \|	  gitlab CI: try 'make deps' and 'make test'
\| \|	  makefile: run common commands inside pipenv
\| \|	  makefile: change 'deps' to be simple --dev --deploy
\| \|	  make fmt
\| *	dynaconf: switch to fuzzycat.config import across project	Bryan Newbold	2021-04-13	1	-2/+1
\| \|	This is the recommended way to use dynaconf.
* \|	fix imports and formatting	Martin Czygan	2021-04-14	1	-5/+12
\| \|
* \|	test: skip if configured search server is not reachable	Martin Czygan	2021-04-14	1	-0/+14
\| \|
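The "skip if configured search server is not reachable" behaviour from the commit above could look roughly like the sketch below. It uses only the standard library (`unittest`, `urllib`), which may differ from the project's actual test setup; `SEARCH_URL` is a placeholder, and `is_reachable` is a hypothetical helper rather than a fuzzycat function.

```python
import unittest
import urllib.error
import urllib.request

# Placeholder; a real setup would read the URL from configuration.
SEARCH_URL = "http://127.0.0.1:1"


def is_reachable(url, timeout=2.0):
    """Return True if the server answers at all, False on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False


class SearchTests(unittest.TestCase):
    @unittest.skipUnless(is_reachable(SEARCH_URL), "search server not reachable")
    def test_placeholder(self):
        # A real test would issue a query and inspect the result.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

The reachability check runs once, when the class body is evaluated, so an unreachable server causes the test to be reported as skipped instead of failed.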
* \|	tests: run es tests against public search endpoint	Martin Czygan	2021-04-14	1	-8/+31
\|/
*	inject configuration	Martin Czygan	2020-12-23	1	-1/+5
\|
*	update reference	Martin Czygan	2020-12-16	1	-1/+1
\|
*	add skip reason	Martin Czygan	2020-12-16	1	-1/+1
\|
*	docs and release match command	Martin Czygan	2020-12-16	1	-4/+11
\|
*	matching stub	Martin Czygan	2020-12-15	1	-0/+19
\|
*	cleanup	Martin Czygan	2020-10-21	1	-4/+0
\|
*	stub: command line	Martin Czygan	2020-08-18	1	-2/+1
\|
*	tests: add stub	Martin Czygan	2020-08-17	1	-0/+5