| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Background: verifying hundreds of millions of documents turned out to be
a bit slow. Anecdata: running clustering and verification over 1.8B
inputs took over 50h, whereas the Go port (skate) required about 2-4h for
those operations. Also, with Go we do not need the extra GNU parallel
wrapping.
In any case, we aim for fuzzycat refactoring to provide:
* better, more configurable verification and small scale matching
* removal of batch clustering code (and improved refcat docs)
* a place for a bit more generic, similarity based utils
The most important piece in fuzzycat is a CSV file containing hand
picked test examples for verification - and the code that is able to
fulfill that test suite. We want to make this part more robust.
| |
We want to first move to elasticsearch dsl and will reactivate and
extend after refactoring.
|
| | |
* bnewbold-dev-setup:
dynaconf: switch to fuzzycat.config import across project
upgrade to python3.8
gitlab CI: try 'make deps' and 'make test'
makefile: run common commands inside pipenv
makefile: change 'deps' to be simple --dev --deploy
make fmt
| | |
This is the recommended way to use dynaconf.
|