author | Martin Czygan <martin.czygan@gmail.com> | 2021-02-11 13:19:11 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-02-11 13:19:11 +0100 |
commit | e75a77fdedae4a4a37c5ddc12c796c70164900dc (patch) | |
tree | 2a2020c2c6b34316992c88115af005f3d064a04e | |
parent | 622f56b066316b0f16a9cb087040ee7acaaecaeb (diff) | |
download | fuzzycat-e75a77fdedae4a4a37c5ddc12c796c70164900dc.tar.gz fuzzycat-e75a77fdedae4a4a37c5ddc12c796c70164900dc.zip |
update notes
-rw-r--r-- | README.md | 7 |
1 files changed, 7 insertions, 0 deletions
@@ -219,3 +219,10 @@ $ cat data/sample.json | parallel -j 8 --pipe --roundrobin python -m fuzzycat.ma Interestingly, the parallel variants detects fewer clusters (because data is split and clusters are searched within each batch). TODO(miku): sort out sharding bug. + +# Notes on Refs + +* technique from fuzzycat ported in parts to + [skate](https://github.com/miku/skate) - to go from refs and release dataset +to a number of clusters, relating references to releases +* need to verify, but not the references against each other, only refs againt the release |