update notes

author: Martin Czygan <martin.czygan@gmail.com> 2021-02-11 13:19:11 +0100
committer: Martin Czygan <martin.czygan@gmail.com> 2021-02-11 13:19:11 +0100
commit: e75a77fdedae4a4a37c5ddc12c796c70164900dc (patch)
tree: 2a2020c2c6b34316992c88115af005f3d064a04e
parent: 622f56b066316b0f16a9cb087040ee7acaaecaeb (diff)
download: fuzzycat-e75a77fdedae4a4a37c5ddc12c796c70164900dc.tar.gz
fuzzycat-e75a77fdedae4a4a37c5ddc12c796c70164900dc.zip
1 files changed, 7 insertions, 0 deletions
diff --git a/README.md b/README.md
index db4d3ed..7957d4a 100644
--- a/README.md
+++ b/README.md
@@ -219,3 +219,10 @@ $ cat data/sample.json | parallel -j 8 --pipe --roundrobin python -m fuzzycat.ma
 
 Interestingly, the parallel variants detect fewer clusters (because data is
 split and clusters are searched within each batch). TODO(miku): sort out sharding bug.
+
+# Notes on Refs
+
+* technique from fuzzycat ported in parts to
+  [skate](https://github.com/miku/skate) - to go from refs and release dataset
+to a number of clusters, relating references to releases
+* need to verify, but not the references against each other, only refs against the release
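
The verification note in the patch above can be illustrated with a minimal sketch. It assumes a cluster is a list of documents, each tagged as either a release or a reference, and it pairs every reference only with the releases in its cluster, never with other references. The names `Doc`, `ref_release_pairs`, `verify_cluster`, and `same_title` are hypothetical and are not part of fuzzycat or skate; the title comparison is a stand-in for whatever verification is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class Doc:
    # Hypothetical minimal document; real records carry many more fields.
    ident: str
    title: str
    kind: str  # assumed to be either "release" or "ref"


def ref_release_pairs(cluster: List[Doc]) -> Iterator[Tuple[Doc, Doc]]:
    """Yield (ref, release) pairs only; ref-ref pairs are never produced."""
    releases = [d for d in cluster if d.kind == "release"]
    refs = [d for d in cluster if d.kind == "ref"]
    for ref in refs:
        for release in releases:
            yield ref, release


def same_title(a: Doc, b: Doc) -> bool:
    # Placeholder check (an assumption): case-insensitive exact title match.
    return a.title.strip().lower() == b.title.strip().lower()


def verify_cluster(
    cluster: List[Doc], verify: Callable[[Doc, Doc], bool] = same_title
) -> List[Tuple[str, str]]:
    """Return (ref id, release id) for every pair that passes verification."""
    return [
        (ref.ident, release.ident)
        for ref, release in ref_release_pairs(cluster)
        if verify(ref, release)
    ]


cluster = [
    Doc("release-1", "Fuzzy Matching at Scale", "release"),
    Doc("ref-1", "fuzzy matching at scale", "ref"),
    Doc("ref-2", "Fuzzy Matching at Scale ", "ref"),
    Doc("ref-3", "An Unrelated Title", "ref"),
]
pairs = list(ref_release_pairs(cluster))
matches = verify_cluster(cluster)
```

With one release and n references in a cluster, this does n comparisons rather than the n(n-1)/2 needed to compare all references against each other.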