author Martin Czygan <martin.czygan@gmail.com> 2021-02-11 13:19:11 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2021-02-11 13:19:11 +0100
commit e75a77fdedae4a4a37c5ddc12c796c70164900dc
tree 2a2020c2c6b34316992c88115af005f3d064a04e
parent 622f56b066316b0f16a9cb087040ee7acaaecaeb
update notes
-rw-r--r-- README.md 7
1 files changed, 7 insertions, 0 deletions
diff --git a/README.md b/README.md
index db4d3ed..7957d4a 100644
--- a/README.md
+++ b/README.md
@@ -219,3 +219,10 @@ $ cat data/sample.json | parallel -j 8 --pipe --roundrobin python -m fuzzycat.ma
Interestingly, the parallel variant detects fewer clusters (because data is
split and clusters are searched within each batch). TODO(miku): sort out sharding bug.
+
+# Notes on Refs
+
+* technique from fuzzycat ported in parts to
+ [skate](https://github.com/miku/skate) - to go from refs and release dataset
+to a number of clusters, relating references to releases
+* need to verify, but not the references against each other, only refs against the release