From e75a77fdedae4a4a37c5ddc12c796c70164900dc Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Thu, 11 Feb 2021 13:19:11 +0100
Subject: update notes

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index db4d3ed..7957d4a 100644
--- a/README.md
+++ b/README.md
@@ -219,3 +219,10 @@ $ cat data/sample.json | parallel -j 8 --pipe --roundrobin python -m fuzzycat.ma
 Interestingly, the parallel variants detects fewer clusters (because data is
 split and clusters are searched within each batch). TODO(miku): sort out
 sharding bug.
+
+# Notes on Refs
+
+* technique from fuzzycat ported in parts to
+  [skate](https://github.com/miku/skate) - to go from refs and release dataset
+to a number of clusters, relating references to releases
+* need to verify, but not the references against each other, only refs againt the release
-- 
cgit v1.2.3