author Martin Czygan <martin.czygan@gmail.com> 2020-10-31 00:50:48 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2020-10-31 00:50:48 +0100
commit fa6b97a412b99350d5bd1c53032dc59de33a5c43
tree 5798b4bd60d38607bd41eeea6391755a90fdee95 /notes
parent 62c1e4bf7ae2e3c959aba4cce0988eff043a7441
note on workflow
Diffstat (limited to 'notes')
-rw-r--r-- notes/workflow.md 6
1 files changed, 3 insertions, 3 deletions
diff --git a/notes/workflow.md b/notes/workflow.md
index abf0d76..04ceb02 100644
--- a/notes/workflow.md
+++ b/notes/workflow.md
@@ -24,7 +24,7 @@ The output could be a TSV file, with method and then release identifiers.
rawt o3utonw5qzhddo7l4lmwptgeey nnpmnwln7be2zb5hd2qanq3r7q
```
-Or jsonlines for a bit of structure.
+Or jsonlines for a bit of structure (e.g. method, ids)
```
{"m": "rawt", "c": ["o3utonw5qzhddo7l4lmwptgeey", "nnpmnwln7be2zb5hd2qanq3r7q"]}
@@ -43,8 +43,8 @@ $ zstdcat -T0 release_export_expanded.json.zst | fuzzycat-cluster -g > clusters.
There will be various methods by which to examine the cluster as well.
-We need to fetch releases by identifier, this can be the full record or some
-partial record that has been cached somewhere.
+We need to fetch releases by identifier (API, but use "hide"), this can be the
+full record or some partial record that has been cached somewhere.
The input is then a list of releases and the output would be a equally sized or
smaller cluster of releases which we assume represent the same record.
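
Illustration only, not part of the patch: the notes describe two output shapes for a cluster, a TSV line (method, then release identifiers) and a jsonlines record with keys `m` and `c`. A minimal sketch of mapping one to the other might look like the following. The function name `tsv_to_jsonline` is hypothetical and is not taken from fuzzycat; splitting on any whitespace is an assumption about how the fields are separated.

```python
import json


def tsv_to_jsonline(line: str) -> str:
    """Turn 'method id1 id2 ...' into a JSON line with keys 'm' and 'c'.

    Hypothetical helper for illustration; assumes whitespace-separated fields.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty line, expected a method followed by identifiers")
    method, ids = parts[0], parts[1:]
    # "m" holds the method name, "c" the list of release identifiers.
    return json.dumps({"m": method, "c": ids})


if __name__ == "__main__":
    print(tsv_to_jsonline("rawt\to3utonw5qzhddo7l4lmwptgeey\tnnpmnwln7be2zb5hd2qanq3r7q"))
```

The jsonlines form keeps the identifier list explicit, which avoids ambiguity if a later field is ever added next to the method.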