author | Martin Czygan <martin.czygan@gmail.com> | 2020-12-18 03:12:05 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2020-12-18 03:12:05 +0100 |
commit | 5bd9eba35a9697e0cf2ac4b53d99a0112d038803 (patch) | |
tree | cc6bb7ae4f45709d04ed1db8c3d85322a9ef9f4f | |
parent | e4b37ea5bf0e3b2294f6f996c42e844524e2c0f2 (diff) | |
link to sort
-rw-r--r-- | README.md | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -34,7 +34,7 @@ $ python -m fuzzycat cluster -t tsandcrawler < data/re.json > cluster.json.zst
 Clustering works in a three step process:
 
 1. key extraction for each document (choose algorithm)
-2. sorting by keys (via GNU sort)
+2. sorting by keys (via [GNU sort](https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html))
 3. group by key and write out ([itertools.groupby](https://docs.python.org/3/library/itertools.html#itertools.groupby))
 
 ### Verification
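
The three-step process described in the README hunk can be sketched in a few lines of Python. This is a minimal illustration and not fuzzycat's actual implementation: the `title_key` function, the `cluster` helper, the sample documents and the `{"k": ..., "v": ...}` output shape are all assumptions made for the example, and an in-memory sort stands in for GNU sort.

```python
import itertools
import re


def title_key(doc):
    """Hypothetical key function: lowercased title, alphanumerics only."""
    return re.sub(r"[^a-z0-9]", "", doc.get("title", "").lower())


def cluster(docs, key=title_key):
    """Yield one cluster per distinct key (sketch of the three steps)."""
    # Step 1: key extraction for each document.
    keyed = [(key(doc), doc) for doc in docs]
    # Step 2: sorting by keys (in-memory stand-in for GNU sort).
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: group by key and emit each group.
    for k, group in itertools.groupby(keyed, key=lambda pair: pair[0]):
        yield {"k": k, "v": [doc for _, doc in group]}


# Made-up sample documents, for illustration only.
docs = [
    {"id": 1, "title": "Fuzzy Matching"},
    {"id": 2, "title": "Other Paper"},
    {"id": 3, "title": "fuzzy  matching!"},
]
clusters = list(cluster(docs))
for c in clusters:
    print(c["k"], [d["id"] for d in c["v"]])
```

Note that `itertools.groupby` only merges adjacent items with equal keys, which is why the sort in step 2 has to come first. An external sort makes the same approach workable for inputs that do not fit in memory.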