From 6628731b1531435ceb4151ed87cf483ee3134119 Mon Sep 17 00:00:00 2001
From: Martin Czygan <martin.czygan@gmail.com>
Date: Fri, 30 Apr 2021 18:34:00 +0200
Subject: wip: update README

---
 skate/README.md | 84 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 44 insertions(+), 40 deletions(-)

diff --git a/skate/README.md b/skate/README.md
index 11f294b..8c05c67 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,35 +1,48 @@
 # skate
 
-This suite of command line tools have been written for various parts of the
-citation graph pipeline.
+This is a small library and suite of command line tools related to generating a
+citation graph.
+
+## Why?
 
 Python was a bit too slow, even when parallelized, e.g. for generating clusters
 of similar documents or to do verification. An option for the future would be
 to resort to [Cython](https://cython.org/). Parts of
-[fuzzycat](https://git.archive.org/webgroup/fuzzycat) has been ported to Go for
-performance.
+[fuzzycat](https://git.archive.org/webgroup/fuzzycat) have been ported into this
+project for performance.
 
 ![](static/zipkey.png)
 
-## Tools
+## Core Utils
+
+* `skate-derive-key`, `skate-map`
+* `skate-cluster`
+* `skate-verify-*`
 
-### skate-wikipedia-doi
 
-TSV (page title, DOI, doc) from wikipedia refs.
+The `skate-derive-key` tool derives a key from release entity JSON documents.
 
 ```
-$ parquet-tools cat --json minimal_dataset.parquet | skate-wikipedia-doi
-Rational point  10.1515/crll.1988.386.32        {"type_of_citation" ...
-Cubic surface   10.2140/ant.2007.1.393          {"type_of_citation" ...
+$ skate-derive-key < release_entities.jsonlines > docs.tsv
+```
+
+Result will be a three column TSV (ident, key, doc).
+
 ```
+---- ident --------------- ---- key --------- ---- doc ----------
 
-### skate-bref-id
+4lzgf5wzljcptlebhyobccj7ru 2568diamagneticsus {"abstracts":[],...
+```
 
-Temporary helper to add a key to a biblioref document.
+After this step:
 
-### skate-cluster
+* sort by key, e.g. `LC_ALL=C sort -k2,2 -S 35% --parallel 6 --compress-program pzstd ...`
+* cluster, e.g. `skate-cluster ...`
 
-Converts a sorted key output into a jsonlines clusters.
+----
+
+The `skate-cluster` tool converts sorted key output into jsonlines
+clusters.
 
 For example, this:
 
@@ -42,46 +55,37 @@ would turn into (a single line containing all docs with the same key).
 
 A single line cluster is easier to parallelize (e.g. for verification, etc.).
 
-### skate-derive-key
+----
 
-skate-derive-key derives a key from release entity JSON documents.
+The `skate-verify-*` tools run various matching and verification algorithms.
 
-```
-$ skate-derive-key < release_entities.jsonlines > docs.tsv
-```
+## Extra
 
-Result will be a three column TSV (ident, key, doc).
+* skate-wikipedia-doi
 
-```
----- ident --------------- ---- key --------- ---- doc ----------
+> TSV (page title, DOI, doc) from wikipedia refs.
 
-4lzgf5wzljcptlebhyobccj7ru 2568diamagneticsus {"abstracts":[],...
+```
+$ parquet-tools cat --json minimal_dataset.parquet | skate-wikipedia-doi
+Rational point  10.1515/crll.1988.386.32        {"type_of_citation" ...
+Cubic surface   10.2140/ant.2007.1.393          {"type_of_citation" ...
 ```
 
-After this step:
-
-* sort by key, e.g. `LC_ALL=C sort -k2,2 -S 35% --parallel 6 --compress-program pzstd ...`
-* cluster, e.g. `skate-cluster ...`
-
-### skate-from-unstructured
-
-Takes a refs file and plucks out identifiers from unstructured field.
-
-### skate-ref-to-release
+* skate-bref-id
 
-Converts a ref document to a release. Part of first run, merging refs and releases.
+> Temporary helper to add a key to a biblioref document.
 
-### skate-to-doi
+* skate-from-unstructured
 
-Sanitize DOI in tabular file.
+> Takes a refs file and plucks out identifiers from the unstructured field.
 
-### skate-verify
+* skate-ref-to-release
 
-Run various matching and verification algorithms.
+> Converts a ref document to a release. Part of first run, merging refs and releases.
 
-### skate-map
+* skate-to-doi
 
-A more generic version of derive key.
+> Sanitize DOI in tabular file.
 
 ## Misc
 
-- 
cgit v1.2.3