author Martin Czygan <martin.czygan@gmail.com> 2021-05-04 22:48:47 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-05-04 22:48:47 +0200
commit 3a43e67238f5acc96a36265f78b70425d078d579
tree 7300bf8ba842d3064d6f315663fab7c2db388286 /skate/cmd/skate-map
parent 223d1d5ba445c38c287da43c0599d2b2b03ecd87
update docs
Diffstat (limited to 'skate/cmd/skate-map')
-rw-r--r-- skate/cmd/skate-map/main.go 16
1 files changed, 7 insertions, 9 deletions
diff --git a/skate/cmd/skate-map/main.go b/skate/cmd/skate-map/main.go
index 67fc62b..d5f22fd 100644
--- a/skate/cmd/skate-map/main.go
+++ b/skate/cmd/skate-map/main.go
@@ -1,13 +1,10 @@
-// skate-map runs a given map function over input data. We mostly want to
+// skate-map runs a given "map" function over input data. Here, we mostly want to
// extract a key from a json document. For simple cases, you can use `jq` and
-// other tools. Some key derivations require a bit more.
+// other tools. Some key derivations require a bit more, hence a dedicated program.
//
-// This tool helps us to find similar things in billions of items by mapping
-// docs to key. All docs that share a key are considered match candidates and can be
-// post-processed, e.g. to verify matches or to generate output schemas.
-//
-// An example with mostly unix tools. We want to extract the DOI and sort by
-// it; we also want to do this fast, hence parallel, LC_ALL, etc.
+// An example with mostly unix tools. We want to extract the DOI from newline
+// delimited JSON and sort by it; we also want to do this fast, hence parallel,
+// LC_ALL, etc.
//
// $ zstdcat -T0 file.zst | (1)
// LC_ALL=C tr -d '\t' | (2) *
@@ -32,7 +29,8 @@
// (9) sorting by DOI
//
// This is reasonably fast, but some cleanup is ugly. We also want more complex
-// keys, e.g. more normalizations, etc. We'd like to encapsulate (2) to (8).
+// keys, e.g. more normalizations, etc; in short: we'd like to encapsulate (2)
+// to (8) with `skate-map`.
package main
import (