-rw-r--r-- skate/README.md 2
-rw-r--r-- skate/cmd/skate-map/main.go 16
2 files changed, 8 insertions, 10 deletions
diff --git a/skate/README.md b/skate/README.md
index 8e2d7d1..68a3f64 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -18,7 +18,7 @@ project for performance (and we saw a 25x speedup for certain tasks).
## Core Utils
-* `skate-derive-key`, `skate-map`
+* `skate-derive-key`, will be: `skate-map`
* `skate-cluster`
* `skate-verify-*`
diff --git a/skate/cmd/skate-map/main.go b/skate/cmd/skate-map/main.go
index 67fc62b..d5f22fd 100644
--- a/skate/cmd/skate-map/main.go
+++ b/skate/cmd/skate-map/main.go
@@ -1,13 +1,10 @@
-// skate-map runs a given map function over input data. We mostly want to
+// skate-map runs a given "map" function over input data. Here, we mostly want to
// extract a key from a json document. For simple cases, you can use `jq` and
-// other tools. Some key derivations require a bit more.
+// other tools. Some key derivations require a bit more, hence a dedicated program.
//
-// This tool helps us to find similar things in billions of items by mapping
-// docs to key. All docs that share a key are considered match candidates and can be
-// post-processed, e.g. to verify matches or to generate output schemas.
-//
-// An example with mostly unix tools. We want to extract the DOI and sort by
-// it; we also want to do this fast, hence parallel, LC_ALL, etc.
+// An example with mostly unix tools. We want to extract the DOI from newline
+// delimited JSON and sort by it; we also want to do this fast, hence parallel,
+// LC_ALL, etc.
//
// $ zstdcat -T0 file.zst | (1)
// LC_ALL=C tr -d '\t' | (2) *
@@ -32,7 +29,8 @@
// (9) sorting by DOI
//
// This is reasonably fast, but some cleanup is ugly. We also want more complex
-// keys, e.g. more normalizations, etc. We'd like to encapsulate (2) to (8).
+// keys, e.g. more normalizations, etc; in short: we'd like to encapsulate (2)
+// to (8) with `skate-map`.
package main
import (