-rw-r--r-- | skate/README.md | 2
-rw-r--r-- | skate/cmd/skate-map/main.go | 16
2 files changed, 8 insertions, 10 deletions
diff --git a/skate/README.md b/skate/README.md
index 8e2d7d1..68a3f64 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -18,7 +18,7 @@ project for performance (and we saw a 25x speedup for certain tasks).
 
 ## Core Utils
 
-* `skate-derive-key`, `skate-map`
+* `skate-derive-key`, will be: `skate-map`
 * `skate-cluster`
 * `skate-verify-*`
 
diff --git a/skate/cmd/skate-map/main.go b/skate/cmd/skate-map/main.go
index 67fc62b..d5f22fd 100644
--- a/skate/cmd/skate-map/main.go
+++ b/skate/cmd/skate-map/main.go
@@ -1,13 +1,10 @@
-// skate-map runs a given map function over input data. We mostly want to
+// skate-map runs a given "map" function over input data. Here, we mostly want to
 // extract a key from a json document. For simple cases, you can use `jq` and
-// other tools. Some key derivations require a bit more.
+// other tools. Some key derivations require a bit more, hence a dedicated program.
 //
-// This tool helps us to find similar things in billions of items by mapping
-// docs to key. All docs that share a key are considered match candidates and can be
-// post-processed, e.g. to verify matches or to generate output schemas.
-//
-// An example with mostly unix tools. We want to extract the DOI and sort by
-// it; we also want to do this fast, hence parallel, LC_ALL, etc.
+// An example with mostly unix tools. We want to extract the DOI from newline
+// delimited JSON and sort by it; we also want to do this fast, hence parallel,
+// LC_ALL, etc.
 //
 // $ zstdcat -T0 file.zst | (1)
 // LC_ALL=C tr -d '\t' | (2) *
@@ -32,7 +29,8 @@
 // (9) sorting by DOI
 //
 // This is reasonably fast, but some cleanup is ugly. We also want more complex
-// keys, e.g. more normalizations, etc. We'd like to encapsulate (2) to (8).
+// keys, e.g. more normalizations, etc; in short: we'd like to encapsulate (2)
+// to (8) with `skate-map`.
 package main
 
 import (