-rw-r--r-- | skate/README.md | 26 |
1 files changed, 24 insertions, 2 deletions
diff --git a/skate/README.md b/skate/README.md
index 68a3f64..5501196 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,7 +1,7 @@
 # skate
 
-A small library and suite of command line tools related to generating a
-[citation graph](https://en.wikipedia.org/wiki/Citation_graph).
+A library and suite of command line tools related to generating a [citation
+graph](https://en.wikipedia.org/wiki/Citation_graph).
 
 > There is no standard format for the citations in bibliographies, and the
 > record linkage of citations can be a time-consuming and complicated process.
@@ -16,6 +16,28 @@ project for performance (and we saw a 25x speedup for certain tasks).
 
 ![](static/zipkey.png)
 
+## Overview
+
+First, generate a "sorted key file" - for our purposes a TSV containing a key
+and the original document. Various mappers are implemented and it is relatively
+easy to add another one.
+
+```
+$ skate-map -m ts < file.jsonl | sort -k1,1 > map.tsv
+```
+
+Repeat the mapping for any file you want to compare against the catalog. Then,
+decide which *reduce* mode is desired.
+
+```
+$ skate-reduce -r bref -f file.1 -g file.2
+```
+
+Depending on what the reducer does, it can generate a verification status or
+some export schema.
+
+WIP: ...
+
 ## Core Utils
 
 * `skate-derive-key`, will be: `skate-map`
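
The Overview added in this diff describes two steps: map each document to a key and sort by it, then reduce two key-sorted files by walking them in lockstep. The Python sketch below illustrates that idea only; it is not skate's implementation. The names `map_docs` and `zip_by_key`, the `key_field` parameter, and the key normalization (lowercasing and collapsing whitespace) are assumptions made for illustration, since the diff does not show how the `ts` mapper derives its key. Note also that Python sorts by code point, while `sort -k1,1` follows the current locale.

```python
import itertools
import json


def map_docs(lines, key_field="title"):
    """Map JSON lines to (key, raw_line) rows sorted by key.

    Loosely mirrors `skate-map ... | sort -k1,1`; the key derivation
    here is hypothetical, not the one used by any real skate mapper.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        # Assumed normalization: lowercase and collapse runs of whitespace.
        key = " ".join(str(doc.get(key_field, "")).lower().split())
        rows.append((key, line))
    rows.sort(key=lambda row: row[0])
    return rows


def group(rows):
    """Collapse key-sorted rows into a list of (key, [raw_line, ...])."""
    grouped = itertools.groupby(rows, key=lambda row: row[0])
    return [(key, [doc for _, doc in items]) for key, items in grouped]


def zip_by_key(rows_a, rows_b):
    """Yield (key, docs_a, docs_b) for every key present in both inputs.

    Both inputs must already be sorted by key, so one merge-style pass
    over the grouped rows is enough.
    """
    groups_a, groups_b = group(rows_a), group(rows_b)
    i = j = 0
    while i < len(groups_a) and j < len(groups_b):
        key_a, docs_a = groups_a[i]
        key_b, docs_b = groups_b[j]
        if key_a == key_b:
            yield key_a, docs_a, docs_b
            i += 1
            j += 1
        elif key_a < key_b:
            i += 1
        else:
            j += 1


catalog = ['{"title": "Deep  Learning", "id": 1}', '{"title": "Attention", "id": 2}']
refs = ['{"title": "deep learning", "id": "r1"}', '{"title": "Unknown", "id": "r2"}']
matches = list(zip_by_key(map_docs(catalog), map_docs(refs)))
```

Here `matches` holds one entry per shared key, each carrying the original documents from both sides. A reducer in the sense of the Overview would take such a group and emit a verification status or an export record.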