From ef10fe07bb4dc2d6a1b4f5dd3d1d8760f59e7fbd Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Thu, 6 May 2021 00:50:47 +0200
Subject: wip: README

---
 skate/README.md | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/skate/README.md b/skate/README.md
index 68a3f64..5501196 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,7 +1,7 @@
 # skate
 
-A small library and suite of command line tools related to generating a
-[citation graph](https://en.wikipedia.org/wiki/Citation_graph).
+A library and suite of command line tools related to generating a [citation
+graph](https://en.wikipedia.org/wiki/Citation_graph).
 
 > There is no standard format for the citations in bibliographies, and the
 > record linkage of citations can be a time-consuming and complicated process.
@@ -16,6 +16,28 @@ project for performance (and we saw a 25x speedup for certain tasks).
 
 ![](static/zipkey.png)
 
+## Overview
+
+First, generate a "sorted key file" - for our purposes a TSV containing a key
+and the original document. Various mappers are implemented and it is relatively
+easy to add another one.
+
+```
+$ skate-map -m ts < file.jsonl | sort -k1,1 > map.tsv
+```
+
+Repeat the mapping for any file you want to compare against the catalog. Then,
+decide which *reduce* mode is desired.
+
+```
+$ skate-reduce -r bref -f file.1 -g file.2
+```
+
+Depending on what the reducer does, it can generate a verification status or
+some export schema.
+
+WIP: ...
+
 ## Core Utils
 
 * `skate-derive-key`, will be: `skate-map`
-- 
cgit v1.2.3
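
Note on the Overview hunk: the map-then-reduce flow it describes (two key-sorted TSV files compared group by group) can be illustrated with a small merge-join. The following is only a sketch under stated assumptions, not skate's actual implementation: `keyed` and `zip_by_key` are hypothetical names, each line is assumed to be `key<TAB>document`, and both inputs are assumed to be sorted by key in plain bytewise order (for example with `LC_ALL=C sort -k1,1`).

```python
import io
import itertools


def keyed(lines):
    """Yield (key, doc) pairs from TSV lines shaped like key<TAB>document."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, doc = line.split("\t", 1)
        yield key, doc


def zip_by_key(a_lines, b_lines):
    """Merge-join two key-sorted streams (hypothetical helper).

    Yields (key, docs_a, docs_b) for every key present in both inputs.
    Assumes both streams are sorted by key in the same order.
    """
    groups_a = itertools.groupby(keyed(a_lines), key=lambda kv: kv[0])
    groups_b = itertools.groupby(keyed(b_lines), key=lambda kv: kv[0])
    ka, ga = next(groups_a, (None, None))
    kb, gb = next(groups_b, (None, None))
    while ka is not None and kb is not None:
        if ka < kb:
            # Key only in A so far: skip this group.
            ka, ga = next(groups_a, (None, None))
        elif ka > kb:
            # Key only in B so far: skip this group.
            kb, gb = next(groups_b, (None, None))
        else:
            # Same key on both sides: hand both groups to the "reducer".
            yield ka, [doc for _, doc in ga], [doc for _, doc in gb]
            ka, ga = next(groups_a, (None, None))
            kb, gb = next(groups_b, (None, None))


# Tiny in-memory stand-ins for two sorted key files (made-up sample data).
a = io.StringIO('10.1/a\t{"id": "r1"}\n10.1/b\t{"id": "r2"}\n')
b = io.StringIO(
    '10.1/b\t{"id": "c1"}\n10.1/b\t{"id": "c2"}\n10.1/c\t{"id": "c3"}\n'
)
matches = list(zip_by_key(a, b))
for key, docs_a, docs_b in matches:
    print(key, len(docs_a), len(docs_b))
```

Because both inputs are already sorted, each file is read once and only the documents for the current key are held in memory. What a real reducer does with each matched group (verification status, export schema) is outside this sketch.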