author | Martin Czygan <martin.czygan@gmail.com> | 2021-05-06 00:50:47 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-05-06 00:50:50 +0200 |
commit | ef10fe07bb4dc2d6a1b4f5dd3d1d8760f59e7fbd (patch) | |
tree | e74b639555a4a700c2121af96209cb7c82119523 /skate | |
parent | 9aa0271ffc471ffd733d0f133d90821759240d53 (diff) | |
wip: README
Diffstat (limited to 'skate')
-rw-r--r-- | skate/README.md | 26 |
1 files changed, 24 insertions, 2 deletions
diff --git a/skate/README.md b/skate/README.md
index 68a3f64..5501196 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,7 +1,7 @@
 # skate
 
-A small library and suite of command line tools related to generating a
-[citation graph](https://en.wikipedia.org/wiki/Citation_graph).
+A library and suite of command line tools related to generating a [citation
+graph](https://en.wikipedia.org/wiki/Citation_graph).
 
 > There is no standard format for the citations in bibliographies, and the
 > record linkage of citations can be a time-consuming and complicated process.
@@ -16,6 +16,28 @@ project for performance (and we saw a 25x speedup for certain tasks).
 
 ![](static/zipkey.png)
 
+## Overview
+
+First, generate a "sorted key file" - for our purposes a TSV containing a key
+and the original document. Various mappers are implemented and it is relatively
+easy to add another one.
+
+```
+$ skate-map -m ts < file.jsonl | sort -k1,1 > map.tsv
+```
+
+Repeat the mapping for any file you want to compare against the catalog. Then,
+decide which *reduce* mode is desired.
+
+```
+$ skate-reduce -r bref -f file.1 -g file.2
+```
+
+Depending on what the reducer does, it can generate a verification status or
+some export schema.
+
+WIP: ...
+
 ## Core Utils
 
 * `skate-derive-key`, will be: `skate-map`
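
The Overview added in this commit describes a two-step flow: map each document to a "key, original document" TSV line and sort by key, then reduce over two such sorted files. The sketch below illustrates that idea in Python under stated assumptions; it is not skate's implementation. The names `title_key`, `to_key_tsv`, and `zip_by_key` are hypothetical, and the normalized-title key is only an example, not a claim about what the `ts` mapper does.

```python
import json
from itertools import groupby


def title_key(doc):
    """Hypothetical mapper: lowercase title, alphanumeric characters only."""
    return "".join(c for c in doc.get("title", "").lower() if c.isalnum())


def to_key_tsv(docs, key_func):
    """Map step: build "key<TAB>original JSON" lines and sort them."""
    lines = [f"{key_func(doc)}\t{json.dumps(doc, sort_keys=True)}" for doc in docs]
    return sorted(lines)


def grouped(lines):
    """Group already-sorted TSV lines by their first column (the key)."""
    return groupby(lines, key=lambda line: line.split("\t", 1)[0])


def zip_by_key(left, right):
    """Reduce step: walk two sorted key files in lockstep and yield
    (key, left_docs, right_docs) for every key present in both."""
    lit, rit = grouped(left), grouped(right)
    lcur, rcur = next(lit, None), next(rit, None)
    while lcur is not None and rcur is not None:
        (lkey, lgroup), (rkey, rgroup) = lcur, rcur
        if lkey < rkey:
            lcur = next(lit, None)  # key only on the left, skip it
        elif lkey > rkey:
            rcur = next(rit, None)  # key only on the right, skip it
        else:
            # Consume both groups before advancing the groupby iterators.
            ldocs = [json.loads(line.split("\t", 1)[1]) for line in lgroup]
            rdocs = [json.loads(line.split("\t", 1)[1]) for line in rgroup]
            yield lkey, ldocs, rdocs
            lcur, rcur = next(lit, None), next(rit, None)


catalog = [{"id": "c1", "title": "Deep Learning"}, {"id": "c2", "title": "Graph Theory"}]
refs = [{"id": "r1", "title": "deep  learning!"}, {"id": "r2", "title": "Unmatched"}]
matches = list(zip_by_key(to_key_tsv(catalog, title_key), to_key_tsv(refs, title_key)))
```

Because both inputs are sorted by key, the reduce step reads each file once and only holds one group per side in memory. A real reducer would replace the plain `yield` with its own logic, for example computing a verification status for each matched group.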