Diffstat (limited to 'skate/README.md')
-rw-r--r-- skate/README.md 26
1 files changed, 24 insertions, 2 deletions
diff --git a/skate/README.md b/skate/README.md
index 68a3f64..5501196 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,7 +1,7 @@
# skate
-A small library and suite of command line tools related to generating a
-[citation graph](https://en.wikipedia.org/wiki/Citation_graph).
+A library and suite of command line tools related to generating a [citation
+graph](https://en.wikipedia.org/wiki/Citation_graph).
> There is no standard format for the citations in bibliographies, and the
> record linkage of citations can be a time-consuming and complicated process.
@@ -16,6 +16,28 @@ project for performance (and we saw a 25x speedup for certain tasks).
![](static/zipkey.png)
+## Overview
+
+First, generate a "sorted key file": for our purposes, a TSV file in which each
+line contains a key and the original document. Several mappers are implemented,
+and adding another one is relatively easy.
+
+```
+$ skate-map -m ts < file.jsonl | sort -k1,1 > map.tsv
+```
+
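To illustrate the idea of a sorted key file, here is a minimal Python sketch. It is not part of skate: the names `title_key` and `map_lines` are hypothetical, and the key (a normalized title) is an assumption, so the actual `ts` mapper may derive its key differently.

```python
import json
import re


def title_key(doc):
    """Hypothetical key: lowercased title with non-alphanumerics removed."""
    return re.sub(r"[^a-z0-9]", "", doc.get("title", "").lower())


def map_lines(lines, key_func=title_key):
    """Return "key<TAB>document" rows, one per JSON line, sorted by key."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = key_func(json.loads(line))
        if key:  # skip documents that yield no usable key
            rows.append(f"{key}\t{line}")
    # Order rows by the key column only.
    return sorted(rows, key=lambda row: row.split("\t", 1)[0])
```

Each returned row keeps the original JSON document next to its key, so a later step can compare documents that share a key without looking them up again.
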
+Repeat the mapping for any file you want to compare against the catalog. Then
+choose the *reduce* mode you need.
+
+```
+$ skate-reduce -r bref -f file.1 -g file.2
+```
+
+Depending on the reduce mode, the output is either a verification status or
+records in some export schema.
+
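The reduce step relies on both inputs being sorted by key, so documents sharing a key can be grouped in a single pass. The following Python sketch shows that grouping under stated assumptions: `key_of` and `zip_keys` are hypothetical names, rows are assumed to be well-formed `key<TAB>document` lines, and this is not skate's actual implementation.

```python
from itertools import groupby


def key_of(row):
    """Return the key column of a "key<TAB>document" row."""
    return row.split("\t", 1)[0]


def zip_keys(rows_a, rows_b):
    """Yield (key, docs_a, docs_b) for keys present in both sorted inputs."""
    groups_a = groupby(rows_a, key=key_of)
    groups_b = groupby(rows_b, key=key_of)
    ka, ga = next(groups_a, (None, None))
    kb, gb = next(groups_b, (None, None))
    while ka is not None and kb is not None:
        if ka < kb:
            # Key only in the first input so far: advance the first input.
            ka, ga = next(groups_a, (None, None))
        elif ka > kb:
            # Key only in the second input so far: advance the second input.
            kb, gb = next(groups_b, (None, None))
        else:
            # Same key on both sides: collect both groups before advancing.
            docs_a = [row.split("\t", 1)[1] for row in ga]
            docs_b = [row.split("\t", 1)[1] for row in gb]
            yield ka, docs_a, docs_b
            ka, ga = next(groups_a, (None, None))
            kb, gb = next(groups_b, (None, None))
```

A reducer would then inspect each yielded group and emit, for example, a verification status for every pair of documents that share a key.
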
+WIP: ...
+
## Core Utils
* `skate-derive-key` (will become `skate-map`)