author Martin Czygan <martin.czygan@gmail.com> 2021-05-06 00:50:47 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-05-06 00:50:50 +0200
commit ef10fe07bb4dc2d6a1b4f5dd3d1d8760f59e7fbd
tree e74b639555a4a700c2121af96209cb7c82119523 /skate
parent 9aa0271ffc471ffd733d0f133d90821759240d53
download refcat-ef10fe07bb4dc2d6a1b4f5dd3d1d8760f59e7fbd.tar.gz refcat-ef10fe07bb4dc2d6a1b4f5dd3d1d8760f59e7fbd.zip
wip: README
Diffstat (limited to 'skate')
-rw-r--r-- skate/README.md 26
1 files changed, 24 insertions, 2 deletions
diff --git a/skate/README.md b/skate/README.md
index 68a3f64..5501196 100644
--- a/skate/README.md
+++ b/skate/README.md
@@ -1,7 +1,7 @@
# skate
-A small library and suite of command line tools related to generating a
-[citation graph](https://en.wikipedia.org/wiki/Citation_graph).
+A library and suite of command line tools related to generating a [citation
+graph](https://en.wikipedia.org/wiki/Citation_graph).
> There is no standard format for the citations in bibliographies, and the
> record linkage of citations can be a time-consuming and complicated process.
@@ -16,6 +16,28 @@ project for performance (and we saw a 25x speedup for certain tasks).
![](static/zipkey.png)
+## Overview
+
+First, generate a "sorted key file": for our purposes, a TSV containing a key
+and the original document. Various mappers are implemented, and it is
+relatively easy to add another one.
+
+```
+$ skate-map -m ts < file.jsonl | sort -k1,1 > map.tsv
+```
+
+Repeat the mapping for any file you want to compare against the catalog. Then,
+decide which *reduce* mode is desired.
+
+```
+$ skate-reduce -r bref -f file.1 -g file.2
+```
+
+Depending on what the reducer does, it can generate a verification status or
+output in some export schema.
+
+WIP: ...
+
## Core Utils
* `skate-derive-key`, will be: `skate-map`
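The mapping step in the README (`skate-map -m ts`) produces a "sorted key file": one key and the original document per line, separated by a tab. Below is a minimal Go sketch that shows roughly what a mapper does. It assumes the key comes from a hypothetical `title` field, normalized by lowercasing and keeping only letters and digits. This is an illustration only. The README does not say how the `ts` mapper derives its key.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
	"unicode"
)

// normalizeKey lowercases s and keeps only letters and digits, so titles that
// differ only in case, spacing or punctuation map to the same key.
func normalizeKey(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// mapKey derives the key for one JSON document. Using the "title" field is a
// hypothetical choice for illustration, not taken from skate.
func mapKey(line string) (string, error) {
	var doc struct {
		Title string `json:"title"`
	}
	if err := json.Unmarshal([]byte(line), &doc); err != nil {
		return "", err
	}
	return normalizeKey(doc.Title), nil
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<26) // allow long JSON lines
	w := bufio.NewWriter(os.Stdout)
	for sc.Scan() {
		line := sc.Text()
		key, err := mapKey(line)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping invalid JSON:", err)
			continue
		}
		if key == "" {
			continue // no usable key, so the document cannot be matched
		}
		// The key never contains a tab, so readers can split at the first tab.
		fmt.Fprintf(w, "%s\t%s\n", key, line)
	}
	w.Flush()
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

To sort the output, you can make `sort` split explicitly on the tab and compare bytes instead of using locale collation. For example (the file name `mapper.go` is hypothetical): `go run mapper.go < file.jsonl | LC_ALL=C sort -t "$(printf '\t')" -k1,1 > map.tsv`.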