diff --git a/skate/testdata/README.md b/skate/testdata/README.md
new file mode 100644
index 0000000..04a277e
--- /dev/null
+++ b/skate/testdata/README.md
@@ -0,0 +1,63 @@
+# Fixtures
+
+Put all documents used as test inputs and outputs here. The wiring can happen in
+code or in a separate file (for general editing).
+
+## verify.csv
+
+[This file](https://github.com/miku/fuzzycat/blob/master/tests/data/verify.csv)
+currently contains four columns:
+
+* ident (first of the pair)
+* ident (second of the pair)
+* match status
+* reason (optional)
+
+If you add lines to this file, the test suite will pick them up automatically.
+
+```csv
+7kzrmoajzzedxgdvbltgqihszu,bd4crw4p7ber7pzhpoyw2c77bi,Status.STRONG,OK.SLUG_TITLE_AUTHOR_MATCH
+```
+
+## Helpers
+
+Going from a query to the pairwise combinations of idents (with
+[esdump](https://github.com/miku/esdump), [jq](https://stedolan.github.io/jq/), and
+[makecomb.py](https://gist.github.com/miku/c1220715060babc2374a440bd742a410)):
+
+```
+$ esdump -q '"Calcifying+extracellular+mucus+substances"' | \
+ jq -rC '.hits.hits[]._id' | makecomb.py | awk '{print $1","$2}'
+
+5lk635o65nc2tnkus3pkf2ggeq,hqrvhbvocvaabg6nr5p43tl3uq
+5lk635o65nc2tnkus3pkf2ggeq,zfwf3tefajc6zdxa47vgilm7wm
+hqrvhbvocvaabg6nr5p43tl3uq,zfwf3tefajc6zdxa47vgilm7wm
+```
+
+`makecomb.py` turns the unique input lines into all pairs. To install it:
+
+```
+$ curl -sL https://git.io/JkDwC > ~/bin/makecomb.py && chmod +x ~/bin/makecomb.py
+```
+
+The script is short:
+
+```python
+#!/usr/bin/env python
+import fileinput
+import itertools
+
+vs = set()
+for line in fileinput.input():
+    line = line.strip()
+    if not line:
+        continue
+    vs.add(line)
+
+for a, b in itertools.combinations(sorted(vs), r=2):
+    print("{}\t{}".format(a, b))
+```
+
+## TODO
+
+* [ ] generate md with clickable links, grouped by match status
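
A possible starting point for the TODO item above, as a minimal sketch rather than the project's implementation. It reads rows in the `verify.csv` layout described in the README (ident, ident, match status, optional reason) and renders one markdown section per match status. The function name `render_markdown` and the constant `BASE_URL` are hypothetical, and the link target assumes the idents are fatcat release idents reachable under `https://fatcat.wiki/release/`.

```python
import csv
import io
from collections import defaultdict

# Assumption: idents are fatcat release idents viewable under this prefix.
BASE_URL = "https://fatcat.wiki/release/"


def render_markdown(lines):
    """Group ident pairs by match status and render them as markdown.

    `lines` is any iterable of CSV lines, e.g. an open file.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        a, b, status = row[0], row[1], row[2]
        reason = row[3] if len(row) > 3 else ""  # the reason column is optional
        groups[status].append((a, b, reason))

    out = []
    for status in sorted(groups):
        out.append("## {}".format(status))
        out.append("")
        for a, b, reason in groups[status]:
            item = "* [{a}]({u}{a}) / [{b}]({u}{b})".format(a=a, b=b, u=BASE_URL)
            if reason:
                item += " ({})".format(reason)
            out.append(item)
        out.append("")
    return "\n".join(out)


# Usage with the sample row from the README.
sample = (
    "7kzrmoajzzedxgdvbltgqihszu,bd4crw4p7ber7pzhpoyw2c77bi,"
    "Status.STRONG,OK.SLUG_TITLE_AUTHOR_MATCH\n"
)
print(render_markdown(io.StringIO(sample)))
```

Passing an open file handle for `verify.csv` instead of the `io.StringIO` sample should work the same way, since `csv.reader` accepts any iterable of lines.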