Diffstat (limited to 'skate/testdata/README.md')
-rw-r--r-- | skate/testdata/README.md | 63 |
1 files changed, 63 insertions, 0 deletions
diff --git a/skate/testdata/README.md b/skate/testdata/README.md
new file mode 100644
index 0000000..04a277e
--- /dev/null
+++ b/skate/testdata/README.md
@@ -0,0 +1,63 @@
+# Fixtures
+
+Put all documents used as inputs and output here. The wiring can happen in
+code or in separate file (for general editing).
+
+## verify.csv
+
+[This file](https://github.com/miku/fuzzycat/blob/master/tests/data/verify.csv)
+currently contains four columns:
+
+* ident
+* ident
+* match status
+* reason (optional)
+
+If you add lines to this file, the test suite will pick it up automatically.
+
+```csv
+7kzrmoajzzedxgdvbltgqihszu,bd4crw4p7ber7pzhpoyw2c77bi,Status.STRONG,OK.SLUG_TITLE_AUTHOR_MATCH
+```
+
+## Helpers
+
+Going from a query to the combination of idents (with
+[esdump](https://github.com/miku/esdump), [jq](https://stedolan.github.io/jq/),
+[makecomb.py](https://gist.github.com/miku/c1220715060babc2374a440bd742a410):
+
+```
+$ esdump -q '"Calcifying+extracellular+mucus+substances"' | \
+    jq -rC '.hits.hits[]._id' | makecomb.py | awk '{print $1","$2}'
+
+5lk635o65nc2tnkus3pkf2ggeq,hqrvhbvocvaabg6nr5p43tl3uq
+5lk635o65nc2tnkus3pkf2ggeq,zfwf3tefajc6zdxa47vgilm7wm
+hqrvhbvocvaabg6nr5p43tl3uq,zfwf3tefajc6zdxa47vgilm7wm
+```
+
+Where `makecomb.py` turns lines into pairs.
+
+```
+$ curl -sL https://git.io/JkDwC > ~/bin/makecomb.py && chmod +x ~/bin/makecomb.py
+```
+
+Short script.
+
+```python
+#!/usr/bin/env python
+import fileinput
+import itertools
+
+vs = set()
+for line in fileinput.input():
+    line = line.strip()
+    if not line:
+        continue
+    vs.add(line)
+
+for a, b in itertools.combinations(sorted(vs), r=2):
+    print("{}\t{}".format(a, b))
+```
+
+## TODO
+
+* [ ] generate md with clickable links, grouped by match status
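
Note on the `verify.csv` layout described in the added README: the four columns (ident, ident, match status, optional reason) could be read along the following lines. This is only a sketch, not the loader used by the test suite; the names `VerifyRow` and `parse_verify` are hypothetical, and it assumes plain comma-separated rows without a header line.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class VerifyRow(NamedTuple):
    """One fixture row; field names are illustrative, not from the repo."""
    a: str
    b: str
    status: str
    reason: Optional[str]


def parse_verify(text: str) -> List[VerifyRow]:
    """Parse verify.csv-style text into rows, treating the reason as optional."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:
            continue  # csv.reader yields an empty list for blank lines
        if len(rec) < 3:
            raise ValueError("expected at least 3 columns, got {}".format(len(rec)))
        # The fourth column may be missing or empty.
        reason = rec[3] if len(rec) > 3 and rec[3] else None
        rows.append(VerifyRow(rec[0], rec[1], rec[2], reason))
    return rows


sample = (
    "7kzrmoajzzedxgdvbltgqihszu,bd4crw4p7ber7pzhpoyw2c77bi,"
    "Status.STRONG,OK.SLUG_TITLE_AUTHOR_MATCH\n"
)
rows = parse_verify(sample)
```

Keeping the reason as `None` when absent lets a test compare only the match status for rows that do not specify one.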