author | Martin Czygan <martin.czygan@gmail.com> | 2020-08-12 14:20:41 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2020-08-12 14:24:31 +0200 |
commit | 5a307829670888fedd696e6220c84feed1fe6b64 (patch) | |
tree | 87e5046442ad95239c1f60982a191ceb1d8b1c9f /setup.py | |
parent | f96c3d0d025ad37836eb908d561b0c607a1f7b5e (diff) | |
stub tool: fuzzycat-issn to generate test data
Currently, fuzzycat-issn --make-pairs generates a TSV of (issn, a, b) examples, e.g.
...
0011-9717 Detskaâ literatura. Детская литература.
0011-9717 Detskaâ literatura. Detskaâ literatura
0011-9717 Детская литература. Detskaâ literatura
0011-6637 Darbininkas. Darbininkas
0012-0820 deutsche Tabakbau deutsche Tabakbau.
0011-5444 Daily Kent stater. Daily Kent stater
...
The idea is that the two names in each row share an ISSN and therefore,
by definition, denote the same journal. We might even keep a fixed lookup
table, since some variants span multiple scripts (and there are only
around 2M names in total).
Currently 1992176 pairs can be generated.
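
The pairing idea can be sketched as follows. This is a minimal illustration, not the code in this commit: the function name make_pairs and the (issn, name) input shape are assumptions, and the real tool may read and deduplicate its input differently. For each ISSN it emits every unordered pair of distinct names, which matches the three rows shown above for 0011-9717.

```python
import itertools
from collections import defaultdict


def make_pairs(records):
    """Yield (issn, a, b) for every unordered pair of distinct names per ISSN.

    `records` is an iterable of (issn, name) tuples; this input shape is an
    assumption made for the sketch, not the format used by fuzzycat-issn.
    """
    names_by_issn = defaultdict(list)
    for issn, name in records:
        # Keep first-seen order and skip exact duplicates.
        if name not in names_by_issn[issn]:
            names_by_issn[issn].append(name)
    for issn, names in names_by_issn.items():
        for a, b in itertools.combinations(names, 2):
            yield issn, a, b


if __name__ == "__main__":
    sample = [
        ("0011-9717", "Detskaâ literatura."),
        ("0011-9717", "Детская литература."),
        ("0011-9717", "Detskaâ literatura"),
        ("0011-6637", "Darbininkas."),
        ("0011-6637", "Darbininkas"),
    ]
    # Write tab-separated rows, one pair per line.
    for row in make_pairs(sample):
        print("\t".join(row))
```

An ISSN with n distinct names yields n * (n - 1) / 2 rows, so a few ISSNs with many name variants can account for a large share of the generated pairs.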
Diffstat (limited to 'setup.py')
0 files changed, 0 insertions, 0 deletions