path: root/fuzzycat/utils.py
author: Martin Czygan <martin.czygan@gmail.com> 2020-08-12 14:20:41 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2020-08-12 14:24:31 +0200
commit: 5a307829670888fedd696e6220c84feed1fe6b64
tree: 87e5046442ad95239c1f60982a191ceb1d8b1c9f /fuzzycat/utils.py
parent: f96c3d0d025ad37836eb908d561b0c607a1f7b5e
stub tool: fuzzycat-issn to generate test data
currently: fuzzycat-issn --make-pairs will generate a TSV with (issn, a, b) examples, e.g.

    ...
    0011-9717    Detskaâ literatura.    Детская литература.
    0011-9717    Detskaâ literatura.    Detskaâ literatura
    0011-9717    Детская литература.    Detskaâ literatura
    0011-6637    Darbininkas.           Darbininkas
    0012-0820    deutsche Tabakbau      deutsche Tabakbau.
    0011-5444    Daily Kent stater.     Daily Kent stater
    ...

The idea is that the names sharing an ISSN denote the same journal by definition, so each pair is a known positive match. We might even use a fixed lookup table, since some variants involve multiple scripts (and there are only around 2M names in total).

Currently 1992176 pairs can be generated.
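The pair generation described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names `make_pairs` and `to_tsv` and the input shape (a dict mapping each ISSN to its list of name variants) are assumptions made for the example. It emits every unordered pair of distinct names per ISSN, which reproduces the three rows shown for 0011-9717.

```python
import itertools


def make_pairs(names_by_issn):
    """Yield (issn, a, b) for every unordered pair of distinct names of an ISSN.

    `names_by_issn` is a hypothetical input shape: {issn: [name, name, ...]}.
    """
    for issn, names in names_by_issn.items():
        # Drop exact duplicates while keeping the original order.
        unique = list(dict.fromkeys(names))
        for a, b in itertools.combinations(unique, 2):
            yield issn, a, b


def to_tsv(rows):
    """Render rows as tab-separated lines, one (issn, a, b) triple per line."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    sample = {
        "0011-9717": ["Detskaâ literatura.", "Детская литература.", "Detskaâ literatura"],
        "0011-6637": ["Darbininkas.", "Darbininkas"],
    }
    print(to_tsv(make_pairs(sample)))
```

An ISSN with n distinct names yields n(n-1)/2 pairs, and an ISSN with a single name yields none, so the total pair count depends on how many variants each ISSN has.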
Diffstat (limited to 'fuzzycat/utils.py')
0 files changed, 0 insertions, 0 deletions