| | Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|---|
... | ||||||
* | update notebook | Martin Czygan | 2020-08-12 | 1 | -86/+729 | |
* | update README | Martin Czygan | 2020-08-12 | 1 | -1/+3 | |
* | add journal name notebook | Martin Czygan | 2020-08-12 | 4 | -0/+16016 | |
* | add deps for notebooks | Martin Czygan | 2020-08-12 | 1 | -4/+6 | |
* | update setup.py | Martin Czygan | 2020-08-12 | 1 | -2/+10 | |
* | note on optimization: marisa-trie | Martin Czygan | 2020-08-12 | 1 | -0/+1 | |
| | Currently, the JSON mapping is 172 MB; turning it into a dict takes a while and consumes gigabytes of memory. For exact lookups, we might want to use marisa-trie: "String data in a MARISA-trie may take up to 50x-100x less memory than in a standard Python dict; the raw lookup speed is comparable; trie also provides fast advanced methods like prefix search." | | | | | |
* | update Makefile | Martin Czygan | 2020-08-12 | 1 | -8/+12 | |
* | issn: generate a name to issn mapping | Martin Czygan | 2020-08-12 | 2 | -31/+88 | |
| | This allows us to make suggestions about potentially ambiguous titles. Maybe suggest a minimum length. Ultimately, there are only about 2M journal titles. If an arbitrary string must match a journal title (not a generic container title), we can combine a direct lookup with some extra processing based on this dataset. | | | | | |
* | stub tool: fuzzycat-issn to generate test data | Martin Czygan | 2020-08-12 | 1 | -0/+69 | |
| | Currently, `fuzzycat-issn --make-pairs` generates a TSV of (issn, a, b) examples, e.g.: ... (0011-9717, "Detskaâ literatura.", "Детская литература."); (0011-9717, "Detskaâ literatura.", "Detskaâ literatura"); (0011-9717, "Детская литература.", "Detskaâ literatura"); (0011-6637, "Darbininkas.", "Darbininkas"); (0012-0820, "deutsche Tabakbau", "deutsche Tabakbau."); (0011-5444, "Daily Kent stater.", "Daily Kent stater") ... The idea is that these names, by definition, denote the same journal. We might even keep a fixed lookup table, since some variants involve multiple scripts (and there are only around 2M names in total). Currently, 1,992,176 pairs can be generated. | | | | | |
* | adjust formatting | Martin Czygan | 2020-08-12 | 2 | -2/+6 | |
* | fix imports | Martin Czygan | 2020-08-12 | 2 | -2/+2 | |
* | update README | Martin Czygan | 2020-08-12 | 1 | -2/+4 | |
* | yapf: reduce column limit | Martin Czygan | 2020-08-12 | 1 | -1/+1 | |
* | improve docs and imports | Martin Czygan | 2020-08-12 | 1 | -9/+8 | |
* | try: all matching methods should start with match | Martin Czygan | 2020-08-12 | 2 | -2/+2 | |
* | makefile: add container export download | Martin Czygan | 2020-08-12 | 1 | -1/+6 | |
* | add matching submodule | Martin Czygan | 2020-08-12 | 2 | -0/+149 | |
* | add deps: ftfy, unidecode, ipython | Martin Czygan | 2020-08-12 | 1 | -1/+3 | |
* | add notes/todo | Martin Czygan | 2020-08-12 | 1 | -0/+17 | |
* | makefile: fix typo | Martin Czygan | 2020-08-12 | 1 | -1/+1 | |
* | add coverage dependency | Martin Czygan | 2020-08-12 | 2 | -7/+12 | |
* | setup: require fatcat-openapi-client | Martin Czygan | 2020-08-12 | 1 | -1/+3 | |
* | switch to yapf | Martin Czygan | 2020-08-12 | 6 | -10/+28 | |
* | add tests | Martin Czygan | 2020-08-12 | 1 | -0/+115 | |
* | utils: fix imports | Martin Czygan | 2020-08-12 | 1 | -1/+1 | |
* | fix status definition | Martin Czygan | 2020-08-12 | 2 | -1/+3 | |
* | add pytest dev dependency | Martin Czygan | 2020-08-12 | 1 | -1/+1 | |
* | import utility functions | Martin Czygan | 2020-08-12 | 3 | -0/+165 | |
* | apply formatting style | Martin Czygan | 2020-08-12 | 1 | -1/+1 | |
* | add basic str utils | Martin Czygan | 2020-08-12 | 2 | -0/+83 | |
* | add makefile style target | Martin Czygan | 2020-08-12 | 2 | -3/+5 | |
* | cleanup build directory as well | Martin Czygan | 2020-08-12 | 1 | -0/+1 | |
* | v0.1.1 | Martin Czygan | 2020-08-12 | 1 | -0/+1 | |
* | specify version in one place only | Martin Czygan | 2020-08-12 | 2 | -2/+5 | |
| | Use `fuzzycat/__init__.py`. | | | | | |
* | let make deps pipenv install use pre releases | Martin Czygan | 2020-08-12 | 3 | -3/+359 | |
| | The problem appeared because black seems to be a pre-release; cf. https://github.com/microsoft/vscode-python/issues/5171 | | | | | |
* | allow pypi uploads | Martin Czygan | 2020-08-12 | 2 | -3/+19 | |
| | See: https://pypi.org/project/fuzzycat/ | | | | | |
* | basic scaffolding | Martin Czygan | 2020-08-12 | 8 | -1/+116 | |
* | Initial commit | Martin Czygan | 2020-08-12 | 3 | -0/+152 | |
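The "note on optimization: marisa-trie" commit is about exact lookups (and possibly prefix search) on a large, read-only string mapping. marisa-trie is a third-party package, so the sketch below is a stdlib-only stand-in, not marisa-trie and without its compressed storage, that shows the same two operations over sorted parallel lists using `bisect`. The class name `SortedKeyIndex` is hypothetical, and the sample keys are titles from the commit messages, lowercased for illustration.

```python
import bisect
from typing import Dict, Generic, List, Optional, TypeVar

V = TypeVar("V")


class SortedKeyIndex(Generic[V]):
    """Read-only string-keyed index stored as two parallel sorted lists.

    Hypothetical stdlib stand-in for the exact and prefix lookups that
    marisa-trie provides; it does NOT compress keys like a MARISA trie.
    """

    def __init__(self, mapping: Dict[str, V]) -> None:
        # Sort by key only; values may not be comparable.
        items = sorted(mapping.items(), key=lambda kv: kv[0])
        self._keys: List[str] = [k for k, _ in items]
        self._values: List[V] = [v for _, v in items]

    def get(self, key: str, default: Optional[V] = None) -> Optional[V]:
        # Exact lookup via binary search: O(log n) string comparisons.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def keys_with_prefix(self, prefix: str) -> List[str]:
        # Keys sharing a prefix form one contiguous run in sorted order,
        # so find the start with bisect and scan while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        matches: List[str] = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


index = SortedKeyIndex({"darbininkas": ["0011-6637"], "daily kent stater": ["0011-5444"]})
print(index.get("darbininkas"))      # ['0011-6637']
print(index.keys_with_prefix("da"))  # ['daily kent stater', 'darbininkas']
```

Parallel lists avoid a dict's hash table, but each key is still a separate Python `str`, so the savings are modest compared with what marisa-trie advertises. Swapping in marisa-trie itself (for example its `Trie` or `BytesTrie` classes) would keep this lookup interface while storing the keys in compressed form.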
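The "issn: generate a name to issn mapping" commit proposes using such a mapping to flag potentially ambiguous titles, perhaps with a minimum length. A minimal sketch, assuming the input is an iterable of `(issn, name)` pairs; the `normalize` rules (NFKC, lowercasing, stripping punctuation) and `MIN_LENGTH = 5` are hypothetical choices, not taken from fuzzycat.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical threshold: very short titles are too generic to suggest on.
MIN_LENGTH = 5


def normalize(name: str) -> str:
    """Hypothetical normalization: NFKC, lowercase, drop punctuation, squeeze spaces."""
    name = unicodedata.normalize("NFKC", name).lower()
    name = re.sub(r"[^\w\s]", "", name)  # \w is Unicode-aware, so Cyrillic survives
    return " ".join(name.split())


def build_name_to_issn(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized title to the set of ISSNs it was seen with."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for issn, name in records:
        key = normalize(name)
        if key:
            mapping[key].add(issn)
    return dict(mapping)


def lookup(mapping: Dict[str, Set[str]], name: str) -> Tuple[Set[str], bool]:
    """Exact lookup on the normalized title; returns (issns, is_ambiguous)."""
    key = normalize(name)
    if len(key) < MIN_LENGTH:
        return set(), False
    issns = mapping.get(key, set())
    return issns, len(issns) > 1
```

A title whose normalized form maps to more than one ISSN is reported as ambiguous, which is the case where a suggestion is more appropriate than a definite match.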
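The `fuzzycat-issn --make-pairs` commit describes emitting (issn, a, b) rows for names that share an ISSN. A sketch of that pairing step, assuming `(issn, name)` records as input; the function names are hypothetical and this is not the tool's actual code. With the three 0011-9717 variants supplied in the order "Detskaâ literatura.", "Детская литература.", "Detskaâ literatura", `itertools.combinations` reproduces the order of the three 0011-9717 rows quoted in the commit message.

```python
import csv
import io
import itertools
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def group_names(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the distinct names per ISSN, in first-seen order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for issn, name in records:
        if name not in groups[issn]:  # linear check; name lists per ISSN are short
            groups[issn].append(name)
    return dict(groups)


def make_pairs(groups: Dict[str, List[str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (issn, a, b) for every unordered pair of names sharing an ISSN."""
    for issn, names in groups.items():
        for a, b in itertools.combinations(names, 2):
            yield issn, a, b


def to_tsv(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize rows as tab-separated lines (csv quotes fields containing tabs)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

An ISSN with n distinct names yields n(n-1)/2 pairs, so a few titles with many variants can account for a large share of the output.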
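The "specify version in one place only" commit keeps the version in `fuzzycat/__init__.py`. One common way for `setup.py` to pick it up without importing the package (and thus its dependencies) is to read `__version__` with a regular expression. This is a sketch of that pattern, assuming `__init__.py` contains a line like `__version__ = "..."`; the helper name `read_version` is hypothetical, and the commit may use a different mechanism.

```python
import re
from pathlib import Path

# Matches a top-level line such as: __version__ = "0.1.1"
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(init_path: str) -> str:
    """Return the __version__ string declared in a package's __init__.py."""
    text = Path(init_path).read_text(encoding="utf-8")
    match = VERSION_RE.search(text)
    if match is None:
        raise RuntimeError(f"no __version__ found in {init_path}")
    return match.group(1)
```

In `setup.py`, this would be used as `version=read_version("fuzzycat/__init__.py")` inside the `setup(...)` call, so bumping the version only touches `__init__.py`.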