Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | update notes | Martin Czygan | 2020-11-21 | 2 | -15/+16 | |
| | ||||||
* | wip: handle empty lists | Martin Czygan | 2020-11-21 | 1 | -6/+9 | |
| | ||||||
* | wip: datacite, figshare versions | Martin Czygan | 2020-11-21 | 1 | -6/+35 | |
| | ||||||
* | wip: another contrib comparison | Martin Czygan | 2020-11-20 | 1 | -14/+92 | |
| | ||||||
* | cleanup list | Martin Czygan | 2020-11-20 | 1 | -1/+0 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-20 | 1 | -15/+26 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-20 | 1 | -0/+111 | |
| | ||||||
* | verify: ignore certain types of release types for now | Martin Czygan | 2020-11-19 | 1 | -2/+4 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-19 | 1 | -1/+5 | |
| | ||||||
* | update stats | Martin Czygan | 2020-11-19 | 2 | -25/+30 | |
| | ||||||
* | verify: ignore ids like solv-int/9606010v1 for now | Martin Czygan | 2020-11-19 | 1 | -4/+8 | |
| | ||||||
* | verify: allow a larger gap | Martin Czygan | 2020-11-19 | 1 | -1/+6 | |
| | ||||||
* | verify: account for article/article-journal | Martin Czygan | 2020-11-19 | 1 | -1/+4 | |
| | ||||||
* | update verification case list | Martin Czygan | 2020-11-19 | 2 | -9/+20 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-19 | 2 | -12/+29 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-19 | 1 | -34/+43 | |
| | ||||||
* | ignore sample files | Martin Czygan | 2020-11-19 | 1 | -0/+3 | |
| | ||||||
* | update README | Martin Czygan | 2020-11-18 | 1 | -0/+58 | |
| | ||||||
* | verify: fix a None | Martin Czygan | 2020-11-18 | 1 | -2/+2 | |
| | ||||||
* | cluster: log progress | Martin Czygan | 2020-11-17 | 1 | -1/+3 | |
| | ||||||
* | cleanup sql stuff for now | Martin Czygan | 2020-11-17 | 1 | -13/+0 | |
| | ||||||
* | move blacklist to the end | Martin Czygan | 2020-11-17 | 1 | -227/+666 | |
| | ||||||
* | cleanup blacklist | Martin Czygan | 2020-11-17 | 1 | -1524/+1531 | |
| | ||||||
* | update stats | Martin Czygan | 2020-11-17 | 1 | -245/+1561 | |
| | ||||||
* | fix subtitle check | Martin Czygan | 2020-11-17 | 1 | -2/+11 | |
| | ||||||
* | extend title blacklist | Martin Czygan | 2020-11-17 | 1 | -34/+1293 | |
| | ||||||
* | update stats | Martin Czygan | 2020-11-17 | 1 | -9/+9 | |
| | ||||||
* | update blacklist | Martin Czygan | 2020-11-17 | 1 | -8/+65 | |
| | ||||||
* | update blacklist | Martin Czygan | 2020-11-17 | 1 | -4/+16 | |
| | ||||||
* | update stats | Martin Czygan | 2020-11-17 | 1 | -5/+7 | |
| | ||||||
* | update blacklist | Martin Czygan | 2020-11-17 | 1 | -12/+15 | |
| | ||||||
* | update notes | Martin Czygan | 2020-11-17 | 1 | -14/+52 | |
| | ||||||
* | update docs and blacklist | Martin Czygan | 2020-11-17 | 1 | -0/+28 | |
| | ||||||
* | update blacklists | Martin Czygan | 2020-11-17 | 1 | -2/+22 | |
| | ||||||
* | be less fine grained with datasets | Martin Czygan | 2020-11-17 | 1 | -1/+11 | |
| | ||||||
* | handle newline in titles | Martin Czygan | 2020-11-17 | 1 | -14/+10 | |
| | ||||||
* | update blacklist | Martin Czygan | 2020-11-17 | 1 | -1/+1 | |
| | ||||||
* | update blacklist | Martin Czygan | 2020-11-16 | 1 | -8/+39 | |
| | ||||||
* | add more blacklists | Martin Czygan | 2020-11-16 | 1 | -15/+32 | |
| | ||||||
* | wip: author_slug | Martin Czygan | 2020-11-15 | 1 | -2/+26 | |
| | ||||||
* | update title blacklist | Martin Czygan | 2020-11-14 | 1 | -0/+1 | |
| | ||||||
* | wip: verification and tests | Martin Czygan | 2020-11-14 | 3 | -48/+236 | |
| | ||||||
* | update Pipfile | Martin Czygan | 2020-11-14 | 2 | -50/+69 | |
| | ||||||
* | fix tests | Martin Czygan | 2020-11-13 | 4 | -55/+4 | |
| | ||||||
* | wip: verification | Martin Czygan | 2020-11-13 | 3 | -17/+181 | |
| | Output currently (1m sample): {"unique": 916075, "too_large": 575, "dummy": 10307, "contrib_miss": 27215, "short_title": 1379, "arxiv_v": 8943} | |||||
* | Merge branch 'bnewbold-sandcrawler' of https://github.com/bnewbold/fuzzycat into bnewbold-bnewbold-sandcrawler | Martin Czygan | 2020-11-12 | 6 | -54/+761 |
|\ | * 'bnewbold-sandcrawler' of https://github.com/bnewbold/fuzzycat: sandcrawler slugify: yet more unicode corner-cases; add sandcrawler-style title key method; cluster: count empty keys (and don't return them); pipenv: explicit regex dependency; gitignore: add .swp (vim); make: run pytest over fuzzycat/ to catch inline tests; add support for key denylist | |||||
| * | sandcrawler slugify: yet more unicode corner-cases | Bryan Newbold | 2020-11-10 | 1 | -16/+47 | |
| | | ||||||
| * | add sandcrawler-style title key method | Bryan Newbold | 2020-11-10 | 2 | -3/+132 | |
| | | ||||||
| * | cluster: count empty keys (and don't return them) | Bryan Newbold | 2020-11-10 | 1 | -0/+3 | |
| | | ||||||
| * | pipenv: explicit regex dependency | Bryan Newbold | 2020-11-10 | 1 | -0/+1 | |
| | | regex, unlike stdlib 're' module, has unicode support. I couldn't get pipenv to lock after adding this dependency, even though Pipfile.lock already includes regex as a sub-dependency of something else. | |||||