Commit message  Author  Age  Files  Lines
...
* update notes  Martin Czygan  2020-11-21  2  -15/+16
|
* wip: handle empty lists  Martin Czygan  2020-11-21  1  -6/+9
|
* wip: datacite, figshare versions  Martin Czygan  2020-11-21  1  -6/+35
|
* wip: another contrib comparison  Martin Czygan  2020-11-20  1  -14/+92
|
* cleanup list  Martin Czygan  2020-11-20  1  -1/+0
|
* update notes  Martin Czygan  2020-11-20  1  -15/+26
|
* update notes  Martin Czygan  2020-11-20  1  -0/+111
|
* verify: ignore certain types of release types for now  Martin Czygan  2020-11-19  1  -2/+4
|
* update notes  Martin Czygan  2020-11-19  1  -1/+5
|
* update stats  Martin Czygan  2020-11-19  2  -25/+30
|
* verify: ignore ids like solv-int/9606010v1 for now  Martin Czygan  2020-11-19  1  -4/+8
|
* verify: allow a larger gap  Martin Czygan  2020-11-19  1  -1/+6
|
* verify: account for article/article-journal  Martin Czygan  2020-11-19  1  -1/+4
|
* update verification case list  Martin Czygan  2020-11-19  2  -9/+20
|
* update notes  Martin Czygan  2020-11-19  2  -12/+29
|
* update notes  Martin Czygan  2020-11-19  1  -34/+43
|
* ignore sample files  Martin Czygan  2020-11-19  1  -0/+3
|
* update README  Martin Czygan  2020-11-18  1  -0/+58
|
* verify: fix a None  Martin Czygan  2020-11-18  1  -2/+2
|
* cluster: log progress  Martin Czygan  2020-11-17  1  -1/+3
|
* cleanup sql stuff for now  Martin Czygan  2020-11-17  1  -13/+0
|
* move blacklist to the end  Martin Czygan  2020-11-17  1  -227/+666
|
* cleanup blacklist  Martin Czygan  2020-11-17  1  -1524/+1531
|
* update stats  Martin Czygan  2020-11-17  1  -245/+1561
|
* fix subtitle check  Martin Czygan  2020-11-17  1  -2/+11
|
* extend title blacklist  Martin Czygan  2020-11-17  1  -34/+1293
|
* update stats  Martin Czygan  2020-11-17  1  -9/+9
|
* update blacklist  Martin Czygan  2020-11-17  1  -8/+65
|
* update blacklist  Martin Czygan  2020-11-17  1  -4/+16
|
* update stats  Martin Czygan  2020-11-17  1  -5/+7
|
* update blacklist  Martin Czygan  2020-11-17  1  -12/+15
|
* update notes  Martin Czygan  2020-11-17  1  -14/+52
|
* update docs and blacklist  Martin Czygan  2020-11-17  1  -0/+28
|
* update blacklists  Martin Czygan  2020-11-17  1  -2/+22
|
* be less fine grained with datasets  Martin Czygan  2020-11-17  1  -1/+11
|
* handle newline in titles  Martin Czygan  2020-11-17  1  -14/+10
|
* update blacklist  Martin Czygan  2020-11-17  1  -1/+1
|
* update blacklist  Martin Czygan  2020-11-16  1  -8/+39
|
* add more blacklists  Martin Czygan  2020-11-16  1  -15/+32
|
* wip: author_slug  Martin Czygan  2020-11-15  1  -2/+26
|
* update title blacklist  Martin Czygan  2020-11-14  1  -0/+1
|
* wip: verification and tests  Martin Czygan  2020-11-14  3  -48/+236
|
* update Pipfile  Martin Czygan  2020-11-14  2  -50/+69
|
* fix tests  Martin Czygan  2020-11-13  4  -55/+4
|
* wip: verification  Martin Czygan  2020-11-13  3  -17/+181
|
|     Output currently (1m sample):
|
|     { "unique": 916075, "too_large": 575, "dummy": 10307, "contrib_miss": 27215, "short_title": 1379, "arxiv_v": 8943 }
|
* Merge branch 'bnewbold-sandcrawler' of https://github.com/bnewbold/fuzzycat into bnewbold-bnewbold-sandcrawler  Martin Czygan  2020-11-12  6  -54/+761
|\
| |
| |     * 'bnewbold-sandcrawler' of https://github.com/bnewbold/fuzzycat:
| |       sandcrawler slugify: yet more unicode corner-cases
| |       add sandcrawler-style title key method
| |       cluster: count empty keys (and don't return them)
| |       pipenv: explicit regex dependency
| |       gitignore: add .swp (vim)
| |       make: run pytest over fuzzycat/ to catch inline tests
| |       add support for key denylist
| |
| * sandcrawler slugify: yet more unicode corner-cases  Bryan Newbold  2020-11-10  1  -16/+47
| |
| * add sandcrawler-style title key method  Bryan Newbold  2020-11-10  2  -3/+132
| |
| * cluster: count empty keys (and don't return them)  Bryan Newbold  2020-11-10  1  -0/+3
| |
| * pipenv: explicit regex dependency  Bryan Newbold  2020-11-10  1  -0/+1
| |
| |     regex, unlike stdlib 're' module, has unicode support.
| |
| |     I couldn't get pipenv to lock after adding this dependency, even though
| |     Pipfile.lock already includes regex as a sub-dependency of something else.