path: root/python/fatcat_tools/normal.py
Commit message | Author | Date | Files | Lines
* python: normalization/validation support for handle identifiers (hdl) | Bryan Newbold | 2021-10-13 | 1 | -0/+33
* clean_doi() should lower-case returned DOI | Bryan Newbold | 2021-06-07 | 1 | -1/+4
* normalizer: test for un-versioned arxiv_id | Bryan Newbold | 2020-12-24 | 1 | -0/+4
* wikidata QID normalize helper | Bryan Newbold | 2020-12-17 | 1 | -2/+24
* HACK: squash intermitent failure of detect_text_lang() test | Bryan Newbold | 2020-12-11 | 1 | -1/+2
* langdetect: more text for 'zh' test case | Bryan Newbold | 2020-11-20 | 1 | -1/+1
* clean DOI: ban all non-ASCII characters | Bryan Newbold | 2020-11-19 | 1 | -1/+4
* normal: handle langdetect of 'zh-cn' (not len=2) | Bryan Newbold | 2020-11-19 | 1 | -0/+3
* handle more non-ASCII DOI cases | Bryan Newbold | 2020-11-19 | 1 | -1/+3
* more python normalizers, and move from importer common | Bryan Newbold | 2020-11-19 | 1 | -0/+322
* normalizer: filter out a specific non-ASCII character in DOI | Bryan Newbold | 2020-11-04 | 1 | -1/+3
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -1/+0
* disallow a specific unicode character from DOIs | Bryan Newbold | 2020-06-26 | 1 | -0/+6
* consistently use raw string prefix for regex | Bryan Newbold | 2020-04-17 | 1 | -5/+5
* normal: DOI corner-case from pubmed import | Bryan Newbold | 2020-01-19 | 1 | -0/+9
* do not normalize "en dash" in DOI | Martin Czygan | 2020-01-17 | 1 | -2/+5
* doi parsing fixes | Bryan Newbold | 2019-12-23 | 1 | -0/+7
* normalizers: clean_pmid(), and handle nulls in all other cleaners | Bryan Newbold | 2019-12-23 | 1 | -0/+31
* handle more external identifiers in python | Bryan Newbold | 2019-09-18 | 1 | -14/+97
* start work on 'generic' search box | Bryan Newbold | 2019-06-13 | 1 | -0/+95