path: root/python/fatcat_tools/normal.py
Commit message | Author | Age | Files | Lines
* normalizer: filter out a specific non-ASCII character in DOI | Bryan Newbold | 2020-11-04 | 1 | -1/+3
|
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -1/+0
|
* disallow a specific unicode character from DOIs | Bryan Newbold | 2020-06-26 | 1 | -0/+6
|
* consistently use raw string prefix for regex | Bryan Newbold | 2020-04-17 | 1 | -5/+5
|
* normal: DOI corner-case from pubmed import | Bryan Newbold | 2020-01-19 | 1 | -0/+9
|
* do not normalize "en dash" in DOI | Martin Czygan | 2020-01-17 | 1 | -2/+5
|
|     Technically, [...] DOI names may incorporate any printable characters
|     from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the
|     character set defined by Unicode
|     (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1).
|
|     For mostly QA reasons, we currently treat a DOI with an "en dash" as
|     invalid.
|
* doi parsing fixes | Bryan Newbold | 2019-12-23 | 1 | -0/+7
|
|     Replace em dash with regular dash.
|
|     Replace double slash after partner ID with single slash. This conversion
|     seems to be done by Crossref automatically on lookup. I tried several
|     examples, using the doi.org resolver and Crossref API lookup. Note that
|     there are a number of fatcat entities with '//' in the DOI.
|
* normalizers: clean_pmid(), and handle nulls in all other cleaners | Bryan Newbold | 2019-12-23 | 1 | -0/+31
|
* handle more external identifiers in python | Bryan Newbold | 2019-09-18 | 1 | -14/+97
|
|     This makes it possible to, e.g., paste an arXiv identifier or SHA-1 hash
|     in the general search box and do a quick lookup.
|
* start work on 'generic' search box | Bryan Newbold | 2019-06-13 | 1 | -0/+95
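
The DOI handling and identifier lookup described in the commit messages above can be illustrated with a minimal Python sketch. This is not the code in normal.py: the function names `sketch_clean_doi` and `sketch_guess_identifier`, the regular expressions, and the choice to reject every non-ASCII character (the commits mention only specific characters) are assumptions made for illustration. The sketch follows the later commit in treating an en dash as invalid rather than replacing it.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; they are not taken from normal.py.
DOI_RE = re.compile(r"10\.\d{3,6}/\S+")
ARXIV_RE = re.compile(r"(\d{4}\.\d{4,5}|[a-z\-]+(\.[A-Z]{2})?/\d{7})(v\d+)?")
SHA1_RE = re.compile(r"[a-f0-9]{40}")


def sketch_clean_doi(raw: Optional[str]) -> Optional[str]:
    """Return a cleaned DOI string, or None if the input looks invalid."""
    # Handle nulls and empty strings up front.
    if not raw:
        return None
    doi = raw.strip()
    # An en dash (U+2013) is treated as invalid instead of being normalized.
    if "\u2013" in doi:
        return None
    # Collapse a double slash directly after the prefix into a single slash.
    prefix, sep, suffix = doi.partition("/")
    if sep and suffix.startswith("/"):
        suffix = suffix[1:]
    doi = prefix + sep + suffix
    # Assumption: reject any non-ASCII character, which is broader than the
    # "specific character" filters named in the commit messages.
    if not doi.isascii():
        return None
    if not DOI_RE.fullmatch(doi):
        return None
    return doi


def sketch_guess_identifier(raw: Optional[str]) -> Optional[str]:
    """Guess which kind of identifier was pasted into a search box."""
    if not raw:
        return None
    text = raw.strip()
    if sketch_clean_doi(text) is not None:
        return "doi"
    if SHA1_RE.fullmatch(text.lower()):
        return "sha1"
    if ARXIV_RE.fullmatch(text):
        return "arxiv"
    return None
```

A caller could pass the raw search string to `sketch_guess_identifier` and route to an identifier lookup when it returns a type, falling back to full-text search when it returns None.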