path: root/python/fatcat_tools/importers/doaj_article.py
Commit message | Author | Age | Files | Lines
* refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 1 | -4/+1
    - MAX_ABSTRACT_LENGTH set in a single place (importer common)
    - merge datacite license slug table into common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work)
* typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -8/+6
    These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes.
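    The three patterns named in this commit message can be sketched roughly as follows. This is a minimal illustration, not code from the importer: the helper name `clean_contrib_names` and the record shape are hypothetical.

```python
from typing import Any, Dict, List, Optional


def clean_contrib_names(raw: Optional[List[Dict[str, Any]]]) -> Optional[List[str]]:
    """Hypothetical helper; the name and record shape are assumptions."""
    # Explicit is-not-None style check, so the type checker can narrow Optional.
    if raw is None:
        return None
    # New variable name: `raw` keeps its original type instead of being
    # reassigned to a list of strings.
    names: List[str] = [
        c["name"].strip() for c in raw if c.get("name") is not None
    ]
    # Coerce an empty list to None only at the very end.
    return names or None
```

    Keeping `raw` and `names` as separate variables is what lets a checker such as mypy assign each one a single, stable type.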
* typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -10/+11
    This commit just adds the type annotations; it doesn't fix code to make type checking pass.
* lint: resolve existing mypy type errors | Bryan Newbold | 2021-11-02 | 1 | -1/+1
    Adds annotations and reworks dataflow to satisfy existing mypy issues, without adding any additional type annotations to, e.g., function signatures. There will probably be many more type errors when annotations are all added.
* fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -82/+96
* python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -4/+13
* lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -1/+1
    '==' vs 'is'; 'not a in b' vs 'a not in b'; etc
* add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+9
    In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity.
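    The conservative default described here can be sketched as below. This is an illustration under stated assumptions, not the importer's code: `MatchStatus` is a stand-in enum built only from the status names in the commit message, and `filter_new_records`, `find_match`, and the counter keys are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class MatchStatus(Enum):
    """Stand-in for the fuzzy match statuses named in the commit message."""
    EXACT = "exact"
    STRONG = "strong"
    WEAK = "weak"
    AMBIGUOUS = "ambiguous"


def filter_new_records(
    records: List[Dict[str, str]],
    find_match: Callable[[Dict[str, str]], Optional[MatchStatus]],
) -> Tuple[List[Dict[str, str]], Counter]:
    """Keep only records with no fuzzy match at all (hypothetical helper)."""
    counts: Counter = Counter()
    kept: List[Dict[str, str]] = []
    for rec in records:
        # Any match status, even AMBIGUOUS, causes a skip in this default.
        if find_match(rec) is not None:
            counts["skip-fuzzy-match"] += 1
            continue
        counts["insert"] += 1
        kept.append(rec)
    return kept, counts
```

    Relaxing the policy later, as the message suggests, would mean branching on the specific status instead of on the presence of any match.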
* doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 1 | -4/+3
    Also add missing code coverage for update path (disabled by default).
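    One plausible reason for preferring `getattr` over `__dict__` is sketched below, assuming the entity objects expose their fields through properties backed by underscore-prefixed attributes. That assumption is not confirmed by this log; `ExtIds` and `missing_ids` are hypothetical stand-ins.

```python
from typing import List, Optional, Sequence


class ExtIds:
    """Hypothetical stand-in for an entity with property-backed fields."""

    def __init__(self, doaj: Optional[str] = None, doi: Optional[str] = None):
        # Values live under private names, so "doaj" is not a __dict__ key.
        self._doaj = doaj
        self._doi = doi

    @property
    def doaj(self) -> Optional[str]:
        return self._doaj

    @property
    def doi(self) -> Optional[str]:
        return self._doi


def missing_ids(
    existing: ExtIds, new: ExtIds, fields: Sequence[str] = ("doaj", "doi")
) -> List[str]:
    """List fields set on `new` but unset on `existing` (hypothetical helper)."""
    # getattr goes through the property; existing.__dict__["doi"] would
    # raise KeyError because only "_doi" is stored on the instance.
    return [f for f in fields if getattr(new, f) and not getattr(existing, f)]
```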
* DOAJ: handle empty identifier 'id' case | Bryan Newbold | 2020-11-20 | 1 | -0/+2
* tweak DOAJ importer class args and default for do_updates | Bryan Newbold | 2020-11-19 | 1 | -2/+2
* implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -57/+125
* initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 1 | -0/+289
    Several things to finish implementing and polish.