path: root/python/fatcat_tools/importers/crossref.py
Commit message  Author  Age  Files  Lines
* more involved type wrangling and fixes for importersBryan Newbold2021-11-031-5/+6
|
* typing: relatively simple type check fixesBryan Newbold2021-11-031-3/+1
| | | | | | | These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes.
* typing: initial annotations on importersBryan Newbold2021-11-031-13/+13
| | | | | This commit just adds the type annotations, doesn't do fixes to code to make type checking pass.
* lint: resolve existing mypy type errorsBryan Newbold2021-11-021-15/+11
| | | | | | | | | Adds annotations and reworks dataflow to satisfy existing mypy issues, without adding any additional type annotations to, e.g., function signatures. There will probably be many more type errors when annotations are all added.
* fmt (black): fatcat_tools/Bryan Newbold2021-11-021-167/+246
|
* python: isort everythingBryan Newbold2021-11-021-3/+2
|
* lint: simple, safe inline lint fixesBryan Newbold2021-11-021-2/+1
| | | | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc
* small python tweaks for annotations, importsBryan Newbold2021-11-021-1/+5
|
* try some type annotationsBryan Newbold2021-11-021-22/+29
|
* crossref+datacite: remove confusing early update bailBryan Newbold2020-11-201-2/+0
| | | | | Easy to miss that we skip updates *twice*, and with this early bailout we were not updating counts correctly.
* simple lint (flake8) fixes over python codebaseBryan Newbold2020-07-231-7/+7
| | | | | | These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements.
* lint (flake8) tool python filesBryan Newbold2020-07-011-7/+1
|
* add new license mappingsBryan Newbold2020-06-301-0/+13
|
* Merge pull request #53 from EdwardBetts/spellingbnewbold2020-03-271-2/+2
|\ | | | | Correct spelling mistakes
| * Correct spelling mistakesEdward Betts2020-03-271-2/+2
| |
* | crossref: skip stub OUP titleBryan Newbold2020-03-191-0/+8
|/ | | | | It seems like OUP pre-registers DOIs with this placeholder title, then updates the Crossref metadata when the paper is actually published. We should wait until the real title is available before creating an entity.
* crossref: accurate blank title countsBryan Newbold2019-11-051-0/+1
|
* crossref: component typeBryan Newbold2019-11-041-1/+3
|
* crossref: count why skip happenedBryan Newbold2019-11-041-1/+7
| | | | | | Might skip based on release type (eg container, not a paper/release), or missing title, or other reasons. Over 7 million DOIs are getting skipped, curious why.
* crossref: don't skip on short/null subtitleBryan Newbold2019-11-041-1/+1
| | | | This was a bug. Should only set subtitle blank, not skip the import.
* refactor all python source for client lib nameBryan Newbold2019-09-051-10/+10
|
* crossref: allow 'name' fallback (for groups, etc)Bryan Newbold2019-06-241-1/+1
|
* better crossref container_name handlingBryan Newbold2019-05-241-7/+12
|
* arxiv license slug shorter; fix testBryan Newbold2019-05-221-1/+1
|
* importers: create containers by defaultBryan Newbold2019-05-211-1/+2
|
* arxiv license/slug mapBryan Newbold2019-05-211-0/+1
|
* python implBryan Newbold2019-05-141-4/+5
|
* python implBryan Newbold2019-05-141-2/+2
|
* importer code updatesBryan Newbold2019-05-131-2/+14
|
* partial python impl of ext_id and release_stage refactorsBryan Newbold2019-05-131-12/+14
|
* better/additional crossref license lookupsBryan Newbold2019-02-141-20/+58
|
* crossref: import subtitle as str, not list[str]Bryan Newbold2019-02-141-0/+2
|
* add some missing LICENSE_SLUG_MAPBryan Newbold2019-02-051-1/+4
|
* crossref import tweaks/fixesBryan Newbold2019-01-291-7/+9
| | | | | - refs: article-title not title; save unstructured; authors not author - save 'language' field (already an ISO code)
* fix bug in clean() resulting in many consistency check failsBryan Newbold2019-01-291-10/+9
|
* fix refs extra ordering bugBryan Newbold2019-01-291-6/+6
|
* pass through kwargs (fixes berserk imports)Bryan Newbold2019-01-291-1/+2
|
* ensure raw_name is not stubBryan Newbold2019-01-291-1/+4
|
* ensure abstracts aren't stubsBryan Newbold2019-01-291-2/+3
|
* fix title length checks in crossrefBryan Newbold2019-01-281-2/+2
|
* filter short/stub original_titleBryan Newbold2019-01-281-3/+7
|
* enforce title len>1 for release importsBryan Newbold2019-01-281-0/+3
|
* tweak crossref import, and update testsBryan Newbold2019-01-241-11/+27
|
* allow importing contrib/refs listsBryan Newbold2019-01-241-5/+13
| | | | | | The motivation here isn't really to support these gigantic lists on principle, but to be able to ingest large corpuses without having to decide whether to filter out or crop such lists.
* importer bugfixesBryan Newbold2019-01-231-3/+3
|
* bunch of crossref import tweaks (need tests)Bryan Newbold2019-01-231-50/+43
|
* ftfy all over (needs Pipfile.lock)Bryan Newbold2019-01-231-19/+22
|
* refactor remaining importersBryan Newbold2019-01-221-4/+7
|
* refactored crossref importer to new styleBryan Newbold2019-01-221-69/+56
|
* crossref importer updatesBryan Newbold2019-01-221-19/+78
|