| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 1 | -92/+2 |
| | MAX_ABSTRACT_LENGTH set in a single place (importer common); merge datacite license slug table into common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work) | | | | |
* | importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 1 | -28/+28 |
* | importers: use clean_doi() in many more (all?) importers | Bryan Newbold | 2021-11-09 | 1 | -1/+8 |
* | remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 1 | -54/+0 |
| | This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall. | | | | |
* | more involved type wrangling and fixes for importers | Bryan Newbold | 2021-11-03 | 1 | -5/+6 |
* | typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -3/+1 |
| | These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes. | | | | |
* | typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -13/+13 |
| | This commit just adds the type annotations; it doesn't fix the code to make type checking pass. | | | | |
* | lint: resolve existing mypy type errors | Bryan Newbold | 2021-11-02 | 1 | -15/+11 |
| | Adds annotations and reworks dataflow to satisfy existing mypy issues, without adding any additional type annotations to, e.g., function signatures. There will probably be many more type errors when annotations are all added. | | | | |
* | fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -167/+246 |
* | python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -3/+2 |
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -2/+1 |
| | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc. | | | | |
* | small python tweaks for annotations, imports | Bryan Newbold | 2021-11-02 | 1 | -1/+5 |
* | try some type annotations | Bryan Newbold | 2021-11-02 | 1 | -22/+29 |
* | crossref+datacite: remove confusing early update bail | Bryan Newbold | 2020-11-20 | 1 | -2/+0 |
| | Easy to miss that we skip updates *twice*, and with this early bailout we were not updating counts correctly. | | | | |
* | simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 1 | -7/+7 |
| | These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements. | | | | |
* | lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -7/+1 |
* | add new license mappings | Bryan Newbold | 2020-06-30 | 1 | -0/+13 |
* | Merge pull request #53 from EdwardBetts/spelling | bnewbold | 2020-03-27 | 1 | -2/+2 |
|\ | Correct spelling mistakes | | | | |
| * | Correct spelling mistakes | Edward Betts | 2020-03-27 | 1 | -2/+2 |
* | | crossref: skip stub OUP title | Bryan Newbold | 2020-03-19 | 1 | -0/+8 |
|/ | It seems like OUP pre-registers DOIs with this placeholder title, then updates the Crossref metadata when the paper is actually published. We should wait until the real title is available before creating an entity. | | | | |
* | crossref: accurate blank title counts | Bryan Newbold | 2019-11-05 | 1 | -0/+1 |
* | crossref: component type | Bryan Newbold | 2019-11-04 | 1 | -1/+3 |
* | crossref: count why skip happened | Bryan Newbold | 2019-11-04 | 1 | -1/+7 |
| | Might skip based on release type (e.g. container, not a paper/release), or missing title, or other reasons. Over 7 million DOIs are getting skipped; curious why. | | | | |
* | crossref: don't skip on short/null subtitle | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
| | This was a bug. Should only set subtitle blank, not skip the import. | | | | |
* | refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 1 | -10/+10 |
* | crossref: allow 'name' fallback (for groups, etc) | Bryan Newbold | 2019-06-24 | 1 | -1/+1 |
* | better crossref container_name handling | Bryan Newbold | 2019-05-24 | 1 | -7/+12 |
* | arxiv license slug shorter; fix test | Bryan Newbold | 2019-05-22 | 1 | -1/+1 |
* | importers: create containers by default | Bryan Newbold | 2019-05-21 | 1 | -1/+2 |
* | arxiv license/slug map | Bryan Newbold | 2019-05-21 | 1 | -0/+1 |
* | python impl | Bryan Newbold | 2019-05-14 | 1 | -4/+5 |
* | python impl | Bryan Newbold | 2019-05-14 | 1 | -2/+2 |
* | importer code updates | Bryan Newbold | 2019-05-13 | 1 | -2/+14 |
* | partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 1 | -12/+14 |
* | better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58 |
* | crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2 |
* | add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4 |
* | crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9 |
| | refs: article-title not title; save unstructured; authors not author. Save 'language' field (already an ISO code). | | | | |
* | fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 1 | -10/+9 |
* | fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6 |
* | pass through kwargs (fixes bezerk imports) | Bryan Newbold | 2019-01-29 | 1 | -1/+2 |
* | ensure raw_name is not stub | Bryan Newbold | 2019-01-29 | 1 | -1/+4 |
* | ensure abstracts aren't stubs | Bryan Newbold | 2019-01-29 | 1 | -2/+3 |
* | fix title length checks in crossref | Bryan Newbold | 2019-01-28 | 1 | -2/+2 |
* | filter short/stub original_title | Bryan Newbold | 2019-01-28 | 1 | -3/+7 |
* | enforce title len>1 for release imports | Bryan Newbold | 2019-01-28 | 1 | -0/+3 |
* | tweak crossref import, and update tests | Bryan Newbold | 2019-01-24 | 1 | -11/+27 |
* | allow importing contrib/refs lists | Bryan Newbold | 2019-01-24 | 1 | -5/+13 |
| | The motivation here isn't really to support these gigantic lists on principle, but to be able to ingest large corpuses without having to decide whether to filter out or crop such lists. | | | | |
* | importer bugfixes | Bryan Newbold | 2019-01-23 | 1 | -3/+3 |
* | bunch of crossref import tweaks (need tests) | Bryan Newbold | 2019-01-23 | 1 | -50/+43 |