Commit message | Author | Date | Files | Lines |
---|---|---|---|---|
importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 1 | -11/+11 |
remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 1 | -52/+0 |
This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall. | | | | |
typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -4/+10 |
These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes. | | | | |
typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -14/+18 |
This commit only adds the type annotations; it does not fix code to make type checking pass. | | | | |
importers: remove unused `__main__` routine | Bryan Newbold | 2021-11-03 | 1 | -4/+0 |
These were perhaps used in initial development or testing. fatcat_import.py is the correct way to run these imports, even for testing/development. | | | | |
fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -81/+112 |
python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -4/+6 |
more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -1/+2 |
Prompted by more upper/lower-case ambiguity noticed in production. In particular, some old ingest requests in the sandcrawler DB, which get re-submitted/re-tried, have capitalized DOIs in the link source id field. | | | | |
simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 1 | -1/+1 |
These should not change any behavior, though a number of exception catches are now more general, and there may be long-tail exceptions thrown in these statements. | | | | |
lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -3/+0 |
Identity is not the same thing as equality in Python | Christian Clauss | 2020-05-14 | 1 | -2/+2 |
importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 1 | -7/+7 |
importers: more string/get_text swaps | Bryan Newbold | 2020-03-28 | 1 | -7/+7 |
See previous pubmed commit for details. | | | | |
jalc: avoid meaningless pages values | Bryan Newbold | 2020-03-23 | 1 | -4/+8 |
refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 1 | -8/+8 |
JALC: handle empty publisher string | Bryan Newbold | 2019-05-30 | 1 | -3/+4 |
remove stray JALC debug code | Bryan Newbold | 2019-05-29 | 1 | -2/+3 |
improve JALC author handling | Bryan Newbold | 2019-05-29 | 1 | -59/+85 |
all new importers need to set contrib index (order) | Bryan Newbold | 2019-05-22 | 1 | -1/+5 |
jalc empty publisher string | Bryan Newbold | 2019-05-22 | 1 | -2/+2 |
better JALC and arxiv DOI checks | Bryan Newbold | 2019-05-22 | 1 | -1/+1 |
yet another JALC edge-case | Bryan Newbold | 2019-05-21 | 1 | -1/+1 |
better JALC DOI de-mangling | Bryan Newbold | 2019-05-21 | 1 | -1/+10 |
JALC importer requires a valid DOI | Bryan Newbold | 2019-05-21 | 1 | -0/+1 |
handle bad JALC DOIs | Bryan Newbold | 2019-05-21 | 1 | -1/+3 |
JALC more robust to partial names | Bryan Newbold | 2019-05-21 | 1 | -8/+19 |
more JALC importer tweaks | Bryan Newbold | 2019-05-21 | 1 | -7/+10 |
JALC importer: handle missing titles | Bryan Newbold | 2019-05-21 | 1 | -0/+2 |
importers: create containers by default | Bryan Newbold | 2019-05-21 | 1 | -1/+3 |
more JALC importer polish | Bryan Newbold | 2019-05-21 | 1 | -4/+17 |
tweaks to new imports/tests | Bryan Newbold | 2019-05-21 | 1 | -1/+1 |
clean up JALC importer a tiny bit | Bryan Newbold | 2019-05-21 | 1 | -8/+3 |
initial flesh out of JALC parser | Bryan Newbold | 2019-05-21 | 1 | -0/+310 |
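Several commits in this log deal with DOI normalization: the 2021-06-23 commit on consistent lower-casing, and the 2019-05-21 commits that reject bad or invalid JALC DOIs. The following is a minimal sketch of that kind of defensive normalization, assuming a hypothetical `normalize_doi()` helper. It is not the importer's actual code, and the regular expression is a simplified assumption about DOI syntax (a `10.` prefix, a numeric registrant code, a slash, and a non-empty suffix).

```python
import re
from typing import Optional

# Simplified DOI shape (an assumption, not a full DOI grammar):
# "10." + 4-9 digit registrant code + "/" + non-whitespace suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefixes to strip; this list is illustrative only.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return a lower-cased DOI, or None if invalid."""
    # Treat None and empty strings as "no DOI" rather than raising.
    if not raw:
        return None
    # DOIs are case-insensitive, so lower-case once, up front.
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    # Reject anything that does not look like a DOI at all.
    if DOI_PATTERN.match(doi) is None:
        return None
    return doi
```

Returning `None` for unusable input, rather than raising, lets an importer skip a record with a missing or mangled DOI while continuing the rest of a bulk import.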