path: root/python/fatcat_tools/importers/jalc.py
Commit message | Author | Age | Files | Lines
* importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 1 | -11/+11
* remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 1 | -52/+0
  This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall.
* typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -4/+10
  These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes.
* typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -14/+18
  This commit just adds the type annotations; it doesn't fix code to make type checking pass.
* importers: remove unused __main__ routine | Bryan Newbold | 2021-11-03 | 1 | -4/+0
  These perhaps were used in initial development or testing? fatcat_import.py is the correct way to do these imports, even for testing/development.
* fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -81/+112
* python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -4/+6
* more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -1/+2
  After noticing more upper/lower ambiguity in production. In particular, we have some old ingest requests in sandcrawler DB, which get re-submitted/re-tried, which have capitalized DOIs in the link source id field.
* simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 1 | -1/+1
  These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements.
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -3/+0
* Indentity is not the same this as equality in Python | Christian Clauss | 2020-05-14 | 1 | -2/+2
* importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 1 | -7/+7
* importers: more string/get_text swaps | Bryan Newbold | 2020-03-28 | 1 | -7/+7
  See previous pubmed commit for details.
* jalc: avoid meaningless pages values | Bryan Newbold | 2020-03-23 | 1 | -4/+8
* refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 1 | -8/+8
* JALC: handle empty publisher string | Bryan Newbold | 2019-05-30 | 1 | -3/+4
* remove stray JALC debug code | Bryan Newbold | 2019-05-29 | 1 | -2/+3
* improve JALC author handling | Bryan Newbold | 2019-05-29 | 1 | -59/+85
* all new importers need to set contrib index (order) | Bryan Newbold | 2019-05-22 | 1 | -1/+5
* jalc empty publisher string | Bryan Newbold | 2019-05-22 | 1 | -2/+2
* better JALC and arxiv DOI checks | Bryan Newbold | 2019-05-22 | 1 | -1/+1
* yet another JALC edge-case | Bryan Newbold | 2019-05-21 | 1 | -1/+1
* better JALC DOI de-mangling | Bryan Newbold | 2019-05-21 | 1 | -1/+10
* JALC importer requires a valid DOI | Bryan Newbold | 2019-05-21 | 1 | -0/+1
* handle bad JALC DOIs | Bryan Newbold | 2019-05-21 | 1 | -1/+3
* JALC more robust to partial names | Bryan Newbold | 2019-05-21 | 1 | -8/+19
* more JALC importer tweaks | Bryan Newbold | 2019-05-21 | 1 | -7/+10
* JALC importer: handle missing titles | Bryan Newbold | 2019-05-21 | 1 | -0/+2
* importers: create containers by default | Bryan Newbold | 2019-05-21 | 1 | -1/+3
* more JALC importer polish | Bryan Newbold | 2019-05-21 | 1 | -4/+17
* tweaks to new imports/tests | Bryan Newbold | 2019-05-21 | 1 | -1/+1
* clean up JALC importer a tiny bit | Bryan Newbold | 2019-05-21 | 1 | -8/+3
* initial flesh out of JALC parser | Bryan Newbold | 2019-05-21 | 1 | -0/+310