| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | pubmed: allow updates if PMCID does not exist yet | Bryan Newbold | 2021-11-10 | 1 | -1/+6 |
| | The intent of this change is to start updating Pubmed metadata records when a PMCID has been assigned, but that ext_id hasn't been recorded in fatcat yet. It is likely that this change will result in some additional duplicate PMCIDs in the catalog. But the principle is that the PMID is the primary pubmed identifier, and all records with a PMID should have the PMCID that pubmed indicates, even if there exists another incorrect record. | ||||
* | typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -17/+7 |
| | These mostly add new variable names so that existing variables aren't overwritten with a new type; delay coercing '{}' or '[]' to 'None' until the last minute; add is-not-None checks to conditional clauses; and make similar small changes. | ||||
* | typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -10/+17 |
| | This commit just adds the type annotations; it doesn't fix the code to make type checking pass. | ||||
* | importers: remove unused __main__ routine | Bryan Newbold | 2021-11-03 | 1 | -5/+0 |
| | These perhaps were used in initial development or testing? fatcat_import.py is the correct way to do these imports, even for testing/development. | ||||
* | fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -158/+197 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -1/+1 |
| | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc | ||||
* | lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 1 | -5/+7 |
| | |||||
* | python: partial importer utilization of new schema changes | Bryan Newbold | 2021-10-13 | 1 | -3/+9 |
| | |||||
* | fix issnl typo in pubmed | Bryan Newbold | 2020-07-23 | 1 | -1/+1 |
| | Oh no! This bug may actually have had significant negative impact on metadata in fatcat, in terms of missing container_id associations with pubmed entities. There are about 500k release entities with a PMID but no container_id. Of those, 89k have at least a container_name. Unclear how many would have matched to ISSN-L and thus to a container. | ||||
* | lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -4/+2 |
| | |||||
* | importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 1 | -0/+1 |
| | One of these (in the ingest importer pipeline) is an actual bug; the others just change the syntax to be more explicit/conservative. The ingest importer bug seems to have resulted in some bad file match imports; scale of impact is unknown. | ||||
* | pubmed: use untranslated title if translated not available | Bryan Newbold | 2020-04-01 | 1 | -0/+6 |
| | The primary motivation for this change is that fatcat *requires* a non-empty title for each release entity. Pubmed/Medline occasionally indexes just a VernacularTitle with no ArticleTitle for foreign publications, and currently those records don't end up in fatcat at all. | ||||
* | importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 1 | -5/+7 |
| | |||||
* | pubmed: bunch of .get_text() instead of .string | Bryan Newbold | 2020-03-28 | 1 | -12/+12 |
| | Yikes! Apparently when a tag has more than one child (e.g. text mixed with child tags), .string will return None instead of all the strings. .get_text() returns all of it (see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text and https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string). I've left things like identifiers as .string, where we expect only a single string inside. | ||||
* | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -1/+4 |
| | This resolves a situation noticed in prod where we were only importing/updating a single reference per article. Includes a regression test. | ||||
* | pubmed: update many more metadata fields | Bryan Newbold | 2020-03-19 | 1 | -0/+22 |
| | In particular, with daily updates, in most cases the DOI will be registered first, and the entity is then updated with the PMID when that is available. Often the pubmed metadata will be more complete, with abstracts etc, and we'll want those improvements. | ||||
* | importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 1 | -0/+4 |
| | |||||
* | pubmed: if doing update, also do subtitle schema update | Bryan Newbold | 2019-12-23 | 1 | -1/+9 |
| | |||||
* | pubmed: improve warning and stderr formatting | Bryan Newbold | 2019-12-23 | 1 | -5/+6 |
| | |||||
* | pubmed: use standard identifier cleaners | Bryan Newbold | 2019-12-23 | 1 | -17/+14 |
| | |||||
* | pubmed: remove unused extid mapping code | Bryan Newbold | 2019-12-23 | 1 | -29/+0 |
| | |||||
* | pubmed: do reference lookups by default | Bryan Newbold | 2019-12-23 | 1 | -1/+1 |
| | |||||
* | pubmed: null doi parsing check | Bryan Newbold | 2019-12-23 | 1 | -1/+1 |
| | |||||
* | add basic MedlineDate year parsing | Bryan Newbold | 2019-12-23 | 1 | -0/+11 |
| | |||||
* | refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 1 | -16/+16 |
| | |||||
* | more pubmed importer fixes | Bryan Newbold | 2019-06-03 | 1 | -6/+13 |
| | |||||
* | yet another pubmed weird DOI corner case | Bryan Newbold | 2019-05-29 | 1 | -1/+1 |
| | |||||
* | handle pubmed CollectiveName null-ness | Bryan Newbold | 2019-05-29 | 1 | -1/+1 |
| | |||||
* | handle empty retraction_of.PMID in pubmed importer | Bryan Newbold | 2019-05-29 | 1 | -2/+4 |
| | |||||
* | more MARC languages, and less verbose reporting | Bryan Newbold | 2019-05-24 | 1 | -1/+1 |
| | |||||
* | pubmed DOIs need strip() | Bryan Newbold | 2019-05-22 | 1 | -1/+1 |
| | |||||
* | pubmed: try to work around multi-edits | Bryan Newbold | 2019-05-22 | 1 | -3/+13 |
| | |||||
* | more strict pubmed DOI handling | Bryan Newbold | 2019-05-22 | 1 | -1/+3 |
| | |||||
* | more pubmed checks; handle PMID/DOI mismatch differently | Bryan Newbold | 2019-05-22 | 1 | -2/+7 |
| | |||||
* | all new importers need to set contrib index (order) | Bryan Newbold | 2019-05-22 | 1 | -0/+4 |
| | |||||
* | pubmed importer command and tweaks | Bryan Newbold | 2019-05-22 | 1 | -9/+227 |
| | |||||
* | importers: create containers by default | Bryan Newbold | 2019-05-21 | 1 | -1/+2 |
| | |||||
* | updates to pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -32/+60 |
| | |||||
* | fix lint issue in pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -1/+1 |
| | |||||
* | tweaks to new imports/tests | Bryan Newbold | 2019-05-21 | 1 | -6/+4 |
| | |||||
* | initial pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -0/+512 |
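
The typing fixes described in the 2021-11-03 commit above (new variable names instead of reusing one variable with a new type, late coercion of '{}' or '[]' to 'None', explicit is-not-None checks) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the importer: `clean_extra` and `parse_year` are hypothetical helpers made up for this example.

```python
from typing import Any, Dict, Optional


def clean_extra(raw: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: drop empty values, then coerce an empty dict to None."""
    # 'extra' stays a plain dict for its whole lifetime, so a type checker
    # never sees it switch between Dict and None.
    extra: Dict[str, Any] = {}
    if raw is not None:  # explicit is-not-None check rather than truthiness
        for key, value in raw.items():
            if value is not None and value != "":
                extra[key] = value
    # Coerce '{}' to None only at the last minute, in the return expression.
    return extra or None


def parse_year(raw_year: Optional[str]) -> Optional[int]:
    """Hypothetical helper: parse a year string without reusing the input name."""
    # A new variable holds the parsed int; 'raw_year' keeps its str type.
    year: Optional[int] = None
    if raw_year is not None and raw_year.strip().isdecimal():
        year = int(raw_year.strip())
    return year
```

Keeping one type per variable is what lets a checker such as mypy pass without casts; the cost is a few extra names.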