| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | pubmed: use untranslated title if translated not available | Bryan Newbold | 2020-04-01 | 1 | -0/+6 |
| | The primary motivation for this change is that fatcat *requires* a non-empty title for each release entity. Pubmed/Medline occasionally indexes just a VernacularTitle with no ArticleTitle for foreign publications, and currently those records don't end up in fatcat at all. | ||||
* | importers: replace newlines in get_text() strings | Bryan Newbold | 2020-04-01 | 1 | -5/+7 |
| | |||||
* | pubmed: bunch of .get_text() instead of .string | Bryan Newbold | 2020-03-28 | 1 | -12/+12 |
| | Yikes! Apparently when a tag has child tags, .string will return None instead of all the strings, while .get_text() returns all of it (see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text and https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string ). I've left things like identifiers as .string, where we expect only a single string inside. | ||||
* | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -1/+4 |
| | This resolves a situation noticed in prod where we were only importing/updating a single reference per article. Includes a regression test. | ||||
* | pubmed: update many more metadata fields | Bryan Newbold | 2020-03-19 | 1 | -0/+22 |
| | In particular, with daily updates in most cases the DOI will be registered first, then the entity updated with PMID when that is available. Often the pubmed metadata will be more complete, with abstracts etc, and we'll want those improvements. | ||||
* | importers: control update behavior with more-standard flag | Bryan Newbold | 2020-01-06 | 1 | -0/+4 |
| | |||||
* | pubmed: if doing update, also do subtitle schema update | Bryan Newbold | 2019-12-23 | 1 | -1/+9 |
| | |||||
* | pubmed: improve warning and stderr formatting | Bryan Newbold | 2019-12-23 | 1 | -5/+6 |
| | |||||
* | pubmed: use standard identifier cleaners | Bryan Newbold | 2019-12-23 | 1 | -17/+14 |
| | |||||
* | pubmed: remove unused extid mapping code | Bryan Newbold | 2019-12-23 | 1 | -29/+0 |
| | |||||
* | pubmed: do reference lookups by default | Bryan Newbold | 2019-12-23 | 1 | -1/+1 |
| | |||||
* | pubmed: null doi parsing check | Bryan Newbold | 2019-12-23 | 1 | -1/+1 |
| | |||||
* | add basic MedlineDate year parsing | Bryan Newbold | 2019-12-23 | 1 | -0/+11 |
| | |||||
* | refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 1 | -16/+16 |
| | |||||
* | more pubmed importer fixes | Bryan Newbold | 2019-06-03 | 1 | -6/+13 |
| | |||||
* | yet another pubmed weird DOI corner case | Bryan Newbold | 2019-05-29 | 1 | -1/+1 |
| | |||||
* | handle pubmed CollectiveName null-ness | Bryan Newbold | 2019-05-29 | 1 | -1/+1 |
| | |||||
* | handle empty retraction_of.PMID in pubmed importer | Bryan Newbold | 2019-05-29 | 1 | -2/+4 |
| | |||||
* | more MARC languages, and less verbose reporting | Bryan Newbold | 2019-05-24 | 1 | -1/+1 |
| | |||||
* | pubmed DOIs need strip() | Bryan Newbold | 2019-05-22 | 1 | -1/+1 |
| | |||||
* | pubmed: try to work around multi-edits | Bryan Newbold | 2019-05-22 | 1 | -3/+13 |
| | |||||
* | more strict pubmed DOI handling | Bryan Newbold | 2019-05-22 | 1 | -1/+3 |
| | |||||
* | more pubmed checks; handle PMID/DOI mismatch differently | Bryan Newbold | 2019-05-22 | 1 | -2/+7 |
| | |||||
* | all new importers need to set contrib index (order) | Bryan Newbold | 2019-05-22 | 1 | -0/+4 |
| | |||||
* | pubmed importer command and tweaks | Bryan Newbold | 2019-05-22 | 1 | -9/+227 |
| | |||||
* | importers: create containers by default | Bryan Newbold | 2019-05-21 | 1 | -1/+2 |
| | |||||
* | updates to pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -32/+60 |
| | |||||
* | fix lint issue in pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -1/+1 |
| | |||||
* | tweaks to new imports/tests | Bryan Newbold | 2019-05-21 | 1 | -6/+4 |
| | |||||
* | initial pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -0/+512 |
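
The three most recent title-related changes in the log above (falling back to VernacularTitle, replacing newlines in extracted text, and collecting text from nested child tags) can be sketched together. This is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, with `itertext()` standing in for `.get_text()`, and it collapses all whitespace runs rather than only newlines. The helper names `clean_text` and `extract_title` are hypothetical and are not taken from the fatcat codebase.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def clean_text(elem: Optional[ET.Element]) -> Optional[str]:
    """Return all text nested under elem as one line, or None if empty.

    Hypothetical helper: itertext() also walks child tags, so inline
    markup such as <i>...</i> does not cause the text to be lost.
    """
    if elem is None:
        return None
    raw = "".join(elem.itertext())
    # Collapse newlines and other whitespace runs into single spaces.
    text = " ".join(raw.split())
    return text or None


def extract_title(article: ET.Element) -> Optional[str]:
    """Prefer ArticleTitle; fall back to VernacularTitle if it is empty."""
    title = clean_text(article.find("ArticleTitle"))
    if title is None:
        # Assumption: the untranslated title is better than no title at all.
        title = clean_text(article.find("VernacularTitle"))
    return title


# Made-up sample records, not real PubMed data.
with_markup = ET.fromstring(
    "<Article><ArticleTitle>Effects of <i>E. coli</i>\n on mice"
    "</ArticleTitle></Article>"
)
vernacular_only = ET.fromstring(
    "<Article><ArticleTitle/>"
    "<VernacularTitle>Titre original</VernacularTitle></Article>"
)

print(extract_title(with_markup))
print(extract_title(vernacular_only))
```

Returning `None` for an empty title, rather than an empty string, lets a caller skip or flag the record explicitly, which matches the stated requirement that every release entity has a non-empty title.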