path: root/python/fatcat_tools/harvest/pubmed.py
Commit message | Author | Age | Files | Lines
* rename HarvestState.next() to HarvestState.next_span() | Bryan Newbold | 2020-05-26 | 1 | -1/+1
  "span" is short for the "timespan" to harvest; there may be a better name to use. The motivation is to work around a pylint error that .next() was not callable. This might be a bug in pylint, but .next() is also a very generic name.
* HACK: skip pylint errors on lines that seem to be fine | Bryan Newbold | 2020-05-22 | 1 | -1/+1
  It seems that an inadvertently upgraded version of pylint is reporting these lines as not-callable.
* pubmed: log to stderr | Martin Czygan | 2020-03-10 | 1 | -1/+1
* pubmed: move mapping generation out of fetch_date | Martin Czygan | 2020-03-10 | 1 | -7/+8
  * fetch_date will fail on missing mapping
  * adjust tests (test will require access to pubmed ftp)
* pubmed: citations is a bit more precise | Martin Czygan | 2020-03-09 | 1 | -1/+1
  > Each day, NLM produces update files that include new, revised and deleted citations.
  -- ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/README.txt
* pubmed: we sync from FTP | Martin Czygan | 2020-03-09 | 1 | -1/+1
* more pubmed adjustments | Martin Czygan | 2020-02-22 | 1 | -70/+117
  * regenerate map in continuous mode
  * add tests
* pubmed ftp: fix url | Martin Czygan | 2020-02-19 | 1 | -4/+6
* pubmed ftp harvest and KafkaBs4XmlPusher | Martin Czygan | 2020-02-19 | 1 | -0/+199
  * add PubmedFTPWorker
  * utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic
  * add KafkaBs4XmlPusher