path: root/python/fatcat_tools/harvest/oaipmh.py
Commit message (Author, Age, Files, Lines)
* typing: add annotations to remaining fatcat_tools code (Bryan Newbold, 2021-11-03, 1 file, -4/+13)
  Again, these are just annotations, no changes made to get type checks to pass
* fmt (black): fatcat_tools/ (Bryan Newbold, 2021-11-02, 1 file, -26/+31)
* python: isort everything (Bryan Newbold, 2021-11-02, 1 file, -1/+2)
* arxiv: do retry five times of HTTP 503 (Martin Czygan, 2020-07-10, 1 file, -1/+1)
* lint (flake8) tool python files (Bryan Newbold, 2020-07-01, 1 file, -8/+0)
* rename HarvestState.next() to HarvestState.next_span() (Bryan Newbold, 2020-05-26, 1 file, -1/+1)
  "span" short for "timespan" to harvest; there may be a better name to use. Motivation for this is to work around a pylint error that .next() was not callable. This might be a bug with pylint, but .next() is also a very generic name.
* HACK: skip pylint errors on lines that seem to be fine (Bryan Newbold, 2020-05-22, 1 file, -1/+1)
  It seems to be an inadvertently upgraded version of pylint saying that these lines are not-callable.
* oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker (Martin Czygan, 2020-03-09, 1 file, -34/+0)
* pubmed ftp harvest and KafkaBs4XmlPusher (Martin Czygan, 2020-02-19, 1 file, -0/+15)
  * add PubmedFTPWorker
  * utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic
  * add KafkaBs4XmlPusher
* harvest: log state on startup and use stderr for diagnostics (Martin Czygan, 2020-02-14, 1 file, -7/+8)
* review/fix all confluent-kafka produce code (Bryan Newbold, 2019-09-20, 1 file, -5/+8)
* small fixes to confluent-kafka importers/workers (Bryan Newbold, 2019-09-20, 1 file, -1/+1)
  - decrease default changelog pipeline to 5.0sec
  - fix missing KafkaException harvester imports
  - more confluent-kafka tweaks
  - updates to kafka consumer configs
  - bump elastic updates consumergroup (again)
* bump max message size to ~20 MBytes (Bryan Newbold, 2019-09-20, 1 file, -0/+1)
* fixes to confluent-kafka harvesters (Bryan Newbold, 2019-09-20, 1 file, -8/+8)
* first draft harvesters using confluent-kafka (Bryan Newbold, 2019-09-20, 1 file, -11/+30)
* MEDLINE/Pubmed note (Bryan Newbold, 2019-03-15, 1 file, -2/+6)
  Also, arXivRaw, not arXiv (though see WIP on more-importers branch)
* bunch of lint/whitespace cleanups (Bryan Newbold, 2019-02-22, 1 file, -6/+5)
* clean up harvester comments/docs (Bryan Newbold, 2018-11-21, 1 file, -44/+29)
* use isoformat() to format dates (Bryan Newbold, 2018-11-21, 1 file, -2/+2)
  This shouldn't change behavior; it's just more consistent.
* fix loop_sleep typo (Bryan Newbold, 2018-11-21, 1 file, -1/+1)
* fix OAI-PMH name/finished message (Bryan Newbold, 2018-11-21, 1 file, -1/+6)
* fix oai-pmh issue again (Bryan Newbold, 2018-11-21, 1 file, -13/+14)
* oaipmh: handle NoRecordsMatch (Bryan Newbold, 2018-11-21, 1 file, -5/+8)
* initial OAI-PMH harvesters (Bryan Newbold, 2018-11-19, 1 file, -0/+157)