Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | arxiv: do retry five times of HTTP 503 | Martin Czygan | 2020-07-10 | 1 | -1/+1 |
* | lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -8/+0 |
* | rename HarvestState.next() to HarvestState.next_span() | Bryan Newbold | 2020-05-26 | 1 | -1/+1 |
| | "span" is short for "timespan" to harvest; there may be a better name to use. The motivation is to work around a pylint error that .next() was not callable. This might be a bug in pylint, but .next() is also a very generic name. | | | | |
* | HACK: skip pylint errors on lines that seem to be fine | Bryan Newbold | 2020-05-22 | 1 | -1/+1 |
| | It seems that an inadvertently upgraded version of pylint is flagging these lines as not-callable. | | | | |
* | oaipmh: HarvestPubmedWorker obsoleted by PubmedFTPWorker | Martin Czygan | 2020-03-09 | 1 | -34/+0 |
* | pubmed ftp harvest and KafkaBs4XmlPusher | Martin Czygan | 2020-02-19 | 1 | -0/+15 |
| | add PubmedFTPWorker; utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic; add KafkaBs4XmlPusher | | | | |
* | harvest: log state on startup and use stderr for diagnostics | Martin Czygan | 2020-02-14 | 1 | -7/+8 |
* | review/fix all confluent-kafka produce code | Bryan Newbold | 2019-09-20 | 1 | -5/+8 |
* | small fixes to confluent-kafka importers/workers | Bryan Newbold | 2019-09-20 | 1 | -1/+1 |
| | decrease default changelog pipeline to 5.0sec; fix missing KafkaException harvester imports; more confluent-kafka tweaks; updates to kafka consumer configs; bump elastic updates consumergroup (again) | | | | |
* | bump max message size to ~20 MBytes | Bryan Newbold | 2019-09-20 | 1 | -0/+1 |
* | fixes to confluent-kafka harvesters | Bryan Newbold | 2019-09-20 | 1 | -8/+8 |
* | first draft harvesters using confluent-kafka | Bryan Newbold | 2019-09-20 | 1 | -11/+30 |
* | MEDLINE/Pubmed note | Bryan Newbold | 2019-03-15 | 1 | -2/+6 |
| | Also, arXivRaw, not arXiv (though see WIP on more-importers branch) | | | | |
* | bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 1 | -6/+5 |
* | clean up harvester comments/docs | Bryan Newbold | 2018-11-21 | 1 | -44/+29 |
* | use isoformat() to format dates | Bryan Newbold | 2018-11-21 | 1 | -2/+2 |
| | This shouldn't change behavior; it's just more consistent. | | | | |
* | fix loop_sleep typo | Bryan Newbold | 2018-11-21 | 1 | -1/+1 |
* | fix OAI-PMH name/finished message | Bryan Newbold | 2018-11-21 | 1 | -1/+6 |
* | fix oai-pmh issue again | Bryan Newbold | 2018-11-21 | 1 | -13/+14 |
* | oaipmh: handle NoRecordsMatch | Bryan Newbold | 2018-11-21 | 1 | -5/+8 |
* | initial OAI-PMH harvesters | Bryan Newbold | 2018-11-19 | 1 | -0/+157 |