| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
use an http proxy (https://github.com/miku/ftpup) to fetch files from
FTP, keep some retry logic; also, hardcoding the proxy path as this
should be a temporary workaround
|
|
|
|
|
| |
lftp is a classic command line ftp client, and we hope that its retry
capabilities are enough of a workaround for the current networking issue
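
A minimal sketch of how lftp's built-in retry could be driven from Python. The helper name `build_lftp_command` and the retry count are assumptions for illustration; the harvester's actual invocation is not shown in this log. It relies on lftp's `-c` option, its `net:max-retries` setting and its `get ... -o` command, and only builds the argument list, so it can be inspected before being handed to `subprocess.run`.

```python
import shlex


def build_lftp_command(url, target, max_retries=5):
    """Build an lftp argument list that fetches `url` into `target`.

    Hypothetical helper: lftp retries failed operations itself, up to
    `max_retries` sequential tries (lftp treats 0 as unlimited).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    script = "set net:max-retries {}; get {} -o {}".format(
        max_retries, shlex.quote(url), shlex.quote(target)
    )
    # Everything after -c is interpreted by lftp as a command script.
    return ["lftp", "-c", script]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit from lftp surfaces as an exception.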
|
|
|
|
|
|
|
|
| |
Related to a previous issue with seemingly random EOFError from FTP
connections, this patch wraps the "ftpretr" helper function with a basic
retry.
Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102
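
A basic retry of this kind might look like the sketch below. The decorator name `retry_on` and its parameters are assumptions for illustration, not the wrapper actually applied to "ftpretr".

```python
import time
from functools import wraps


def retry_on(exceptions, attempts=3, delay=0.0):
    """Retry the wrapped callable when it raises one of `exceptions`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts, surface the original error
                    time.sleep(delay)  # brief pause before the next try

        return wrapper

    return decorator
```

A retrieval helper could then be decorated with `@retry_on(EOFError, attempts=3, delay=1.0)`; errors other than the listed ones propagate immediately.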
|
| |
|
|
|
|
|
|
|
| |
After a sync gap (e.g. 06/07 2021), the harvester wanted to fetch a file
that was no longer on the server; do not fail in this case.
We will need to backfill missing records via a full data dump.
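
One way to tolerate a missing file, sketched under assumptions: the helper name `fetch_if_present` and the injected `retrieve` callable are hypothetical, and the check relies on `ftplib.error_perm` carrying the server reply, where code 550 commonly signals an unavailable file.

```python
import ftplib


def fetch_if_present(retrieve, path):
    """Return retrieve(path), or None if the server reports the file missing.

    `retrieve` is any callable that fetches `path` over FTP and raises
    ftplib.error_perm on a permanent (5xx) server reply.
    """
    try:
        return retrieve(path)
    except ftplib.error_perm as err:
        # The exception message is the server reply, e.g. "550 No such file".
        if str(err).startswith("550"):
            return None  # file is gone; the caller can skip it
        raise  # other permanent errors still fail loudly
```

Skipped files would then be covered later by the backfill from a full data dump.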
|
|
|
|
|
|
|
|
|
| |
ftp retrieval would run but fail with EOFError on
/pubmed/updatefiles/pubmed21n1328_stats.html - not able to find the root
cause; using a fresh client, the exact same file would work just
fine. So when we retry, we reconnect on failure.
Refs: sentry #91102.
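
The reconnect-on-failure idea can be sketched as follows. The function name and the `connect` / `retrieve` callables are assumptions for illustration; the point is that every attempt gets a fresh client instead of reusing the one that just failed.

```python
import time


def retrieve_with_reconnect(connect, retrieve, attempts=3, delay=0.0):
    """Call retrieve(client) with a fresh client per attempt, retrying on EOFError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        client = connect()  # new connection for every attempt
        try:
            return retrieve(client)
        except EOFError as err:
            last_error = err
            time.sleep(delay)
        finally:
            # Close the client whether the attempt succeeded or failed.
            close = getattr(client, "close", None)
            if callable(close):
                close()
    raise last_error
```

With `ftplib`, `connect` could create and log in an `ftplib.FTP` instance, and `retrieve` could call `retrbinary` on it.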
|
| |
|
|
|
|
|
|
|
|
|
| |
"span" short for "timespan" to harvest; there may be a better name to
use.
Motivation for this is to work around a pylint error that .next() was
not callable. This might be a bug with pylint, but .next() is also a
very generic name.
|
|
|
|
|
| |
It seems to be an inadvertently upgraded version of pylint saying that
these lines are not-callable.
|
| |
|
|
|
|
|
| |
* fetch_date will fail on missing mapping
* adjust tests (test will require access to pubmed ftp)
|
|
|
|
|
| |
> Each day, NLM produces update files that include new, revised and
deleted citations. -- ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/README.txt
|
| |
|
|
|
|
|
| |
* regenerate map in continuous mode
* add tests
|
| |
|
|
* add PubmedFTPWorker
* utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream)
but may live elsewhere, as they are more generic
* add KafkaBs4XmlPusher
|