<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/fatcat_tools/harvest, branch v0.4.0</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.4.0</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.4.0'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2021-09-09T19:20:19Z</updated>
<entry>
<title>pubmed: work around a networking issue</title>
<updated>2021-09-09T19:20:19Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-09-09T18:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=23c9206670bdad61d09d89622393e16e5f11fdca'/>
<id>urn:sha1:23c9206670bdad61d09d89622393e16e5f11fdca</id>
<content type='text'>
use an http proxy (https://github.com/miku/ftpup) to fetch files from
FTP, keep some retry logic; the proxy path is hardcoded, as this
should be a temporary workaround
</content>
</entry>
<entry>
<title>pubmed: add option to ftp download with lftp</title>
<updated>2021-09-08T20:02:48Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-08-30T21:23:28Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6d9d67c9c4d1a0b208fc2056ab485a1c8d21e100'/>
<id>urn:sha1:6d9d67c9c4d1a0b208fc2056ab485a1c8d21e100</id>
<content type='text'>
lftp is a classic command line ftp client, and we hope that its retry
capabilities are enough of a workaround for the current networking issue
</content>
</entry>
<entry>
<title>pubmed harvester: add basic retry logic</title>
<updated>2021-08-20T20:32:19Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-08-20T20:32:19Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=a4352a003a9fc7085638268ff00c05e305c519f5'/>
<id>urn:sha1:a4352a003a9fc7085638268ff00c05e305c519f5</id>
<content type='text'>
Related to a previous issue with seemingly random EOFError from FTP
connections, this patch wraps the "ftpretr" helper function with a basic
retry.

Refs: fatcat-workers/issues/92151, fatcat-workers/issues/91102
</content>
</entry>
<entry>
<title>pubmed: update docs</title>
<updated>2021-07-16T23:38:29Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-16T23:38:29Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=0202f5f9d0c508e2c4cc4af6a8b22bd624bcbd0b'/>
<id>urn:sha1:0202f5f9d0c508e2c4cc4af6a8b22bd624bcbd0b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>pubmed: do not fail when accessing missing file</title>
<updated>2021-07-16T23:29:16Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-16T23:29:16Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6056dcc93fa8111a04d76a7af5bcddb12704cb96'/>
<id>urn:sha1:6056dcc93fa8111a04d76a7af5bcddb12704cb96</id>
<content type='text'>
after a sync gap (e.g. 06/07 2021) the harvester wanted to fetch a file
that was no longer on the server - do not fail in this case

we'll need to backfill missing records via full data dump
</content>
</entry>
<entry>
<title>pubmed: reconnect on error</title>
<updated>2021-07-16T21:45:48Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-07-16T12:42:30Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=47c98540083da302802b54a70f77fb8abc69b4de'/>
<id>urn:sha1:47c98540083da302802b54a70f77fb8abc69b4de</id>
<content type='text'>
ftp retrieval would run but fail with EOFError on
/pubmed/updatefiles/pubmed21n1328_stats.html - not able to find the root
cause; using a fresh client, the exact same file would work just
fine. So when we retry, we reconnect on failure.

Refs: sentry #91102.
</content>
</entry>
<entry>
<title>small python lint fixes (no behavior change)</title>
<updated>2021-05-25T23:30:09Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-05-25T23:30:09Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=7aa19b92f0e7808a341aec8ae17485408dfae68c'/>
<id>urn:sha1:7aa19b92f0e7808a341aec8ae17485408dfae68c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>harvest: datacite API yields HTTP 200 with broken JSON</title>
<updated>2020-08-10T17:58:12Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-08-10T17:55:14Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=e18d48642cecb55d9f2270f9048953a7b543472e'/>
<id>urn:sha1:e18d48642cecb55d9f2270f9048953a7b543472e</id>
<content type='text'>
As a first step: log response body for debugging.
</content>
</entry>
<entry>
<title>arxiv: retry five times on HTTP 503</title>
<updated>2020-07-09T22:54:55Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-07-09T22:54:55Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=c403cb4a1f20bd056008f68f71b374bde1e089b5'/>
<id>urn:sha1:c403cb4a1f20bd056008f68f71b374bde1e089b5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>lint (flake8) tool python files</title>
<updated>2020-07-02T01:35:24Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-07-02T01:35:24Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=30905f1effb33c3ef193d084120aa3fbd75d0b9b'/>
<id>urn:sha1:30905f1effb33c3ef193d084120aa3fbd75d0b9b</id>
<content type='text'>
</content>
</entry>
</feed>
