<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/fatcat_tools/harvest, branch v0.3.3</title>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.3.3</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.3.3'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2020-08-10T17:58:12Z</updated>
<entry>
<title>harvest: datacite API yields HTTP 200 with broken JSON</title>
<updated>2020-08-10T17:58:12Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-08-10T17:55:14Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=e18d48642cecb55d9f2270f9048953a7b543472e'/>
<id>urn:sha1:e18d48642cecb55d9f2270f9048953a7b543472e</id>
<content type='text'>
As a first step: log response body for debugging.
</content>
</entry>
<entry>
<title>arxiv: retry five times on HTTP 503</title>
<updated>2020-07-09T22:54:55Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-07-09T22:54:55Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=c403cb4a1f20bd056008f68f71b374bde1e089b5'/>
<id>urn:sha1:c403cb4a1f20bd056008f68f71b374bde1e089b5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>lint (flake8) tool python files</title>
<updated>2020-07-02T01:35:24Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-07-02T01:35:24Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=30905f1effb33c3ef193d084120aa3fbd75d0b9b'/>
<id>urn:sha1:30905f1effb33c3ef193d084120aa3fbd75d0b9b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>harvest: fail on HTTP 400</title>
<updated>2020-05-29T17:07:50Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-05-29T17:00:30Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=34a64b5d8c470ae2627458d791239cfc4d66d6b3'/>
<id>urn:sha1:34a64b5d8c470ae2627458d791239cfc4d66d6b3</id>
<content type='text'>
In the past, harvesting datacite occasionally resulted in HTTP 400
errors. Meanwhile, various API bugs have been fixed (most recently:
https://github.com/datacite/lupo/pull/537,
https://github.com/datacite/datacite/issues/1038). The downside of
ignoring this error was that state lives in kafka, which has limited
support for deleting arbitrary messages from a topic.
</content>
</entry>
<entry>
<title>rename HarvestState.next() to HarvestState.next_span()</title>
<updated>2020-05-27T02:09:55Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-05-27T02:01:28Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=670aed3800873869550b477846f48cb2b4193005'/>
<id>urn:sha1:670aed3800873869550b477846f48cb2b4193005</id>
<content type='text'>
"span" is short for the "timespan" to harvest; there may be a better
name to use.

The motivation for this is to work around a pylint error that .next()
was not callable. This might be a bug in pylint, but .next() is also a
very generic name.
</content>
</entry>
<entry>
<title>HACK: skip pylint errors on lines that seem to be fine</title>
<updated>2020-05-22T23:52:40Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-05-22T23:52:38Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=99dfb5031511eb730c3397b18f2bd6e537b67e9a'/>
<id>urn:sha1:99dfb5031511eb730c3397b18f2bd6e537b67e9a</id>
<content type='text'>
It seems that an inadvertently upgraded version of pylint is reporting
these lines as not-callable.
</content>
</entry>
<entry>
<title>crossref: switch from index-date to update-date</title>
<updated>2020-03-31T04:23:11Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-31T03:56:04Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=851c40143d44a73a92ff2c9556b3a63f29668c2d'/>
<id>urn:sha1:851c40143d44a73a92ff2c9556b3a63f29668c2d</id>
<content type='text'>
This goes against what the API docs recommend, but we are currently far
behind on updates and need to catch up. Other than what the docs say,
this seems to be consistent with the behavior we want.
</content>
</entry>
<entry>
<title>crossref: longer comment about crossref API date fields</title>
<updated>2020-03-31T03:55:44Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-31T03:55:44Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=98933a068ec3d918deb0e7dff30aed517ca515d9'/>
<id>urn:sha1:98933a068ec3d918deb0e7dff30aed517ca515d9</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge pull request #53 from EdwardBetts/spelling</title>
<updated>2020-03-27T23:50:08Z</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-03-27T23:50:08Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=98abe2e751187aa7c2e751b355ffb56d9b1f8c6a'/>
<id>urn:sha1:98abe2e751187aa7c2e751b355ffb56d9b1f8c6a</id>
<content type='text'>
Correct spelling mistakes</content>
</entry>
<entry>
<title>Correct spelling mistakes</title>
<updated>2020-03-27T21:25:54Z</updated>
<author>
<name>Edward Betts</name>
<email>edward@4angle.com</email>
</author>
<published>2020-03-27T21:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=94710b2803780ab16fb30b79010f8e27cf115512'/>
<id>urn:sha1:94710b2803780ab16fb30b79010f8e27cf115512</id>
<content type='text'>
</content>
</entry>
</feed>
