author | Martin Czygan <martin.czygan@gmail.com> | 2019-12-06 15:12:32 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2019-12-27 00:13:40 +0100 |
commit | fd50b9492b5fdf3c94f11dea909d63b4b60866b2 (patch) | |
tree | fb2185e5b520a0e262d8154015023f586fca22ad /python/fatcat_tools/harvest | |
parent | 27d79252aa60379c3dc45b4d6072b21a9f82b8c1 (diff) | |
download | fatcat-fd50b9492b5fdf3c94f11dea909d63b4b60866b2.tar.gz fatcat-fd50b9492b5fdf3c94f11dea909d63b4b60866b2.zip |
Datacite API v2 occasionally returns HTTP 400, which we currently cannot recover from.
As a first iteration, just mark the daily batch complete and continue.
The occasional HTTP 400 issue has been reported as
https://github.com/datacite/datacite/issues/897.
A possible improvement would be to shrink the harvest window, so that less
data is lost when a batch is skipped.
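
The window-shrinking idea could look roughly like the sketch below. It is not part of this commit: `split_day` and its `parts` parameter are hypothetical names, and it assumes the harvester could query the API per time slice instead of per day, which the commit message does not confirm.

```python
from datetime import date, datetime, timedelta


def split_day(day, parts=24):
    """Yield (start, end) datetime pairs that partition one day into equal slices.

    Hypothetical helper: if each slice were harvested separately, an
    unrecoverable HTTP 400 would cost one slice rather than the whole day.
    """
    if parts < 1 or 86400 % parts != 0:
        raise ValueError("parts must be a positive divisor of 86400")
    start = datetime(day.year, day.month, day.day)
    step = timedelta(seconds=86400 // parts)
    for i in range(parts):
        # Slices are half-open: [start + i*step, start + (i+1)*step)
        yield start + i * step, start + (i + 1) * step


if __name__ == "__main__":
    for begin, end in split_day(date(2019, 12, 6), parts=4):
        print(begin.isoformat(), "to", end.isoformat())
```

With `parts=4`, a skipped slice would lose at most six hours of updates instead of a full daily batch.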
Diffstat (limited to 'python/fatcat_tools/harvest')
-rw-r--r-- | python/fatcat_tools/harvest/doi_registrars.py | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/python/fatcat_tools/harvest/doi_registrars.py b/python/fatcat_tools/harvest/doi_registrars.py
index 5af5395e..19b32e18 100644
--- a/python/fatcat_tools/harvest/doi_registrars.py
+++ b/python/fatcat_tools/harvest/doi_registrars.py
@@ -122,6 +122,10 @@ class HarvestCrossrefWorker:
                 self.producer.poll(0)
                 time.sleep(30.0)
                 continue
+            if http_resp.status_code == 400:
+                # https://is.gd/0nsEll, https://github.com/datacite/datacite/issues/897
+                print("skipping batch for {}, due to HTTP 400. Marking complete. Related: https://git.io/JeylE".format(date_str))
+                break
             http_resp.raise_for_status()
             resp = http_resp.json()
             items = self.extract_items(resp)
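
The control flow this hunk introduces can be shown in isolation with the following sketch. It is a simplified stand-in, not the code in `doi_registrars.py`: `fetch_batch`, `get_page`, and the injectable `sleep` are hypothetical names, and the real worker issues HTTP requests and publishes to Kafka instead of returning a list.

```python
import time


def fetch_batch(get_page, date_str, sleep=time.sleep):
    """Collect items for one daily batch, giving up on HTTP 400.

    get_page is a hypothetical callable returning a tuple
    (status_code, items, done); it stands in for the HTTP request.
    """
    items = []
    while True:
        status, page_items, done = get_page()
        if status == 503:
            # Crude backoff, then retry the same request.
            sleep(30.0)
            continue
        if status == 400:
            # Unrecoverable for now: stop paging and treat the batch as done.
            print("skipping batch for {}, due to HTTP 400".format(date_str))
            break
        if status != 200:
            raise RuntimeError("unexpected HTTP status {}".format(status))
        items.extend(page_items)
        if done:
            break
    return items


if __name__ == "__main__":
    pages = iter([(200, ["a", "b"], False), (400, [], False)])
    print(fetch_batch(lambda: next(pages), "2019-12-06"))
```

In this sketch, pages fetched before the 400 are kept, and only the remainder of the batch is lost.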