author Martin Czygan <martin.czygan@gmail.com> 2019-12-06 15:12:32 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2019-12-27 00:13:40 +0100
commit fd50b9492b5fdf3c94f11dea909d63b4b60866b2
tree fb2185e5b520a0e262d8154015023f586fca22ad
parent 27d79252aa60379c3dc45b4d6072b21a9f82b8c1
Datacite API v2 occasionally returns HTTP 400, which we currently cannot recover from.
As a first iteration, just mark the daily batch complete and continue. The occasional HTTP 400 issue has been reported as https://github.com/datacite/datacite/issues/897. A possible improvement would be to shrink the window, so losses will be smaller.
-rw-r--r-- python/fatcat_tools/harvest/doi_registrars.py | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/python/fatcat_tools/harvest/doi_registrars.py b/python/fatcat_tools/harvest/doi_registrars.py
index 5af5395e..19b32e18 100644
--- a/python/fatcat_tools/harvest/doi_registrars.py
+++ b/python/fatcat_tools/harvest/doi_registrars.py
@@ -122,6 +122,10 @@ class HarvestCrossrefWorker:
                 self.producer.poll(0)
                 time.sleep(30.0)
                 continue
+            if http_resp.status_code == 400:
+                # https://is.gd/0nsEll, https://github.com/datacite/datacite/issues/897
+                print("skipping batch for {}, due to HTTP 400. Marking complete. Related: https://git.io/JeylE".format(date_str))
+                break
             http_resp.raise_for_status()
             resp = http_resp.json()
             items = self.extract_items(resp)
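
The commit message mentions shrinking the window as a possible improvement. The following is a minimal sketch of that idea, not part of this commit. On HTTP 400 the window is halved and each half retried, so only the smallest failing slice is skipped. The names `harvest_window`, `fetch` and `min_span` are hypothetical, and the sketch assumes the remote API accepts sub-day time ranges, which has not been verified here.

```python
import datetime


def harvest_window(start, end, fetch, min_span=datetime.timedelta(hours=1)):
    """Harvest the half-open window [start, end).

    `fetch` is a hypothetical callable returning (status_code, items).
    On HTTP 400 the window is split in half and each half is retried.
    A slice no longer than `min_span` that still fails is skipped.
    Other error codes are not handled in this sketch.

    Returns (items, skipped_windows).
    """
    status, items = fetch(start, end)
    if status != 400:
        return list(items), []
    if end - start <= min_span:
        # Too small to split further: give up on this slice only.
        return [], [(start, end)]
    mid = start + (end - start) / 2
    left_items, left_skipped = harvest_window(start, mid, fetch, min_span)
    right_items, right_skipped = harvest_window(mid, end, fetch, min_span)
    return left_items + right_items, left_skipped + right_skipped
```

With this approach a single problematic record costs at most one `min_span`-sized slice instead of the whole daily batch, at the price of a few extra requests on failure.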