author Martin Czygan <martin.czygan@gmail.com> 2019-12-09 01:03:43 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2019-12-28 23:07:31 +0100
commit 4a82a0763bf927248f22e47ab5187af4beff83ee
tree af86801bfb77a40bc8b409fa736b40c581fe970c /python/tests/import_datacite.py
parent 54a2c83c0a5e8ccd4eec7c18eac715bdbb3eb62e
datacite: importer skeleton
* contributors, title, date, publisher, container, license

Field and value analysis via https://github.com/miku/indigo.
Diffstat (limited to 'python/tests/import_datacite.py')
-rw-r--r-- python/tests/import_datacite.py 25
1 files changed, 25 insertions, 0 deletions
diff --git a/python/tests/import_datacite.py b/python/tests/import_datacite.py
new file mode 100644
index 00000000..0bbaba2e
--- /dev/null
+++ b/python/tests/import_datacite.py
@@ -0,0 +1,25 @@
+"""
+Test datacite importer.
+
+Datacite is an aggregator, hence inputs are quite varied.
+
+Here is a small selection of ID types, with counts, taken from a sample:
+
+ 497344 "DOI"
+ 65013 "URL"
+ 22210 "CCDC"
+ 17853 "GBIF"
+ 17635 "Other"
+ 11474 "uri"
+ 9170 "Publisher ID"
+ 7775 "URN"
+ 6196 "DUCHAS"
+ 5624 "Handle"
+ 5056 "publisherId"
+
+A nice, not yet existing tool (maybe named indigo) would do the following:
+
+ $ shuf -n 100000 datacite.ndjson | indigo -t md > data.md
+
+TODO(martin): Write tests.
+"""
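
The ID type tally in the docstring above could be produced along these lines. This is a minimal sketch and not part of the commit: the function name `count_identifier_types`, the default key `identifierType`, and the sample records are all assumptions made for illustration, since the commit does not say which field the counts were taken from.

```python
import json
from collections import Counter
from typing import Iterable


def count_identifier_types(lines: Iterable[str], key: str = "identifierType") -> Counter:
    """Tally string values stored under `key` anywhere in each ndjson record.

    The default key name is an assumption; adjust it to the field of interest.
    """
    counts: Counter = Counter()

    def walk(node):
        # Recurse through nested dicts and lists, counting matching keys.
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key and isinstance(v, str):
                    counts[v] += 1
                else:
                    walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        walk(json.loads(line))
    return counts


# Hypothetical records, shaped only for this example.
sample = [
    '{"attributes": {"identifiers": [{"identifierType": "DOI", "identifier": "10.1234/a"}]}}',
    '{"attributes": {"identifiers": [{"identifierType": "URL", "identifier": "https://example.org/x"},'
    ' {"identifierType": "DOI", "identifier": "10.1234/b"}]}}',
]
counts = count_identifier_types(sample)
for name, n in counts.most_common():
    # Same layout as the tally in the docstring: count, then quoted type.
    print(f"{n:7d} {json.dumps(name)}")
```

For a real run, the lines would come from a file handle over `datacite.ndjson` (or from the output of `shuf`) instead of the in-memory `sample` list.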