author | Martin Czygan <martin.czygan@gmail.com> | 2019-12-09 01:03:43 +0100 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2019-12-28 23:07:31 +0100 |
commit | 4a82a0763bf927248f22e47ab5187af4beff83ee (patch) | |
tree | af86801bfb77a40bc8b409fa736b40c581fe970c /python/tests/import_datacite.py | |
parent | 54a2c83c0a5e8ccd4eec7c18eac715bdbb3eb62e (diff) | |
datacite: importer skeleton
* contributors, title, date, publisher, container, license
Field and value analysis via https://github.com/miku/indigo.
Diffstat (limited to 'python/tests/import_datacite.py')
-rw-r--r-- | python/tests/import_datacite.py | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/python/tests/import_datacite.py b/python/tests/import_datacite.py
new file mode 100644
index 00000000..0bbaba2e
--- /dev/null
+++ b/python/tests/import_datacite.py
@@ -0,0 +1,25 @@
+"""
+Test datacite importer.
+
+Datacite is a aggregator, hence inputs are quite varied.
+
+Here is small sample of ID types taken from a sample:
+
+    497344 "DOI"
+     65013 "URL"
+     22210 "CCDC"
+     17853 "GBIF"
+     17635 "Other"
+     11474 "uri"
+      9170 "Publisher ID"
+      7775 "URN"
+      6196 "DUCHAS"
+      5624 "Handle"
+      5056 "publisherId"
+
+A nice tool, not yet existing tool (maybe named indigo) would do the following:
+
+    $ shuf -n 100000 datacite.ndjson | indigo -t md > data.md
+
+TODO(martin): Write tests.
+"""
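The docstring lists identifier-type counts taken from a sample of records. As a rough illustration of how such a tally could be produced from newline-delimited JSON, here is a minimal sketch. It is not part of this commit and is not how indigo works. The function names `count_identifier_types` and `format_counts` are hypothetical, and the record layout (`attributes.identifiers[].identifierType`) is an assumption about the dump, not something the patch confirms.

```python
import json
from collections import Counter


def count_identifier_types(lines):
    """Tally identifier types across newline-delimited JSON records.

    ASSUMPTION: each record looks like
    {"attributes": {"identifiers": [{"identifierType": "DOI", ...}, ...]}}.
    Adjust the lookups below if the actual dump is shaped differently.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        record = json.loads(line)
        attributes = record.get("attributes") or {}
        for ident in attributes.get("identifiers") or []:
            id_type = ident.get("identifierType")
            if id_type:
                counts[id_type] += 1
    return counts


def format_counts(counts):
    """Render counts most frequent first, one quoted type per line."""
    return "\n".join(
        f"{n:>10} {json.dumps(id_type)}" for id_type, n in counts.most_common()
    )
```

With a sampled file, this could be driven by something like `print(format_counts(count_identifier_types(open("sample.ndjson"))))`, where `sample.ndjson` is a placeholder file name.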