author | Bryan Newbold <bnewbold@robocracy.org> | 2018-06-09 00:59:33 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-06-09 00:59:33 -0700 |
commit | 0f6354ffbdf7115f8a6d7e4d3ea700a44fe567ed | |
tree | d6fb3d094dd64f51b0d4723a0c30112b89b7c3d7 /python/README_import.md | |
parent | ce0c06ca8e694362a3bf4cde175efbe1af6e4962 | |
fixes to orcid importer for larger batches
Diffstat (limited to 'python/README_import.md')
-rw-r--r-- | python/README_import.md | 31 |
1 files changed, 31 insertions, 0 deletions
diff --git a/python/README_import.md b/python/README_import.md
new file mode 100644
index 00000000..11cb0fd8
--- /dev/null
+++ b/python/README_import.md
@@ -0,0 +1,31 @@
+
+## ORCID
+
+Does not work:
+
+    ./client.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json
+
+Instead:
+
+    cat /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json | jq -c . | ./client.py import-orcid -
+
+Or for many files:
+
+    find /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3 -iname '*.json' | parallel --bar jq -c . {} | rg '"person":' | ./client.py import-orcid -
+
+
+for ~9k files:
+
+    (python-B2RYrks8) bnewbold@orithena$ time parallel --pipepart -j4 -a /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json ./client.py import-orcid -
+    real 0m15.294s
+    user 0m28.112s
+    sys 0m2.408s
+
+    => 636/second
+
+    (python-B2RYrks8) bnewbold@orithena$ time ./client.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json
+    real 0m47.268s
+    user 0m2.616s
+    sys 0m0.104s
+
+    => 203/second
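
Note on the README above: the `jq -c .` and `rg '"person":'` steps suggest that the importer expects one compact JSON record per line, and that profiles without a `person` field are skipped. A minimal Python sketch of that preprocessing follows, assuming each input file holds a single JSON document. The function name `compact_profiles` is hypothetical and is not part of fatcat's `client.py`.

```python
import json


def compact_profiles(paths):
    """Yield one compact JSON line per input file.

    Roughly mirrors `jq -c . FILE | rg '"person":'`: each file is parsed
    and re-serialized on a single line without extra whitespace, and
    lines lacking the literal text '"person":' are dropped.
    Assumption: every file contains exactly one JSON document.
    """
    for path in paths:
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        # Compact separators and unescaped non-ASCII approximate jq's -c output.
        line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        # Same plain substring match that the rg filter performs.
        if '"person":' in line:
            yield line
```

The resulting lines could be written to a file such as `all.json` or printed to stdout and piped into `./client.py import-orcid -`. Unlike `find -iname`, this sketch leaves file discovery to the caller.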