author: Bryan Newbold <bnewbold@robocracy.org> 2018-09-22 19:08:18 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-09-22 19:08:18 -0700
commit: 91eb3a7a9e7fdb1b344462d5bfb3e826320dc431
tree: aa56b19199df44e91eb4193711a9d39d5ef7dc73 /notes/old_imports.txt
parent: b12158b396bd849f40ff6713ad7836f3293f4029
commit old notes and other files
Diffstat (limited to 'notes/old_imports.txt')
-rw-r--r-- notes/old_imports.txt | 20
1 files changed, 20 insertions, 0 deletions
diff --git a/notes/old_imports.txt b/notes/old_imports.txt
new file mode 100644
index 00000000..1233d4a8
--- /dev/null
+++ b/notes/old_imports.txt
@@ -0,0 +1,20 @@
+
+## ORCID
+
+Directly from the compressed tarball; this takes about 2 hours in production:
+
+ tar xf /srv/datasets/public_profiles_API-2.0_2017_10_json.tar.gz -O | jq -c . | grep '"person":' | time parallel -j12 --pipe --round-robin ./fatcat_import.py import-orcid -
+
+After tuning the database, `jq` CPU appears to be the bottleneck, so instead
+extract the tarball once into a JSON-lines file and import from that file:
+
+ tar xf /srv/datasets/public_profiles_API-2.0_2017_10_json.tar.gz -O | jq -c . | rg '"person":' > /srv/datasets/public_profiles_1_2_json.all.json
+ time parallel --bar --pipepart -j8 -a /srv/datasets/public_profiles_1_2_json.all.json ./fatcat_import.py import-orcid -
+
+Passing a single profile file path directly does not work:
+
+ ./fatcat_import.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json
+
+Instead, compact it to one line with `jq -c` and pipe it in on stdin:
+
+ cat /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json | jq -c . | ./fatcat_import.py import-orcid -
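The single-file fix presumably works because `jq -c .` re-serializes each JSON document onto one line, the same one-record-per-line shape the bulk pipelines feed to the importer. The same idea should extend to a whole directory of per-profile files. Below is a minimal sketch, assuming `jq` is installed; the helper name `to_json_lines` and the demo records are hypothetical and not part of fatcat.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming jq is installed. Turns pretty-printed JSON files
# into newline-delimited JSON (one record per line), the input shape used
# for `./fatcat_import.py import-orcid -` in the notes above.
set -euo pipefail

# Hypothetical helper (not part of fatcat): prints every *.json file under
# directory $1 as one compact JSON line per document.
to_json_lines() {
    # `jq -c .` re-serializes each document on a single line;
    # `-exec ... {} +` passes many files to each jq invocation
    # instead of starting one jq process per file.
    find "$1" -type f -name '*.json' -exec jq -c . {} +
}

# Demo: two made-up, pretty-printed records (5 lines each) in a scratch dir.
demo_dir="$(mktemp -d)"
printf '{\n  "person": {\n    "name": "A"\n  }\n}\n' > "$demo_dir/a.json"
printf '{\n  "person": {\n    "name": "B"\n  }\n}\n' > "$demo_dir/b.json"

# Compact them; the output file lives outside the scanned directory.
out="$(mktemp)"
to_json_lines "$demo_dir" > "$out"

# One line per record after compaction.
wc -l < "$out"
```

Against the partial dump this would presumably be used as `to_json_lines /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3 | ./fatcat_import.py import-orcid -`. Note that `find` does not guarantee file order, which should not matter for an import where each line is an independent record.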