3 files changed, 64 insertions, 13 deletions
diff --git a/notes/bulk_edits/2019-12-20_orcid.md b/notes/bulk_edits/2019-12-20_orcid.md
new file mode 100644
index 00000000..33dde32f
--- /dev/null
+++ b/notes/bulk_edits/2019-12-20_orcid.md
@@ -0,0 +1,43 @@
+
+Newer ORCID dumps are XML, not JSON. But there is a conversion tool!
+
+    https://github.com/ORCID/orcid-conversion-lib
+
+Commands:
+
+    wget https://github.com/ORCID/orcid-conversion-lib/raw/master/target/orcid-conversion-lib-0.0.2-full.jar
+    java -jar orcid-conversion-lib-0.0.2-full.jar OPTIONS
+
+    java -jar orcid-conversion-lib-0.0.2-full.jar --tarball -i ORCID_2019_summaries.tar.gz -v v3_0rc1 -o ORCID_2019_summaries_json.tar.gz
+
+    # [...]
+    # Sat Dec 21 04:43:50 UTC 2019 done 7300000
+    # Sat Dec 21 04:44:08 UTC 2019 done 7310000
+    # Sat Dec 21 04:44:17 UTC 2019 finished  errors 0
+
+Importing in QA, ran into some lines like:
+
+    {"response-code":409,"developer-message":"409 Conflict: The ORCID record is locked and cannot be edited. ORCID https://orcid.org/0000-0003-0014-6598","user-message":"The ORCID record is locked.","error-code":9018,"more-info":"https://members.orcid.org/api/resources/troubleshooting"}
+    {"response-code":409,"developer-message":"409 Conflict: The ORCID record is locked and cannot be edited. ORCID https://orcid.org/0000-0003-3750-5654","user-message":"The ORCID record is locked.","error-code":9018,"more-info":"https://members.orcid.org/api/resources/troubleshooting"}
+    {"response-code":409,"developer-message":"409 Conflict: The ORCID record is locked and cannot be edited. ORCID https://orcid.org/0000-0003-1424-4826","user-message":"The ORCID record is locked.","error-code":9018,"more-info":"https://members.orcid.org/api/resources/troubleshooting"}
+    {"response-code":409,"developer-message":"409 Conflict: The ORCID record is locked and cannot be edited. ORCID https://orcid.org/0000-0002-5340-9665","user-message":"The ORCID record is locked.","error-code":9018,"more-info":"https://members.orcid.org/api/resources/troubleshooting"}
+
+Needed to patch to filter those out. After that, the import ran OK:
+
+    zcat /srv/fatcat/datasets/ORCID_2019_summaries.sample_10k.json.gz | ./fatcat_import.py orcid -
+    Counter({'total': 10000, 'exists': 5323, 'insert': 4493, 'skip': 184, 'skip-no-person': 160, 'update': 0})
+
+New dump is about 7.3 million rows, so at the sample's rates (45% insert, 1.8%
+skip) expecting about 3.3 million new entities and about 130k skips.
+
+Doing bulk run like:
+
+    time zcat /srv/fatcat/datasets/ORCID_2019_summaries.json.gz | parallel -j8 --round-robin --pipe ./fatcat_import.py orcid -
+
+Prod timing:
+
+    Counter({'total': 910643, 'exists': 476812, 'insert': 416583, 'skip': 17248, 'update': 0})
+
+    real    47m27.658s
+    user    245m44.272s
+    sys     14m50.836s
diff --git a/notes/bulk_edits/2019-12-20_updates.md b/notes/bulk_edits/2019-12-20_updates.md
index a8f62ea9..83c8d9da 100644
--- a/notes/bulk_edits/2019-12-20_updates.md
+++ b/notes/bulk_edits/2019-12-20_updates.md
@@ -80,3 +80,13 @@ x fix bad DOI error (real error, skip these)
 x remove newline after "unparsable medline date" error
 x remove extra line like "existing.ident, existing.ext_ids.pmid, re.ext_ids.pmid))" in warning
 
+## Chocula
+
+Command:
+
+    export FATCAT_AUTH_WORKER_JOURNAL_METADATA=[...]
+    ./fatcat_import.py chocula /srv/fatcat/datasets/export_fatcat.2019-12-26.json
+
+Result:
+
+    Counter({'total': 144455, 'exists': 139807, 'insert': 2384, 'skip': 2264, 'skip-unknown-new-issnl': 2264, 'exists-by-issnl': 306, 'update': 0})
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index 80760938..2db0c72d 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -9,8 +9,19 @@ this file should probably get merged into the guide at some point.
 
 This file should not turn in to a TODO list!
 
+## 2020-01
+
+Imported around 2,500 new containers (journals, by ISSN-L) from the chocula
+analysis script.
+
 ## 2019-12
 
+Started continuous harvesting of DataCite DOI metadata; first date harvested
+was `2019-12-13`. No importer running yet.
+
+Imported about 3.3m new ORCID identifiers from 2019 bulk dump (after converting
+from XML to JSON): <https://archive.org/details/orcid-dump-2019>
+
 Inserted about 154k new arxiv release entities. Still no automatic daily
 harvesting.
 
@@ -45,22 +56,9 @@ invalid ISSN checksum).
 Imported files (matched to releases by DOI) from Semantic Scholar
 (`DIRECT-OA-CRAWL-2019` crawl).
 
-    Arabesque importer
-    crawl-bot
-    `s2_doi.sqlite`
-    TODO: archive.org link
-    TODO: rough count
-    TODO: date
-
 Imported files (matched to releases by DOI) from pre-1923/pre-1909 items uploaded
 by a user to archive.org.
 
-    Matched importer
-    internetarchive-bot (TODO:)
-    TODO: archive.org link
-    TODO: counts
-    TODO: date
-
 Imported files (matched to releases by DOI) from CORE.ac.uk
 (`DIRECT-OA-CRAWL-2019` crawl).