Import of a 2022-04 JALC DOI metadata snapshot. Note that we had downloaded a prior 2021-04 snapshot, but don't seem to have ever imported it. ## Download and Archive URL for bulk snapshot is available at the bottom of this page: More info: wget 'https://japanlinkcenter.org/lod/JALC-LOD-20220401.gz?jalcmetadatadl_1703' wget 'http://japanlinkcenter.org/top/doc/JaLC_LOD_format.pdf' wget 'http://japanlinkcenter.org/top/doc/JaLC_LOD_sample.pdf' mv 'JALC-LOD-20220401.gz?jalcmetadatadl_1703' JALC-LOD-20220401.gz ia upload jalc-bulk-metadata-2022-04 -m collection:ia_biblio_metadata jalc_logo.png JALC-LOD-20220401.gz JaLC_LOD_format.pdf JaLC_LOD_sample.pdf ## Import As of 2022-07-19, 6,502,202 release hits for `doi_registrar:jalc`. Re-download the file: cd /srv/fatcat/datasets wget 'https://archive.org/download/jalc-bulk-metadata-2022-04/JALC-LOD-20220401.gz' gunzip JALC-LOD-20220401.gz cd /srv/fatcat/src/python wc -l /srv/fatcat/datasets/JALC-LOD-20220401 9525225 Start with some samples: export FATCAT_AUTH_WORKER_JALC=[...] shuf -n100 /srv/fatcat/datasets/JALC-LOD-20220401 | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt # Counter({'total': 100, 'exists': 89, 'insert': 11, 'skip': 0, 'update': 0}) Full import (single threaded): cat /srv/fatcat/datasets/JALC-LOD-20220401 | pv -l | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt # 9.53M 22:26:06 [ 117 /s] # Counter({'total': 9510096, 'exists': 8589731, 'insert': 915032, 'skip': 5333, 'inserted.container': 119, 'update': 0}) Wow, almost a million new releases! 7,417,245 results for `doi_registrar:jalc`.