1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
|
Import of a 2022-04 JALC DOI metadata snapshot.
Note that we had downloaded a prior 2021-04 snapshot, but don't seem to have
ever imported it.
## Download and Archive
URL for bulk snapshot is available at the bottom of this page: <https://form.jst.go.jp/enquetes/jalcmetadatadl_1703>
More info: <http://japanlinkcenter.org/top/service/service_data.html>
wget 'https://japanlinkcenter.org/lod/JALC-LOD-20220401.gz?jalcmetadatadl_1703'
wget 'http://japanlinkcenter.org/top/doc/JaLC_LOD_format.pdf'
wget 'http://japanlinkcenter.org/top/doc/JaLC_LOD_sample.pdf'
mv 'JALC-LOD-20220401.gz?jalcmetadatadl_1703' JALC-LOD-20220401.gz
ia upload jalc-bulk-metadata-2022-04 -m collection:ia_biblio_metadata jalc_logo.png JALC-LOD-20220401.gz JaLC_LOD_format.pdf JaLC_LOD_sample.pdf
## Import
As of 2022-07-19, 6,502,202 release hits for `doi_registrar:jalc`.
Re-download the file:
cd /srv/fatcat/datasets
wget 'https://archive.org/download/jalc-bulk-metadata-2022-04/JALC-LOD-20220401.gz'
gunzip JALC-LOD-20220401.gz
cd /srv/fatcat/src/python
wc -l /srv/fatcat/datasets/JALC-LOD-20220401
9525225
Start with some samples:
export FATCAT_AUTH_WORKER_JALC=[...]
shuf -n100 /srv/fatcat/datasets/JALC-LOD-20220401 | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
# Counter({'total': 100, 'exists': 89, 'insert': 11, 'skip': 0, 'update': 0})
Full import (single threaded):
cat /srv/fatcat/datasets/JALC-LOD-20220401 | pv -l | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
# 9.53M 22:26:06 [ 117 /s]
# Counter({'total': 9510096, 'exists': 8589731, 'insert': 915032, 'skip': 5333, 'inserted.container': 119, 'update': 0})
Wow, almost a million new releases! 7,417,245 results for `doi_registrar:jalc`.
|