blob: d63c37593fd0aadf9decdc8312c3dde1447e8513 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
|
2019-10-01 JaLC metadata snapshot: <https://archive.org/download/jalc-bulk-metadata-2019>
Extracted .rdf file instead of piping it through zcat.
Use correct bot:
export FATCAT_AUTH_WORKER_JALC=blah
Start small; do a random bunch (10k) single-threaded to pre-create containers:
head -n100 /srv/fatcat/datasets/JALC-LOD-20191001.rdf | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
shuf -n100 /srv/fatcat/datasets/JALC-LOD-20191001.rdf | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
shuf -n10000 /srv/fatcat/datasets/JALC-LOD-20191001.rdf | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
Seemed like lots of individual containers getting added after repeating, so
just going to import single-threaded to avoid duplicate container creation:
cat /srv/fatcat/datasets/JALC-LOD-20191001.rdf | ./fatcat_import.py --batch-size 100 jalc - /srv/fatcat/datasets/ISSN-to-ISSN-L.txt
=> Counter({'total': 8419745, 'exists': 6480683, 'insert': 1934082, 'skip': 4980, 'inserted.container': 134, 'update': 0})
Had a bit fewer than 4,568,120 "doi_registrar:jalc" releases before this
import, 6,502,202 after (based on `doi_registrar:jalc` query).
|