path: root/notes/import_timing_20180923.txt

    105595.18user 3903.65system 15:59:39elapsed 190%CPU (0avgtext+0avgdata 458836maxresident)k
    71022792inputs+327828472outputs (176major+31149593minor)pagefaults 0swaps

    real    959m39.521s
    user    1845m10.392s
    sys     70m33.780s
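
A quick sanity check on the %CPU figures: GNU time derives them from (user + system) / elapsed. The following is a minimal Python sketch that recomputes them from the fields printed in these notes. `parse_elapsed` and `cpu_percent` are made-up helper names, and truncating to an integer is this sketch's choice.

```python
def parse_elapsed(elapsed: str) -> float:
    """Convert a GNU time elapsed field ("h:mm:ss" or "m:ss.ss") to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_percent(user: float, system: float, elapsed: str) -> int:
    """(user + system) / elapsed as a percentage, truncated to an integer."""
    return int(100 * (user + system) / parse_elapsed(elapsed))


# Figures copied from the two GNU time reports in these notes.
print(cpu_percent(105595.18, 3903.65, "15:59:39"))  # → 190
print(cpu_percent(15121.47, 676.49, "2:23:52"))     # → 183
```

Both results match the reported 190% and 183%. In other words, each run kept a bit under two cores busy on average.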

Did I get the same error again? I'm confused:

    HTTP response body: {"message":"number of parameters must be between 0 and 65535\n"}
    (but not in all threads)

Yes, ugh, because 50*2500 (= 125000) can go over that limit (it's not just
individual large releases; they come in big batches).
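
The 65535 ceiling applies per statement, so one fix is to split a batch by parameter count rather than by a fixed number of records. The following is a minimal sketch under assumed conditions: it supposes that each row becomes one tuple of values in a multi-row INSERT, with one bind parameter per value. `chunk_by_params` and the row shape are hypothetical and are not fatcat's actual code. The example models the 50*2500 case as 50 rows of 2500 parameters each.

```python
from typing import Iterable, Iterator, List, Sequence

# Bind-parameter ceiling quoted in the error message above.
MAX_PARAMS = 65535


def chunk_by_params(
    rows: Iterable[Sequence[object]], max_params: int = MAX_PARAMS
) -> Iterator[List[Sequence[object]]]:
    """Greedily group rows so no group needs more than max_params parameters.

    Hypothetical model: each row is a tuple of column values and every value
    becomes one bind parameter in a multi-row INSERT.
    """
    batch: List[Sequence[object]] = []
    used = 0
    for row in rows:
        n = len(row)
        if n > max_params:
            raise ValueError(f"a single row needs {n} parameters")
        if batch and used + n > max_params:
            yield batch  # adding this row would overflow; flush first
            batch, used = [], 0
        batch.append(row)
        used += n
    if batch:
        yield batch


# 50 rows of 2500 parameters each = 125000, more than one statement can take.
rows = [tuple(range(2500))] * 50
print([len(b) for b in chunk_by_params(rows)])  # → [26, 24]
```

Greedy packing keeps the number of round trips low. It still fails loudly if a single row alone exceeds the limit, because no amount of batching fixes that case.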

But:

    select count(id) from release_ident; => 70006121

A lot, though not 72 million like last time, hrm. I'm... going to move ahead,
I guess.

"Processed 4440850 lines, inserted 3509600, updated 0."
    => implies 79029915 records

    time zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py import-matched --no-file-update -
    Processed 530750 lines, inserted 435239, updated 0. (etc)
    Command exited with non-zero status 1
    15121.47user 676.49system 2:23:52elapsed 183%CPU (0avgtext+0avgdata 70076maxresident)k
    127760inputs+3477184outputs (116major+475489minor)pagefaults 0swaps

    real    143m52.681s
    user    252m31.620s
    sys     11m21.608s
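
Roughly, `parallel -j12 --round-robin --pipe` cuts the zcat stream into blocks of whole lines. It feeds those blocks to 12 already-running copies of fatcat_import.py instead of starting a new process per block. The Python below is a simplified model, not GNU parallel's actual scheduling. Real parallel sizes blocks in bytes (`--block`) rather than lines, and `--round-robin` does not promise a strict rotation between jobs. `split_blocks` and `deal_round_robin` are made-up names.

```python
from itertools import cycle
from typing import Iterable, List


def split_blocks(lines: Iterable[str], block_lines: int) -> List[List[str]]:
    """Group input into blocks of whole lines; a record is never split."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        current.append(line)
        if len(current) == block_lines:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def deal_round_robin(blocks: List[List[str]], n_workers: int) -> List[List[str]]:
    """Hand each block to one of n_workers long-lived workers, cycling in order."""
    workers: List[List[str]] = [[] for _ in range(n_workers)]
    for worker, block in zip(cycle(workers), blocks):
        worker.extend(block)
    return workers


blocks = split_blocks([str(i) for i in range(10)], block_lines=2)
workers = deal_round_robin(blocks, n_workers=3)
print([len(w) for w in workers])  # → [4, 4, 2]
```

Each worker therefore sees its own slice of the input, and each prints its own "Processed ... lines" summary.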

    zcat /srv/fatcat/datasets/2018-08-27-2352.17-matchcrossref.insertable.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py import-matched -

    (running...)