aboutsummaryrefslogtreecommitdiffstats
path: root/notes/cockroach_tuning.md
blob: dc6acb4fc027dd75c702af96b3ec8354f8492dba (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96

When doing a bulk import, could try:


Probably want to set:

    --cache=.25 --max-sql-memory=.25

Check FD limits:

    cat /proc/sys/fs/file-max
    ulimit -a

Set with:

    echo 150000 > /proc/sys/fs/file-max

TODO:
- set `num_replicas=1` when testing
- close browser to free up RAM
- disable raft fsync
    https://github.com/cockroachdb/cockroach/issues/19784
- increase import batch size (10x current)

## Log of Tests

build: CCL v2.0.5 @ 2018/08/13 17:59:42 (go1.10)

COCKROACH branch e386022ba051f46ff70c28b90d9a2fe1f856c57f

cockroach start --insecure --store=fatcat-dev --host=localhost

    time ./fatcat_import.py import-issn /home/bnewbold/code/oa-journals-analysis/upload-2018-04-05/journal_extra_metadata.csv

    real    3m20.781s
    user    0m6.900s
    sys     0m0.336s

    cat /data/crossref/crossref-works.2018-01-21.badsample_100k.json | time parallel -j4 --round-robin --pipe ./fatcat_import.py import-crossref - /data/issn/20180216.ISSN-to-ISSN-L.txt

    => gave up after ~30 minutes with only ~40% done

With cockroach, container lookups seem to be particularly slow, like 1000ms+,
though not reflected in cockroach-reported stats . Very high CPU usage by
cockroach.

Cockroach is self-reporting P50 latency: 0.3 ms, P99 latency: 104.9 ms

Seeing lots of Queue Processing Failures, of the "replication" type. Probably because there are no other nodes.

CHANGES:
- closed browser, restarted postgres, restarted fatcatd to free up RAM
- cockroach start --insecure --store=fatcat-dev --host=localhost --cache=.25 --max-sql-memory=.25
- cockroach zone set --insecure .default -f -
    num_replicas: 1
    ^D
- SET CLUSTER SETTING kv.raft_log.synchronize=false;
- batch size from default (50) to 500

    time ./fatcat_import.py import-issn --batch-size 500 /home/bnewbold/code/oa-journals-analysis/upload-2018-04-05/journal_extra_metadata.csv

    real    1m10.531s
    user    0m5.204s
    sys     0m0.128s

Have 12 SQL connections

    cat /data/crossref/crossref-works.2018-01-21.badsample_100k.json | time parallel -j4 --round-robin --pipe ./fatcat_import.py import-crossref --batch-size 500 - /data/issn/20180216.ISSN-to-ISSN-L.txt

    GETs are still very slow end-to-end (~1sec)
    during this time, though, cockroach is quoting very low latency
    => halted

It's the join performance (on lookups) that is killing things for release rev import:

    select * from container_rev where issnl = '0010-7824';
    Time: 2.16272ms

    select * from container_rev inner join container_ident on container_rev.id = container_ident.rev_id where issnl = '0010-7824';
    Time: 147.44647ms

CHANGED: CCL v2.1.0-beta.20180904 @ 2018/09/04 03:51:03 (go1.10.3)

    time ./fatcat_import.py import-issn --batch-size 500 /home/bnewbold/code/oa-journals-analysis/upload-2018-04-05/journal_extra_metadata.csv

    real    0m59.903s
    user    0m4.716s
    sys     0m0.076s

    cat /data/crossref/crossref-works.2018-01-21.badsample_100k.json | time parallel -j4 --round-robin --pipe ./fatcat_import.py import-crossref --batch-size 500 - /data/issn/20180216.ISSN-to-ISSN-L.txt

    => still extremely slow API latency, though cockroach is reporting very
       fast (P50 latency 1.0 ms, P99 latency 4.7 ms)
    => maybe a timeout or something going on?
    => single-level joins from CLI are ~200 ms