## Setup

Add to postgresql.conf:

    shared_preload_libraries = 'auto_explain,pg_stat_statements'
    # Increase the max size of the query strings Postgres records
    track_activity_query_size = 2048
    # Track statements generated by stored procedures as well
    pg_stat_statements.track = all

Also:

    track_counts (already default)
    autovacuum (already default?)
    log_min_error_statement = warning
    log_min_duration_statement = 5000

Then from psql:

    create extension pg_stat_statements;

Regularly want to run:

    VACUUM ANALYZE
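
With pg_stat_statements loaded, the heaviest statements can be pulled straight from psql. A minimal sketch, using the PostgreSQL 10 column names (newer versions rename total_time/mean_time to total_exec_time/mean_exec_time):

    -- Top statements by cumulative execution time
    SELECT calls,
           round(total_time::numeric, 1) AS total_ms,
           round(mean_time::numeric, 3)  AS mean_ms,
           left(query, 80)               AS query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;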
## Tuning Values

postgres config:

    max_connections = 100 (default)
    shared_buffers = 128MB -> 10GB (while elasticsearch is on the same machine; later 16GB or more)
    effective_cache_size = 4GB -> 24GB (while elasticsearch is on the same machine)
    work_mem = 4MB -> 128MB  # relatively few connections per box
    fsync = on
    commit_delay = ??? (and siblings)
    random_page_cost = 1 (for SSD)
    default_statistics_target = 100 -> 200
    maintenance_work_mem = 64MB -> 8GB
    synchronous_commit = off (during dev only! switch to on for production!)
    wal_sync_method (keep default)
    max_wal_size = 64 -> 128 (based on the HINT message below)
    # didn't mess with commit_delay/commit_siblings

system:

    sysctl -w vm.overcommit_memory=2
    TODO: ulimit -n 65536
    TODO: ulimit -p 800
    TODO: set LimitNOFILE in /lib/systemd/system/postgresql.service
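
To confirm which of the above actually took effect after a reload/restart, a quick check from psql (the name list is just the settings touched above):

    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem',
                   'maintenance_work_mem', 'max_wal_size', 'random_page_cost',
                   'default_statistics_target', 'synchronous_commit')
    ORDER BY name;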
## Resources

https://www.geekytidbits.com/performance-tuning-postgres/

Could try pgbadger to digest the auto_explain-style log output.

https://www.postgresql.org/docs/10/static/runtime-config-wal.html

IA-specific resources:

- https://git.archive.org/ia/mint/blob/master/postgres/postgres_9_2.yml
- https://git.archive.org/ia/mint/blob/master/postgres/put_datadir_on_ssd.sh
- https://git.archive.org/ia/mint/blob/master/postgres/templates/postgresql.conf.j2

For bulk inserts:

- make the write-ahead log larger (e.g., 16MB; done)
- batch ~1000+ inserts per transaction (see the sketch after this list)
- https://www.postgresql.org/docs/current/static/populate.html
- https://www.depesz.com/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/
- https://stackoverflow.com/questions/12206600/how-to-speed-up-insertion-performance-in-postgresql
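
A minimal sketch of the batching point above; the table and column names (import_stub, title) and the file path are illustrative placeholders, not the real fatcat schema:

    -- Many rows per INSERT, many INSERTs per transaction, one COMMIT at the end
    BEGIN;
    INSERT INTO import_stub (title) VALUES ('a'), ('b'), ('c');
    -- ... repeat until ~1000+ rows have been queued in this transaction ...
    COMMIT;

    -- For the very largest loads, COPY from a file (or stdin) is faster still
    COPY import_stub (title) FROM '/srv/dumps/titles.tsv';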
## 2018-06-27 Measurements (pre-tuning)
fatcat_prod=# select count(*) from release_ident; 20983019
fatcat_prod=# select count(*) from work_ident; 20988140
fatcat_prod=# select count(*) from file_ident; 1482335
fatcat_prod=# select count(*) from creator_ident; 4167419
fatcat_prod=# select count(*) from container_ident; 61793
select count(*) from release_contrib; 59798133
bnewbold@wbgrp-svc500$ sudo du -sh /var/lib/postgresql/
43G
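
The on-disk number above can be broken down per table from inside postgres; note that reltuples is only an estimate maintained by ANALYZE/autovacuum:

    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
           reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 20;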
Running import-crossref with 20 threads, and the manifest importer with one thread (at 33% complete). Had already imported ~7 million works+releases previously.
PostgreSQL 10.4 - wbgrp-svc500.us.archive.org - postgres@localhost:5432/postgre
Size: 41.38G - 323.40K/s | TPS: 885
Mem.: 50.80% - 23.86G/49.14G | IO Max: 79539/s
Swap: 0.80% - 408.89M/50.00G | Read : 67.04K/s - 16/s
Load: 6.69 7.41 7.69 | Write: 1.93M/s - 493/s
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
32 6 62 0 0 0| 296k 3880k| 334k 3144B| 0 0 | 21k 65k
31 6 62 0 0 0| 0 3072k| 391k 318B| 0 0 | 51k 141k
31 6 63 0 0 0| 16k 1080k| 344k 1988B| 0 0 | 35k 104k
29 6 65 0 0 0| 136k 2608k| 175k 332B| 0 0 |9835 15k
28 5 67 0 0 0| 408k 4368k| 285k 832B| 0 0 | 14k 17k
33 5 62 0 0 0| 56k 3256k| 219k 99B| 0 0 | 22k 49k
31 6 63 0 0 0| 188k 5120k| 158k 318B| 0 0 | 17k 29k
30 6 64 0 0 0| 200k 6984k| 239k 988B| 0 0 | 16k 24k
30 6 64 0 0 0| 168k 5504k| 159k 152B| 0 0 | 14k 20k
28 7 65 0 0 0| 440k 12M| 236k 420B| 0 0 | 15k 18k
29 6 65 0 0 0| 428k 6968k| 352k 310B| 0 0 | 19k 31k
32 6 62 0 0 0| 64k 3480k| 288k 318B| 0 0 | 18k 55k
32 6 62 0 0 0| 32k 2080k| 155k 318B| 0 0 | 20k 52k
bnewbold@wbgrp-svc500$ uptime
22:00:42 up 28 days, 22:31, 6 users, load average: 7.94, 7.56, 7.72
2018-06-27 21:57:36.102 UTC [401] LOG: checkpoints are occurring too frequently (13 seconds apart)
2018-06-27 21:57:36.102 UTC [401] HINT: Consider increasing the configuration parameter "max_wal_size".
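
The "missing index" table below comes from a heuristic comparing sequential scans to index scans per table; a query along these lines (adapted from the geekytidbits article in Resources, not necessarily the exact statement run here) produces that shape of output:

    SELECT relname,
           seq_scan - idx_scan AS too_much_seq,
           CASE WHEN seq_scan - idx_scan > 0 THEN 'Missing Index?' ELSE 'OK' END AS "case",
           pg_relation_size(relid) AS rel_size,
           seq_scan,
           idx_scan
    FROM pg_stat_user_tables
    WHERE pg_relation_size(relid) > 80000
    ORDER BY too_much_seq DESC;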
relname | too_much_seq | case | rel_size | seq_scan | idx_scan
-----------------+--------------+----------------+-------------+----------+-----------
changelog | 1274670 | Missing Index? | 39411712 | 1274670 | 0
file_edit | 612386 | Missing Index? | 108298240 | 612386 | 0
creator_edit | 612386 | Missing Index? | 285540352 | 612386 | 0
container_edit | 612386 | Missing Index? | 4784128 | 612386 | 0
release_edit | 612386 | Missing Index? | 1454489600 | 612386 | 0
work_edit | 612386 | Missing Index? | 1454415872 | 612386 | 0
release_contrib | 296675 | Missing Index? | 4725645312 | 296675 | 0
release_ref | 296663 | Missing Index? | 8999837696 | 296663 | 0
file_release | -113 | OK | 13918208 | 110 | 223
container_rev | -979326 | OK | 16285696 | 63 | 979389
file_ident | -3671516 | OK | 109002752 | 362 | 3671878
file_rev | -3944155 | OK | 302940160 | 95 | 3944250
creator_rev | -8420205 | OK | 318283776 | 1226 | 8421431
creator_ident | -9525338 | OK | 309141504 | 52330 | 9577668
container_ident | -20581876 | OK | 4833280 | 272457 | 20854333
release_ident | -40548858 | OK | 1440948224 | 4160919 | 44709777
work_rev | -42534913 | OK | 1124671488 | 1161 | 42536074
editgroup | -48864662 | OK | 34136064 | 1 | 48864663
work_ident | -65008911 | OK | 1503313920 | 1239 | 65010150
release_rev | -185735794 | OK | 13649428480 | 128 | 185735922
## 2018-06-28 (after basic tuning + indexes)
Early loading (manifest and 20x release):
PostgreSQL 10.4 - wbgrp-svc500.us.archive.org - postgres@localhost:5432/postgres - Ref.: 2s
Size: 4.57G - 6.45M/s | TPS: 18812
Mem.: 59.70% - 23.62G/49.14G | IO Max: 3601/s
Swap: 1.30% - 675.05M/50.00G | Read : 0.00B/s - 0/s
Load: 12.98 10.58 5.25 | Write: 2.65M/s - 677/s
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24045 webcrawl 20 0 908872 204948 11756 S 153.3 0.4 16:13.03 fatcatd
24328 webcrawl 20 0 78148 45220 4324 R 87.1 0.1 8:44.16 perl
24056 postgres 20 0 10.441g 3.906g 3.886g R 69.9 7.9 6:57.47 postgres
24063 postgres 20 0 10.447g 3.899g 3.873g S 67.9 7.9 6:55.89 postgres
24059 postgres 20 0 10.426g 3.888g 3.883g R 67.5 7.9 6:59.15 postgres
24057 postgres 20 0 10.430g 3.883g 3.874g S 67.2 7.9 6:58.68 postgres
24061 postgres 20 0 10.448g 3.909g 3.881g R 66.2 8.0 6:54.30 postgres
24058 postgres 20 0 10.428g 3.883g 3.876g R 65.9 7.9 6:59.35 postgres
24062 postgres 20 0 10.426g 5.516g 5.511g R 64.9 11.2 6:58.29 postgres
24055 postgres 20 0 10.426g 3.878g 3.873g R 64.2 7.9 6:59.38 postgres
24054 postgres 20 0 10.430g 5.499g 5.491g R 63.6 11.2 6:57.27 postgres
24060 postgres 20 0 10.448g 3.900g 3.873g R 61.9 7.9 6:55.45 postgres
21711 postgres 20 0 10.419g 5.762g 5.760g D 16.6 11.7 3:00.67 postgres
21713 postgres 20 0 10.419g 21432 19512 S 11.3 0.0 3:25.11 postgres
24392 webcrawl 20 0 5309636 400912 8696 S 7.9 0.8 0:53.18 python3
24383 webcrawl 20 0 5309436 400628 8648 S 7.6 0.8 0:52.29 python3
24387 webcrawl 20 0 5309776 402968 8620 S 7.3 0.8 0:52.81 python3
24394 webcrawl 20 0 5309624 400732 8644 S 7.3 0.8 0:53.30 python3
24384 webcrawl 20 0 5309916 400948 8600 S 7.0 0.8 0:53.18 python3
Still get a *lot* of:
2018-06-29 00:14:05.948 UTC [21711] LOG: checkpoints are occurring too frequently (1 second apart)
2018-06-29 00:14:05.948 UTC [21711] HINT: Consider increasing the configuration parameter "max_wal_size".
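
The checkpoint pressure can be quantified from pg_stat_bgwriter: checkpoints_req counts checkpoints forced by WAL volume, and a high share relative to checkpoints_timed means max_wal_size is still too small:

    SELECT checkpoints_timed, checkpoints_req, stats_reset
    FROM pg_stat_bgwriter;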
VACUUM is running basically continuously; should prevent that? It takes 6 hours or longer on the release_rev and release_ref tables. An auto-approve batch import method would resolve this, I think (no UPDATE after INSERT).
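
To see how far along those long-running VACUUMs actually are (pg_stat_progress_vacuum is available since PostgreSQL 9.6):

    SELECT p.pid,
           c.relname,
           p.phase,
           round(100.0 * p.heap_blks_scanned / nullif(p.heap_blks_total, 0), 1) AS pct_scanned
    FROM pg_stat_progress_vacuum p
    LEFT JOIN pg_class c ON c.oid = p.relid
    ORDER BY p.pid;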
max_wal_size wasn't getting set correctly.
The statements taking the most time overall are the complex (multi-table) inserts; individually they take only a fraction of a second though (mean less than a millisecond).
Manifest import runs really slowly if release import is concurrent; it is much faster (by a factor of 10x or more) to wait until the release import is done first.
With some 60 million releases:
bnewbold@wbgrp-svc500$ sudo du -sh /var/lib/postgresql/
184G /var/lib/postgresql/
TODO: slow query log doesn't seem to be working (let alone auto_explain)
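
A quick sanity check from psql: if log_min_duration_statement shows -1 the slow query log is disabled, and auto_explain additionally needs auto_explain.log_min_duration set (it was not set in the Setup section above), otherwise it logs nothing:

    SHOW log_min_duration_statement;
    SHOW shared_preload_libraries;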