diff options
author | Bryan Newbold <bnewbold@robocracy.org> | 2020-01-22 11:57:38 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2020-01-22 11:57:38 -0800 |
commit | da64fa0b36218d7f9726aa98dff0e834c1845193 (patch) | |
tree | bc64ca11073914e46a77028e15d6fba4c4e12d35 | |
parent | 2a1f93f0a0fa3cbded45bd14a38452b49bd6578c (diff) | |
download | fatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.tar.gz fatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.zip |
more details on potential _edit table disk savings
-rw-r--r-- | proposals/2020_sql_size_reduction.md | 26 |
1 files changed, 23 insertions, 3 deletions
diff --git a/proposals/2020_sql_size_reduction.md b/proposals/2020_sql_size_reduction.md index 8648c19a..f421e455 100644 --- a/proposals/2020_sql_size_reduction.md +++ b/proposals/2020_sql_size_reduction.md @@ -120,14 +120,16 @@ the search index (would, of course, increase that index size). public | release_ref_target_release_idx | index | postgres | release_ref | 1843 MB | (2 rows) -Looks like about 13 GByte could be saved here. +Looks like about 13 GByte of index could be saved here, or around 4% of total +release disk utilization. ## Idea: re-order table columns I've read that there can be some data storage savings by moving fields which are often null to the end of table schemas. -Need to research/calculate what savings might be. +Need to research/calculate what savings might be. Probably small, but maybe +significant for edit tables. Not sure how implementation would work... dump data values (plain or custom `pg_dump`?) then reload, probably. @@ -151,4 +153,22 @@ Feels like there might be a bunch of waste here. Could make `{editgroup, ident}` the primary key, which would remove need for a separate identifier and index (we already have a UNIQ constraint on this). -TODO: investigate further. + fatcat_prod=# \di+ work* + List of relations + Schema | Name | Type | Owner | Table | Size | Desc + --------+-------------------------------------+-------+----------+------------+---------+----- + public | work_edit_editgroup_id_ident_id_key | index | postgres | work_edit | 6674 MB | + public | work_edit_ident_idx | index | postgres | work_edit | 4233 MB | + public | work_edit_pkey | index | postgres | work_edit | 4233 MB | + public | work_ident_pkey | index | postgres | work_ident | 4233 MB | + public | work_ident_redirect_idx | index | postgres | work_ident | 3014 MB | + public | work_ident_rev_idx | index | postgres | work_ident | 4233 MB | + public | work_rev_pkey | index | postgres | work_rev | 4233 MB | + (7 rows) + +For the example of work edits (13 GB data, 20 GB index, 33 GB total), this +would drop ~20% of data size and ~20% of index size. + +Would it make more sense to use {ident, editgroup} as the primary key and UNIQ, +then have a separate index on `editgroup`? On the assumption that `editgroup` +cardinality is much smaller, thus the index disk usage would be smaller. |