aboutsummaryrefslogtreecommitdiffstats
path: root/proposals
diff options
context:
space:
mode:
authorBryan Newbold <bnewbold@robocracy.org>2020-01-22 11:57:38 -0800
committerBryan Newbold <bnewbold@robocracy.org>2020-01-22 11:57:38 -0800
commitda64fa0b36218d7f9726aa98dff0e834c1845193 (patch)
treebc64ca11073914e46a77028e15d6fba4c4e12d35 /proposals
parent2a1f93f0a0fa3cbded45bd14a38452b49bd6578c (diff)
downloadfatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.tar.gz
fatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.zip
more details on potential _edit table disk savings
Diffstat (limited to 'proposals')
-rw-r--r--proposals/2020_sql_size_reduction.md26
1 files changed, 23 insertions, 3 deletions
diff --git a/proposals/2020_sql_size_reduction.md b/proposals/2020_sql_size_reduction.md
index 8648c19a..f421e455 100644
--- a/proposals/2020_sql_size_reduction.md
+++ b/proposals/2020_sql_size_reduction.md
@@ -120,14 +120,16 @@ the search index (would, of course, increase that index size).
public | release_ref_target_release_idx | index | postgres | release_ref | 1843 MB |
(2 rows)
-Looks like about 13 GByte could be saved here.
+Looks like about 13 GByte of index could be saved here, or around 4% of total
+release disk utilization.
## Idea: re-order table columns
I've read that there can be some data storage savings by moving fields which
are often null to the end of table schemas.
-Need to research/calculate what savings might be.
+Need to research/calculate what savings might be. Probably small, but maybe
+significant for edit tables.
Not sure how implementation would work... dump data values (plain or custom
`pg_dump`?) then reload, probably.
@@ -151,4 +153,22 @@ Feels like there might be a bunch of waste here. Could make `{editgroup,
ident}` the primary key, which would remove need for a separate identifier and
index (we already have a UNIQ constraint on this).
-TODO: investigate further.
+ fatcat_prod=# \di+ work*
+ List of relations
+ Schema | Name | Type | Owner | Table | Size | Desc
+ --------+-------------------------------------+-------+----------+------------+---------+-----
+ public | work_edit_editgroup_id_ident_id_key | index | postgres | work_edit | 6674 MB |
+ public | work_edit_ident_idx | index | postgres | work_edit | 4233 MB |
+ public | work_edit_pkey | index | postgres | work_edit | 4233 MB |
+ public | work_ident_pkey | index | postgres | work_ident | 4233 MB |
+ public | work_ident_redirect_idx | index | postgres | work_ident | 3014 MB |
+ public | work_ident_rev_idx | index | postgres | work_ident | 4233 MB |
+ public | work_rev_pkey | index | postgres | work_rev | 4233 MB |
+ (7 rows)
+
+For the example of work edits (13 GB data, 20 GB index, 33 GB total), this
+would drop ~20% of data size and ~20% of index size.
+
+Would it make more sense to use {ident, editgroup} as the primary key and UNIQ,
+then have a separate index on `editgroup`? On the assumption that `editgroup`
+cardinality is much smaller, thus the index disk usage would be smaller.