diff options
| author | Bryan Newbold <bnewbold@robocracy.org> | 2020-01-22 11:57:38 -0800 | 
|---|---|---|
| committer | Bryan Newbold <bnewbold@robocracy.org> | 2020-01-22 11:57:38 -0800 | 
| commit | da64fa0b36218d7f9726aa98dff0e834c1845193 (patch) | |
| tree | bc64ca11073914e46a77028e15d6fba4c4e12d35 /proposals | |
| parent | 2a1f93f0a0fa3cbded45bd14a38452b49bd6578c (diff) | |
| download | fatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.tar.gz fatcat-da64fa0b36218d7f9726aa98dff0e834c1845193.zip | |
more details on potential _edit table disk savings
Diffstat (limited to 'proposals')
| -rw-r--r-- | proposals/2020_sql_size_reduction.md | 26 | 
1 files changed, 23 insertions, 3 deletions
| diff --git a/proposals/2020_sql_size_reduction.md b/proposals/2020_sql_size_reduction.md index 8648c19a..f421e455 100644 --- a/proposals/2020_sql_size_reduction.md +++ b/proposals/2020_sql_size_reduction.md @@ -120,14 +120,16 @@ the search index (would, of course, increase that index size).       public | release_ref_target_release_idx | index | postgres | release_ref | 1843 MB |       (2 rows) -Looks like about 13 GByte could be saved here. +Looks like about 13 GByte of index could be saved here, or around 4% of total +release disk utilization.  ## Idea: re-order table columns  I've read that there can be some data storage savings by moving fields which  are often null to the end of table schemas. -Need to research/calculate what savings might be. +Need to research/calculate what savings might be. Probably small, but maybe +significant for edit tables.  Not sure how implementation would work... dump data values (plain or custom  `pg_dump`?) then reload, probably. @@ -151,4 +153,22 @@ Feels like there might be a bunch of waste here. Could make `{editgroup,  ident}` the primary key, which would remove need for a separate identifier and  index (we already have a UNIQ constraint on this). -TODO: investigate further. +    fatcat_prod=# \di+ work* +                                              List of relations +     Schema |                Name                 | Type  |  Owner   |   Table    |  Size   | Desc +    --------+-------------------------------------+-------+----------+------------+---------+----- +     public | work_edit_editgroup_id_ident_id_key | index | postgres | work_edit  | 6674 MB |  +     public | work_edit_ident_idx                 | index | postgres | work_edit  | 4233 MB |  +     public | work_edit_pkey                      | index | postgres | work_edit  | 4233 MB |  +     public | work_ident_pkey                     | index | postgres | work_ident | 4233 MB |  +     public | work_ident_redirect_idx             | index | postgres | work_ident | 3014 MB |  +     public | work_ident_rev_idx                  | index | postgres | work_ident | 4233 MB |  +     public | work_rev_pkey                       | index | postgres | work_rev   | 4233 MB |  +    (7 rows) + +For the example of work edits (13 GB data, 20 GB index, 33 GB total), this +would drop ~20% of data size and ~20% of index size. + +Would it make more sense to use {ident, editgroup} as the primary key and UNIQ, +then have a separate index on `editgroup`? On the assumption that `editgroup` +cardinality is much smaller, thus the index disk usage would be smaller. | 
