split/move docs around

author: Bryan Newbold <bnewbold@robocracy.org> 2018-08-24 13:29:29 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-08-24 13:29:29 -0700
commit: f997c5bcbcc800a8780a62dc56a4b7f4e5b68c3c (patch)
tree: 675d45c2b34b95de3475c550d3381f95e372647d /notes/database_dumps_backups.txt
parent: 6a87d4b3ab252d76bb380a69ed53f21989761e9f (diff)
download: fatcat-f997c5bcbcc800a8780a62dc56a4b7f4e5b68c3c.tar.gz
fatcat-f997c5bcbcc800a8780a62dc56a4b7f4e5b68c3c.zip
1 files changed, 31 insertions, 0 deletions
diff --git a/notes/database_dumps_backups.txt b/notes/database_dumps_backups.txt
new file mode 100644
index 00000000..0b05b9b8
--- /dev/null
+++ b/notes/database_dumps_backups.txt
@@ -0,0 +1,31 @@
+
+## Dumps and Backups
+
+There are a few different database dump formats folks might want:
+
+- raw native database backups, for disaster recovery (would include
+  volatile/unsupported schema details, user API credentials, full history,
+  in-process edits, comments, etc.)
+- a sanitized version of the above: roughly per-table dumps of the full state
+  of the database. Could use per-table SQL expressions with sub-queries to pull
+  in small tables ("partial transform") and export JSON for each table; would
+  be extra work to maintain, so not pursuing for now.
+- full history, full public schema exports, in a form that might be used to
+  mirror or entirely fork the project. Propose supplying the full "changelog"
+  in API schema format, in a single file to capture all entity history, without
+  "hydrating" any inter-entity references. Rely on separate dumps of
+  non-entity, non-versioned tables (editors, abstracts, etc). Note that a
+  variant of this could use the public interface, in particular to do
+  incremental updates (though that wouldn't capture schema changes).
+- transformed exports of the current state of the database (aka, without
+  history). Useful for data analysis, search engines, etc. Propose supplying
+  just the Release table in a fully "hydrated" state to start. Unclear if it
+  should be on a work or release basis; will go with release for now. Harder
+  to do using the public interface because of the need for transaction locking.
+
+Backing up the entire database using `pg_dump`, with parallelism 1 (use more on
+a larger machine with fast disks; try 4 or 8?), assuming the database name is
+'fatcat' and the current user has access:
+
+    pg_dump -j1 -Fd -f test-dump fatcat
+
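A directory-format dump (`-Fd`) like the one above can be restored with `pg_restore`, which also accepts `-j` for parallel jobs; parallel dumping with `pg_dump -j` only works with the directory format. A minimal sketch of a backup-then-restore wrapper, assuming the 'fatcat' database name from the notes. The CPU-count heuristic (capped at 8, following the "try 4 or 8?" guess) and the target database name `fatcat_restore` are hypothetical, not part of the original notes:

```shell
#!/bin/sh
# Sketch only: assumes the source database is named "fatcat" and the current
# user can read it. "fatcat_restore" is a hypothetical restore target.
set -eu

# Assumed heuristic: one job per online CPU, capped at 8.
dump_jobs() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cores" -gt 8 ]; then
        echo 8
    else
        echo "$cores"
    fi
}

backup_and_restore() {
    jobs=$(dump_jobs)
    # Directory format (-Fd) is required for a parallel (-j) dump.
    pg_dump -j "$jobs" -Fd -f test-dump fatcat
    # Restore into a freshly created database; pg_restore -j runs in parallel too.
    createdb fatcat_restore
    pg_restore -j "$jobs" -d fatcat_restore test-dump
}

# Only touch the database when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    backup_and_restore
fi
```

Run it as `sh backup.sh run` (script name hypothetical); without the `run` argument it only defines the functions. `createdb` fails if `fatcat_restore` already exists, which stops the script under `set -e` before anything is restored over existing data.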