| field | value | date |
|---|---|---|
| author | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-09 10:10:42 -0700 |
| committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-09-09 10:10:42 -0700 |
| commit | b15eff77fdb7974ce2bf3c2e44c8edc354f9f452 | |
| tree | 5bccb9ff2633eb35dc00babc0b2dd1842f02e49b /notes/database_dumps_backups.txt | |
| parent | 419bddcb0377e82e7177356350d35bf84b3e80d8 | |
| parent | a29beab0683d77086cc1b431779d0540dc5a9b49 | |
Merge branch 'http-verbs' into cockroach
Manually merged conflicts:
rust/migrations/2018-05-12-001226_init/up.sql
rust/src/api_server.rs
rust/src/database_schema.rs
Diffstat (limited to 'notes/database_dumps_backups.txt')
-rw-r--r-- | notes/database_dumps_backups.txt | 53 |
1 files changed, 53 insertions, 0 deletions
diff --git a/notes/database_dumps_backups.txt b/notes/database_dumps_backups.txt
new file mode 100644
index 00000000..60d4bba0
--- /dev/null
+++ b/notes/database_dumps_backups.txt
@@ -0,0 +1,53 @@

## Dumps and Backups

There are a few different database dump formats folks might want:

- raw native database backups, for disaster recovery (these would include
  volatile/unsupported schema details, user API credentials, full history,
  in-process edits, comments, etc.)
- a sanitized version of the above: roughly per-table dumps of the full state
  of the database. Could use per-table SQL expressions with sub-queries to pull
  in small tables ("partial transform") and export JSON for each table; this
  would be extra work to maintain, so not pursuing for now.
- full-history, full public-schema exports, in a form that could be used to
  mirror or entirely fork the project. Propose supplying the full "changelog"
  in API schema format, as a single file capturing all entity history, without
  "hydrating" any inter-entity references (i.e., references stay as
  identifiers rather than embedded copies of the referenced entities). Rely on
  separate dumps of non-entity, non-versioned tables (editors, abstracts,
  etc.). Note that a variant of this could use the public interface, in
  particular to do incremental updates (though that wouldn't capture schema
  changes).
- transformed exports of the current state of the database (i.e., without
  history). Useful for data analysis, search engines, etc. Propose supplying
  just the Release table in a fully "hydrated" state to start. Unclear whether
  this should be on a work or release basis; will go with release for now.
  Harder to do using the public interface because of the need for transaction
  locking.
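To make the "hydrated" vs. non-hydrated distinction concrete, here is a minimal Python sketch. The record layout (`ident`, `title`, `container_id`, `container`, `name`) and the `hydrate_release` helper are hypothetical illustrations, not the fatcat schema or code. A changelog-style export keeps only the referenced identifier, while a hydrated release embeds a copy of the referenced entity.

```python
import copy

# Hypothetical, simplified records; field names are illustrative only and
# do not claim to match the real fatcat schema.
containers = {
    "c1": {"ident": "c1", "name": "Journal of Examples"},
}

release = {"ident": "r1", "title": "An Example Paper", "container_id": "c1"}


def hydrate_release(release: dict, containers: dict) -> dict:
    """Return a copy of `release` with its container reference expanded.

    The input record is left untouched, mirroring a changelog-style export
    that stores only the reference (`container_id`).
    """
    hydrated = copy.deepcopy(release)
    container_id = release.get("container_id")
    if container_id is not None:
        # Embed a copy of the referenced entity; an unknown ident raises KeyError.
        hydrated["container"] = copy.deepcopy(containers[container_id])
    return hydrated


if __name__ == "__main__":
    print(hydrate_release(release, containers))
```

Hydration copies each referenced entity into every release that points at it. That makes the export self-contained for search and analysis, but larger and redundant for mirroring, which fits the split proposed above: a non-hydrated changelog for mirroring and hydrated releases for analysis.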

## Full Postgres Backup

Back up the entire database using `pg_dump` in directory format (`-Fd`), with
parallelism 1 (use more jobs on a larger machine with fast disks; try 4 or 8?),
assuming the database is named 'fatcat' and the current user has access to it.
Parallel dumps (`-j`) are only supported with the directory format:

    pg_dump -j1 -Fd -f test-dump fatcat

## Identifier Dumps

The `extras/quick_dump.sql` script dumps abstracts and identifiers as TSV
files to `/tmp/`. It is fairly quick; the output takes about 15 GB of disk
space (uncompressed).

## Releases Export

    # simple command: write the export to a file
    ./fatcat_export.py releases /tmp/fatcat_ident_releases.tsv /tmp/releases-dump.json

    # usual command: write to stdout, show a line-rate progress meter with
    # `pv -l`, and count the output with `wc` (the export itself is discarded)
    time ./fatcat_export.py releases /tmp/fatcat_ident_releases.tsv - | pv -l | wc

## Changelog Export

    # simple command: write the export to a file
    ./fatcat_export.py changelog /tmp/changelog-dump.json

    # usual command: same progress/count pipeline as for releases
    time ./fatcat_export.py changelog - | pv -l | wc
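The `pv -l | wc` pipelines above count output lines, which fits an export that writes one JSON record per line. Below is a minimal Python sketch of that pattern under stated assumptions. The identifier is taken from the first column of the input TSV, and `fetch_release` is a hypothetical placeholder for however `fatcat_export.py` actually looks up each entity. This is not that script's code.

```python
import csv
import json
import sys
from typing import Callable, Iterable, Iterator


def fetch_release(ident: str) -> dict:
    """Hypothetical placeholder; a real exporter would query the database or API."""
    return {"ident": ident}


def iter_json_lines(
    tsv_rows: Iterable[str],
    fetch: Callable[[str], dict] = fetch_release,
) -> Iterator[str]:
    """Yield one newline-terminated JSON object per non-blank TSV row.

    Assumes (hypothetically) that the identifier is in the first column.
    QUOTE_NONE treats quote characters literally, since plain TSV dumps
    (for example from Postgres COPY) do not use CSV-style quoting.
    """
    for row in csv.reader(tsv_rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # blank line in the input
        # json.dumps escapes embedded newlines, so each record stays on one line.
        yield json.dumps(fetch(row[0]), sort_keys=True) + "\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python export_sketch.py IDENTS.tsv > dump.json
    with open(sys.argv[1], newline="") as f:
        sys.stdout.writelines(iter_json_lines(f))
```

With one record per line, the line count reported by `wc` equals the number of exported records. The dump can also be processed incrementally or split with standard line-oriented tools.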