From 9f25d84accb5a3657cb4c7dd87014d9f13ccf2ef Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 13 Sep 2018 00:23:09 -0700
Subject: improve dump scripts

---
 extra/sql_dumps/README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 extra/sql_dumps/README.md

diff --git a/extra/sql_dumps/README.md b/extra/sql_dumps/README.md
new file mode 100644
index 00000000..6f24207d
--- /dev/null
+++ b/extra/sql_dumps/README.md
@@ -0,0 +1,26 @@
+
+## HOWTO: Ident Table Snapshots
+
+How to take a consistent (single transaction) snapshot of the ident tables,
+dumped to disk on the database server.
+
+This will take somewhere around 15-25 GB of disk space on the database server
+(under /tmp). It would probably be better to stream this transaction over a
+network connection (saving database disk I/O), but I can't figure out how to
+do that with plain SQL (multiple table dumps in a single session), so it would
+need to be a custom client.
+
+    ./ident_table_snapshot.sh
+
+## HOWTO: Dump abstracts, release identifiers, file hashes, etc.
+
+These are run as regular old commands, and can be run across the network in a
+couple of different ways. We might not want database ports open to the network
+(even cluster/VPN); on the other hand, we could probably do SSH port
+forwarding anyways.
+
+    # Locally, or from a client running on a remote machine
+    psql fatcat < dump_abstracts.sql | egrep -v '^BEGIN$' | egrep -v '^ROLLBACK$' | pv -l | gzip > abstracts.json.gz
+
+    # Run on the database server, writing to a file on a remote host
+    psql fatcat < dump_abstracts.sql | egrep -v '^BEGIN$' | egrep -v '^ROLLBACK$' | pv -l | gzip | ssh user@host 'cat > abstracts.json.gz'
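
For context on the snapshot approach above: the core trick is that PostgreSQL's
`REPEATABLE READ` isolation pins a whole transaction to a single snapshot, so
several table dumps issued back-to-back in one session all see the same state.
A minimal sketch of that pattern in plain psql, using illustrative ident table
names like `release_ident` and `work_ident` (this is not the real
`ident_table_snapshot.sh`, and the actual schema may differ):

    -- snapshot_sketch.sql: NOT the real ident_table_snapshot.sh, just the pattern.
    -- Under REPEATABLE READ, every statement below reads the same snapshot,
    -- which is what makes the multi-table dump consistent.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
    \copy release_ident   TO '/tmp/fatcat_ident_releases.tsv'
    \copy work_ident      TO '/tmp/fatcat_ident_works.tsv'
    \copy container_ident TO '/tmp/fatcat_ident_containers.tsv'
    \copy creator_ident   TO '/tmp/fatcat_ident_creators.tsv'
    \copy file_ident      TO '/tmp/fatcat_ident_files.tsv'
    ROLLBACK;

Something like this would be run as `psql fatcat < snapshot_sketch.sql`. Note
that `\copy` writes files wherever psql itself runs, so executing it on the
database server is what lands the dumps under /tmp there.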
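
Along the same lines, the dump files driven by the commands above are
presumably read-only transactions that `COPY ... TO STDOUT`, which is also why
the pipelines strip the bare `BEGIN`/`ROLLBACK` lines that psql prints. A
hypothetical sketch of such a file, assuming an `abstracts` table (this is not
the actual `dump_abstracts.sql`; table and column layout are assumptions):

    -- abstracts_dump_sketch.sql: illustrative only; the real schema may differ.
    -- COPY ... TO STDOUT streams rows over the psql connection (no server-side
    -- temp files), so it works equally well locally or tunneled over SSH.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
    COPY (
        SELECT row_to_json(abstracts)
        FROM abstracts
    ) TO STDOUT;
    ROLLBACK;

Piped through the `pv -l | gzip` stages above, that yields roughly one JSON
record per line (modulo COPY text-format escaping) in the gzipped output.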