author    Bryan Newbold <bnewbold@archive.org>  2021-12-03 16:37:22 -0800
committer Bryan Newbold <bnewbold@archive.org>  2021-12-07 19:10:23 -0800
commit    9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d (patch)
tree      bc51b24948082c1fab3e61af1ef162fa9a725f14
parent    3f01f73563f40869c82b7ad3e21c4183fdee8206 (diff)
download  sandcrawler-9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d.tar.gz
          sandcrawler-9a32daa502e2c729cf896ae5e7cb27a3aa6bb68d.zip
sandcrawler SQL dump and upload updates
 sql/README.md | 16 ++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/sql/README.md b/sql/README.md
index 1d53d6d..e488006 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -142,13 +142,21 @@ Questions we might want to answer
## Full SQL Database Dumps
-Run a dump in compressed, postgres custom format:
+Run a dump in compressed, Postgres custom format, excluding the `crossref` table (which is large and redundant):
export DATESLUG="`date +%Y-%m-%d.%H%M%S`"
- time sudo -u postgres pg_dump --verbose --format=custom sandcrawler > sandcrawler_full_dbdump_${DATESLUG}.pgdump
+ time sudo -u postgres pg_dump --verbose --format=custom --exclude-table-data=crossref sandcrawler > sandcrawler_full_dbdump_${DATESLUG}.pgdump
-As of 2021-04-07, this process runs for about 4 hours and the compressed
-snapshot is 88 GBytes (compared with 551.34G database disk consumption).
+As of 2021-12-03, this process runs for about 6 hours and the compressed
+snapshot is 102 GBytes (compared with 940GB database disk consumption,
+including crossref).
+
+Then, upload to petabox as a backup:
+
+ ia upload sandcrawler_full_dbdump_YYYY-MM-DD -m mediatype:data -m collection:webgroup-internal-backups -m title:"Sandcrawler SQL Dump (YYYY-MM-DD)" sandcrawler_full_dbdump_${DATESLUG}.pgdump
+
+
+## SQL Database Restore
To restore a dump (which will delete local database content, if any):