Diffstat (limited to 'notes/bootstrap/import_timing_20190129.txt')
-rw-r--r-- notes/bootstrap/import_timing_20190129.txt | 10
1 files changed, 10 insertions, 0 deletions
diff --git a/notes/bootstrap/import_timing_20190129.txt b/notes/bootstrap/import_timing_20190129.txt
index 6d635f92..30b7bdbf 100644
--- a/notes/bootstrap/import_timing_20190129.txt
+++ b/notes/bootstrap/import_timing_20190129.txt
@@ -6,6 +6,8 @@ Made a number of changes since yesterday's import, so won't be surprised if run
in to problems. Plan is to make any fixes and push through to the end to turn
up any additional issues/bugs, then iterate yet again if needed.
+NOTE: this import ended up being abandoned (too slow) in favor of the 2019-01-30 import.
+
## Service up/down
sudo service fatcat-web stop
@@ -108,6 +110,14 @@ up any additional issues/bugs, then iterate yet again if needed.
would take... about an hour to restart, might save 20+ hours, might waste 14?
+ Counter({'total': 5005785, 'insert': 4319312, 'exists': 457819, 'skip': 228654, 'update': 0})
+ 531544.60user 13597.32system 60:38:43elapsed 249%CPU (0avgtext+0avgdata 448748maxresident)k
+ 124037840inputs+395235552outputs (140major+41973732minor)pagefaults 0swaps
+
+ real 3638m43.712s => 60 hours (!!!)
+ user 8944m37.944s
+ sys 232m25.200s
+
export FATCAT_AUTH_SANDCRAWLER="..."
export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_SANDCRAWLER
time zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py --batch-size 50 matched --bezerk-mode -
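
As a sanity check on the timing figures recorded in the hunk above, here is a minimal sketch (not part of the commit) that converts the `time` output to seconds and derives wall-clock hours, CPU utilization, and import throughput. The function and variable names are illustrative assumptions; the numbers are copied from the Counter line and the two timing reports.

```python
import re


def parse_minutes_seconds(text):
    """Convert a shell `time` value such as '3638m43.712s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def parse_elapsed(text):
    """Convert a colon-separated elapsed value such as '60:38:43' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Figures copied from the notes above.
counts = {"total": 5005785, "insert": 4319312, "exists": 457819,
          "skip": 228654, "update": 0}
real_s = parse_minutes_seconds("3638m43.712s")   # outer shell `time`
elapsed_s = parse_elapsed("60:38:43")            # inner `time parallel ...`
cpu_s = 531544.60 + 13597.32                     # user + system, inner report

# The per-outcome counts should add up to the reported total.
accounted = sum(v for k, v in counts.items() if k != "total")

hours = real_s / 3600
cpu_percent = 100 * cpu_s / elapsed_s
records_per_second = counts["total"] / elapsed_s

print(f"accounted for {accounted} of {counts['total']} records")
print(f"wall clock: {hours:.1f} hours")
print(f"CPU utilization: {cpu_percent:.0f}%")
print(f"throughput: {records_per_second:.1f} records/second")
```

The utilization works out to roughly two and a half cores busy on average, even though `parallel` was started with `-j12`. The figures alone do not say where the remaining time went.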