Diffstat (limited to 'notes/bootstrap/import_timing_20190129.txt')
-rw-r--r-- | notes/bootstrap/import_timing_20190129.txt | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/notes/bootstrap/import_timing_20190129.txt b/notes/bootstrap/import_timing_20190129.txt
index 6d635f92..30b7bdbf 100644
--- a/notes/bootstrap/import_timing_20190129.txt
+++ b/notes/bootstrap/import_timing_20190129.txt
@@ -6,6 +6,8 @@ Made a number of changes since yesterday's import, so won't be surprised if run
 in to problems. Plan is to make any fixes and push through to the end to turn
 up any additional issues/bugs, then iterate yet again if needed.
 
+NOTE: this import ended up being abandoned (too slow) in lieu of 2019-01-30.
+
 ## Service up/down
 
     sudo service fatcat-web stop
@@ -108,6 +110,14 @@ up any additional issues/bugs, then iterate yet again if needed.
 
 would take... about an hour to restart, might save 20+ hours, might waste 14?
 
+    Counter({'total': 5005785, 'insert': 4319312, 'exists': 457819, 'skip': 228654, 'update': 0})
+    531544.60user 13597.32system 60:38:43elapsed 249%CPU (0avgtext+0avgdata 448748maxresident)k
+    124037840inputs+395235552outputs (140major+41973732minor)pagefaults 0swaps
+
+    real    3638m43.712s => 60 hours (!!!)
+    user    8944m37.944s
+    sys     232m25.200s
+
     export FATCAT_AUTH_SANDCRAWLER="..."
     export FATCAT_API_AUTH_TOKEN=$FATCAT_AUTH_SANDCRAWLER
     time zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py --batch-size 50 matched --bezerk-mode -
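The timing lines added in this commit can be cross-checked with a little arithmetic. Below is a minimal Python sketch of that check. It assumes the first timing line is GNU time's default output format (`...user ...system [H:]MM:SS elapsed ...%CPU`). The helper names `parse_elapsed` and `summarize` are hypothetical and are not part of fatcat.

```python
import re

# Figures copied verbatim from the lines added in this diff.
COUNTS = {'total': 5005785, 'insert': 4319312, 'exists': 457819, 'skip': 228654, 'update': 0}
GNU_TIME_LINE = ("531544.60user 13597.32system 60:38:43elapsed 249%CPU "
                 "(0avgtext+0avgdata 448748maxresident)k")


def parse_elapsed(hms: str) -> float:
    """Convert a GNU time elapsed field ([H:]MM:SS[.ss]) to seconds (hypothetical helper)."""
    seconds = 0.0
    for part in hms.split(":"):
        # Each colon-separated field is 60x the field to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(line: str, counts: dict) -> dict:
    """Derive wall-clock hours, CPU share, and throughput (assumes GNU time's default format)."""
    m = re.match(r"([\d.]+)user ([\d.]+)system ([\d:.]+)elapsed (\d+)%CPU", line)
    if m is None:
        raise ValueError("unrecognized GNU time output")
    user, system = float(m.group(1)), float(m.group(2))
    elapsed = parse_elapsed(m.group(3))
    return {
        "elapsed_hours": elapsed / 3600,
        # GNU time's %CPU is (user + system) / elapsed, as a percentage.
        "cpu_percent": 100 * (user + system) / elapsed,
        # The per-outcome counters should add up to the reported total.
        "categories_sum": sum(v for k, v in counts.items() if k != "total"),
        "records_per_sec": counts["total"] / elapsed,
        "inserts_per_sec": counts["insert"] / elapsed,
    }


if __name__ == "__main__":
    for key, value in summarize(GNU_TIME_LINE, COUNTS).items():
        print(f"{key}: {value:.2f}")
```

With these inputs, `60:38:43` is 218323 seconds, about 60.6 hours. This matches the bash `real 3638m43.712s` line. Insert, exists, skip, and update sum exactly to the 5005785 total. Overall throughput comes out near 23 records per second, of which roughly 20 per second were inserts.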