diff options
Diffstat (limited to 'notes/performance/speed.txt')
-rw-r--r-- | notes/performance/speed.txt | 82 |
1 files changed, 82 insertions, 0 deletions
diff --git a/notes/performance/speed.txt b/notes/performance/speed.txt new file mode 100644 index 00000000..f885aea7 --- /dev/null +++ b/notes/performance/speed.txt @@ -0,0 +1,82 @@ + +## Early Prototyping + +### 2018-04-23 + +- fatcat as marshmallow+sqlalchemy+flask, with API client +- no refs, contibs, files, release contribs, containers, etc +- no extra_json +- sqlite +- laptop +- editgroup every 250 edits + + + /data/crossref/crossref-works.2018-01-21.badsample_5k.json + + real 3m42.912s + user 0m20.448s + sys 0m2.852s + + ~22 lines per second + 12.5 hours per million + ~52 days for crossref (100 million) + +target: + crossref (100 million) loaded in 48 hours + 579 lines per second + this test in under 10 seconds + ... but could be in parallel + +same except postgres, via: + + docker run -p 5432:5432 postgres:latest + ./run.py --init-db --database-uri postgres://postgres@localhost:5432 + ./run.py --database-uri postgres://postgres@localhost:5432 + + API processing using 60-100% of a core. postgres 12% of a core; + docker-proxy similar (!). overall 70 of system CPU idle. + + real 2m27.771s + user 0m22.860s + sys 0m2.852s + +no profiling yet; need to look at database ops. probably don't even have any +indices! + +## Rust Updates (2018-05-23) + +Re-running with tweaked python code, 5k sample file, postgres 9.6 running locally (not in docker): + + real 2m27.598s + user 0m24.892s + sys 0m2.836s + +Using postgres and fatcat rust: + + real 0m44.443s + user 0m25.288s + sys 0m0.880s + +api_client about half a core; fatcatd 3x processes, about 10% each; postgres +very small. + +a bit faster, basically maxing out CPU: + + time cat /data/crossref/crossref-works.2018-01-21.badsample_5k.json | parallel -j4 --pipe ./fatcat_client.py --host-url http://localhost:9411 ic - + + real 0m28.998s + user 1m5.304s + sys 0m3.420s + + 200 lines per second; within a factor of 3; can perhaps hit target with + non-python client? + +python processes (clients) seem to be CPU limit in this case; all 4 cores +effectively maxed out. + +running python again in parallel mode: + + real 2m29.532s + user 0m47.692s + sys 0m4.840s + |