Diffstat (limited to 'notes/speed.txt')
-rw-r--r-- | notes/speed.txt | 82 |
1 files changed, 0 insertions, 82 deletions
diff --git a/notes/speed.txt b/notes/speed.txt
deleted file mode 100644
index f885aea7..00000000
--- a/notes/speed.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-
-## Early Prototyping
-
-### 2018-04-23
-
-- fatcat as marshmallow+sqlalchemy+flask, with API client
-- no refs, contibs, files, release contribs, containers, etc
-- no extra_json
-- sqlite
-- laptop
-- editgroup every 250 edits
-
-
-    /data/crossref/crossref-works.2018-01-21.badsample_5k.json
-
-    real 3m42.912s
-    user 0m20.448s
-    sys 0m2.852s
-
-    ~22 lines per second
-    12.5 hours per million
-    ~52 days for crossref (100 million)
-
-target:
-    crossref (100 million) loaded in 48 hours
-    579 lines per second
-    this test in under 10 seconds
-    ... but could be in parallel
-
-same except postgres, via:
-
-    docker run -p 5432:5432 postgres:latest
-    ./run.py --init-db --database-uri postgres://postgres@localhost:5432
-    ./run.py --database-uri postgres://postgres@localhost:5432
-
-    API processing using 60-100% of a core. postgres 12% of a core;
-    docker-proxy similar (!). overall 70 of system CPU idle.
-
-    real 2m27.771s
-    user 0m22.860s
-    sys 0m2.852s
-
-no profiling yet; need to look at database ops. probably don't even have any
-indices!
-
-## Rust Updates (2018-05-23)
-
-Re-running with tweaked python code, 5k sample file, postgres 9.6 running locally (not in docker):
-
-    real 2m27.598s
-    user 0m24.892s
-    sys 0m2.836s
-
-Using postgres and fatcat rust:
-
-    real 0m44.443s
-    user 0m25.288s
-    sys 0m0.880s
-
-api_client about half a core; fatcatd 3x processes, about 10% each; postgres
-very small.
-
-a bit faster, basically maxing out CPU:
-
-    time cat /data/crossref/crossref-works.2018-01-21.badsample_5k.json | parallel -j4 --pipe ./fatcat_client.py --host-url http://localhost:9411 ic
-
-
-    real 0m28.998s
-    user 1m5.304s
-    sys 0m3.420s
-
-    200 lines per second; within a factor of 3; can perhaps hit target with
-    non-python client?
-
-python processes (clients) seem to be CPU limit in this case; all 4 cores
-effectively maxed out.
-
-running python again in parallel mode:
-
-    real 2m29.532s
-    user 0m47.692s
-    sys 0m4.840s
-
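
The throughput figures in the removed notes (about 22 lines per second, about 52 days for crossref, a 579 lines per second target) can be re-derived from the `real` timings. The following is a minimal sketch, not part of the fatcat code base: it assumes the "5k" sample file holds exactly 5000 lines, and the helper names `parse_real`, `lines_per_second`, and `days_for_crossref` are hypothetical.

```python
# Assumption: the "badsample_5k" file contains exactly 5000 lines.
SAMPLE_LINES = 5000
# From the notes: crossref is treated as 100 million lines.
CROSSREF_LINES = 100_000_000


def parse_real(real: str) -> float:
    """Convert a `time` wall-clock value such as '3m42.912s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def lines_per_second(real: str, lines: int = SAMPLE_LINES) -> float:
    """Import rate for a run that processed `lines` lines in `real` time."""
    return lines / parse_real(real)


def days_for_crossref(rate: float) -> float:
    """Days needed to load all of crossref at `rate` lines per second."""
    return CROSSREF_LINES / rate / 86400


# Target from the notes: all of crossref loaded in 48 hours.
target_rate = CROSSREF_LINES / (48 * 3600)

# Wall-clock timings copied from the notes; the labels are informal.
runs = {
    "python + sqlite": "3m42.912s",
    "python + postgres (docker)": "2m27.771s",
    "rust + postgres": "0m44.443s",
    "rust + postgres, 4 parallel clients": "0m28.998s",
}

for label, real in runs.items():
    rate = lines_per_second(real)
    print(f"{label}: {rate:.1f} lines/s, "
          f"{days_for_crossref(rate):.1f} days for crossref")

print(f"target: {target_rate:.0f} lines/s")
```

Under the 5000-line assumption, the parallel run works out to roughly 172 lines per second, which the notes round to 200.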