aboutsummaryrefslogtreecommitdiffstats
path: root/notes/performance/speed.txt
diff options
context:
space:
mode:
Diffstat (limited to 'notes/performance/speed.txt')
-rw-r--r--notes/performance/speed.txt82
1 files changed, 82 insertions, 0 deletions
diff --git a/notes/performance/speed.txt b/notes/performance/speed.txt
new file mode 100644
index 00000000..f885aea7
--- /dev/null
+++ b/notes/performance/speed.txt
@@ -0,0 +1,82 @@
+
+## Early Prototyping
+
+### 2018-04-23
+
+- fatcat as marshmallow+sqlalchemy+flask, with API client
+- no refs, contibs, files, release contribs, containers, etc
+- no extra_json
+- sqlite
+- laptop
+- editgroup every 250 edits
+
+
+ /data/crossref/crossref-works.2018-01-21.badsample_5k.json
+
+ real 3m42.912s
+ user 0m20.448s
+ sys 0m2.852s
+
+ ~22 lines per second
+ 12.5 hours per million
+ ~52 days for crossref (100 million)
+
+target:
+ crossref (100 million) loaded in 48 hours
+ 579 lines per second
+ this test in under 10 seconds
+ ... but could be in parallel
+
+same except postgres, via:
+
+ docker run -p 5432:5432 postgres:latest
+ ./run.py --init-db --database-uri postgres://postgres@localhost:5432
+ ./run.py --database-uri postgres://postgres@localhost:5432
+
+ API processing using 60-100% of a core. postgres 12% of a core;
+ docker-proxy similar (!). overall 70 of system CPU idle.
+
+ real 2m27.771s
+ user 0m22.860s
+ sys 0m2.852s
+
+no profiling yet; need to look at database ops. probably don't even have any
+indices!
+
+## Rust Updates (2018-05-23)
+
+Re-running with tweaked python code, 5k sample file, postgres 9.6 running locally (not in docker):
+
+ real 2m27.598s
+ user 0m24.892s
+ sys 0m2.836s
+
+Using postgres and fatcat rust:
+
+ real 0m44.443s
+ user 0m25.288s
+ sys 0m0.880s
+
+api_client about half a core; fatcatd 3x processes, about 10% each; postgres
+very small.
+
+a bit faster, basically maxing out CPU:
+
+ time cat /data/crossref/crossref-works.2018-01-21.badsample_5k.json | parallel -j4 --pipe ./fatcat_client.py --host-url http://localhost:9411 ic -
+
+ real 0m28.998s
+ user 1m5.304s
+ sys 0m3.420s
+
+ 200 lines per second; within a factor of 3; can perhaps hit target with
+ non-python client?
+
+python processes (clients) seem to be CPU limit in this case; all 4 cores
+effectively maxed out.
+
+running python again in parallel mode:
+
+ real 2m29.532s
+ user 0m47.692s
+ sys 0m4.840s
+