summaryrefslogtreecommitdiffstats
path: root/notes/speed.txt
diff options
context:
space:
mode:
Diffstat (limited to 'notes/speed.txt')
-rw-r--r--notes/speed.txt82
1 files changed, 0 insertions, 82 deletions
diff --git a/notes/speed.txt b/notes/speed.txt
deleted file mode 100644
index f885aea7..00000000
--- a/notes/speed.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-
-## Early Prototyping
-
-### 2018-04-23
-
-- fatcat as marshmallow+sqlalchemy+flask, with API client
-- no refs, contibs, files, release contribs, containers, etc
-- no extra_json
-- sqlite
-- laptop
-- editgroup every 250 edits
-
-
- /data/crossref/crossref-works.2018-01-21.badsample_5k.json
-
- real 3m42.912s
- user 0m20.448s
- sys 0m2.852s
-
- ~22 lines per second
- 12.5 hours per million
- ~52 days for crossref (100 million)
-
-target:
- crossref (100 million) loaded in 48 hours
- 579 lines per second
- this test in under 10 seconds
- ... but could be in parallel
-
-same except postgres, via:
-
- docker run -p 5432:5432 postgres:latest
- ./run.py --init-db --database-uri postgres://postgres@localhost:5432
- ./run.py --database-uri postgres://postgres@localhost:5432
-
- API processing using 60-100% of a core. postgres 12% of a core;
- docker-proxy similar (!). overall 70 of system CPU idle.
-
- real 2m27.771s
- user 0m22.860s
- sys 0m2.852s
-
-no profiling yet; need to look at database ops. probably don't even have any
-indices!
-
-## Rust Updates (2018-05-23)
-
-Re-running with tweaked python code, 5k sample file, postgres 9.6 running locally (not in docker):
-
- real 2m27.598s
- user 0m24.892s
- sys 0m2.836s
-
-Using postgres and fatcat rust:
-
- real 0m44.443s
- user 0m25.288s
- sys 0m0.880s
-
-api_client about half a core; fatcatd 3x processes, about 10% each; postgres
-very small.
-
-a bit faster, basically maxing out CPU:
-
- time cat /data/crossref/crossref-works.2018-01-21.badsample_5k.json | parallel -j4 --pipe ./fatcat_client.py --host-url http://localhost:9411 ic -
-
- real 0m28.998s
- user 1m5.304s
- sys 0m3.420s
-
- 200 lines per second; within a factor of 3; can perhaps hit target with
- non-python client?
-
-python processes (clients) seem to be CPU limit in this case; all 4 cores
-effectively maxed out.
-
-running python again in parallel mode:
-
- real 2m29.532s
- user 0m47.692s
- sys 0m4.840s
-