## Early Prototyping ### 2018-04-23 - fatcat as marshmallow+sqlalchemy+flask, with API client - no refs, contibs, files, release contribs, containers, etc - no extra_json - sqlite - laptop - editgroup every 250 edits /data/crossref/crossref-works.2018-01-21.badsample_5k.json real 3m42.912s user 0m20.448s sys 0m2.852s ~22 lines per second 12.5 hours per million ~52 days for crossref (100 million) target: crossref (100 million) loaded in 48 hours 579 lines per second this test in under 10 seconds ... but could be in parallel same except postgres, via: docker run -p 5432:5432 postgres:latest ./run.py --init-db --database-uri postgres://postgres@localhost:5432 ./run.py --database-uri postgres://postgres@localhost:5432 API processing using 60-100% of a core. postgres 12% of a core; docker-proxy similar (!). overall 70 of system CPU idle. real 2m27.771s user 0m22.860s sys 0m2.852s no profiling yet; need to look at database ops. probably don't even have any indices! ## Rust Updates (2018-05-23) Re-running with tweaked python code, 5k sample file, postgres 9.6 running locally (not in docker): real 2m27.598s user 0m24.892s sys 0m2.836s Using postgres and fatcat rust: real 0m44.443s user 0m25.288s sys 0m0.880s api_client about half a core; fatcatd 3x processes, about 10% each; postgres very small. a bit faster, basically maxing out CPU: time cat /data/crossref/crossref-works.2018-01-21.badsample_5k.json | parallel -j4 --pipe ./fatcat_client.py --host-url http://localhost:9411 ic - real 0m28.998s user 1m5.304s sys 0m3.420s 200 lines per second; within a factor of 3; can perhaps hit target with non-python client? python processes (clients) seem to be CPU limit in this case; all 4 cores effectively maxed out. running python again in parallel mode: real 2m29.532s user 0m47.692s sys 0m4.840s