## Early Prototyping
### 2018-04-23
- fatcat as marshmallow+sqlalchemy+flask, with API client
- no refs, contribs, files, release contribs, containers, etc
- no extra_json
- sqlite
- laptop
- editgroup every 250 edits
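
The "editgroup every 250 edits" setting amounts to batching the input stream. A minimal sketch of that batching, assuming one edit per input line; `batched` and `EDITS_PER_EDITGROUP` are illustrative names, not taken from the fatcat importer:

```python
from itertools import islice
from typing import Iterable, Iterator, List

EDITS_PER_EDITGROUP = 250  # batch size from the notes above


def batched(lines: Iterable[str], size: int = EDITS_PER_EDITGROUP) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` lines (hypothetical helper)."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Assuming one edit per line, a 5k-line sample splits into 20 batches of 250.
sample = [f"record-{i}" for i in range(5000)]
groups = list(batched(sample))
print(len(groups), len(groups[0]))
```

Each yielded batch would correspond to one editgroup, so the per-editgroup overhead is paid once per 250 lines rather than once per line.
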

    /data/crossref/crossref-works.2018-01-21.badsample_5k.json

    real    3m42.912s
    user    0m20.448s
    sys     0m2.852s

- ~22 lines per second
- 12.5 hours per million
- ~52 days for crossref (100 million)

target:

- crossref (100 million) loaded in 48 hours
- 579 lines per second
- this test in under 10 seconds
- ... but could be in parallel
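
The extrapolation and target figures above can be reproduced with a few lines of arithmetic. This is a sketch only: the constants come from the notes, and the worker count assumes throughput scales linearly with parallelism, which has not been measured.

```python
import math

SAMPLE_LINES = 5_000            # size of the badsample_5k file
ELAPSED_S = 3 * 60 + 42.912     # "real" time of the sqlite run
CROSSREF_LINES = 100_000_000    # rough size of the full crossref dump
TARGET_HOURS = 48

# Measured throughput and what it implies for a full load.
measured_rate = SAMPLE_LINES / ELAPSED_S                 # lines per second
hours_per_million = 1_000_000 / measured_rate / 3600
days_for_crossref = CROSSREF_LINES / measured_rate / 86_400

# Throughput required to hit the target, and what that means for this test.
target_rate = CROSSREF_LINES / (TARGET_HOURS * 3600)     # lines per second
sample_time_at_target = SAMPLE_LINES / target_rate       # seconds

# Workers needed IF throughput scaled linearly with parallelism (assumption).
workers_needed = math.ceil(target_rate / measured_rate)

print(f"measured: {measured_rate:.1f} lines/s, "
      f"{hours_per_million:.1f} h per million, {days_for_crossref:.0f} days total")
print(f"target: {target_rate:.0f} lines/s, sample in {sample_time_at_target:.1f} s, "
      f"~{workers_needed} workers at the measured rate")
```

The unrounded per-million figure comes out slightly below the 12.5 hours quoted above, since the notes round the rate down to 22 lines per second first.
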

same except postgres, via:

    docker run -p 5432:5432 postgres:latest
    ./run.py --init-db --database-uri postgres://postgres@localhost:5432
    ./run.py --database-uri postgres://postgres@localhost:5432

API processing using 60-100% of a core. postgres 12% of a core;
docker-proxy similar (!). overall 70% of system CPU idle.

    real    2m27.771s
    user    0m22.860s
    sys     0m2.852s

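
For comparison with the sqlite run, the two "real" timings can be converted to seconds and turned into rates. A small sketch, assuming the same 5k-line sample was used for both runs; `parse_real` is a hypothetical helper for the `NmS.SSSs` format that bash `time` prints:

```python
import re


def parse_real(value: str) -> float:
    """Convert a bash `time` duration such as '2m27.771s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


SAMPLE_LINES = 5_000  # assumes both runs loaded the same sample file

sqlite_s = parse_real("3m42.912s")
postgres_s = parse_real("2m27.771s")

postgres_rate = SAMPLE_LINES / postgres_s   # lines per second
speedup = sqlite_s / postgres_s             # postgres relative to sqlite

print(f"postgres: {postgres_rate:.1f} lines/s, {speedup:.2f}x the sqlite run")
```

That works out to roughly 34 lines per second, about 1.5x the sqlite run and still well short of the 579 lines per second target.
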
no profiling yet; need to look at database ops. probably don't even have any
indices!
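
One cheap way to check the missing-index suspicion is to ask the database for its query plan before and after adding an index. The sketch below uses the stdlib `sqlite3` module and a made-up `release` table with a `doi` column; the real fatcat schema is not shown in these notes, so the table, column, and index names are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; stands in for whatever the importer looks up by key.
conn.execute("CREATE TABLE release (id INTEGER PRIMARY KEY, doi TEXT, title TEXT)")

LOOKUP = "SELECT title FROM release WHERE doi = ?"


def query_plan(sql: str) -> str:
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("10.123/abc",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return "; ".join(row[-1] for row in rows)


plan_before = query_plan(LOOKUP)   # no index yet: full table scan
conn.execute("CREATE INDEX idx_release_doi ON release (doi)")
plan_after = query_plan(LOOKUP)    # planner can now search the index

print(plan_before)
print(plan_after)
```

A plan that reports a scan for a per-line lookup would be a likely contributor to the low insert rate; postgres offers the same kind of check through `EXPLAIN`.
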