| author | Bryan Newbold <bnewbold@robocracy.org> | 2018-04-24 16:12:05 -0700 | 
|---|---|---|
| committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-04-24 16:12:05 -0700 | 
| commit | 71a4210f1e27545cadc216301b4529912fc57591 | |
| tree | 57663c26e32273adf867d263f39cfc2d8595e330 | |
| parent | ecfc0bae97919a88b22145415cb54e3cc170eec2 | |
backup notes and TODO
| -rw-r--r-- | TODO | 42 |
| -rw-r--r-- | notes/golang.txt | 17 |
| -rw-r--r-- | notes/speed.txt | 44 |
3 files changed, 78 insertions, 25 deletions
@@ -1,16 +1,13 @@
 routes/views:
-- sources and account page as fake links (#)
-- "current editgroup" redirect
-- per-editor history
-- actually wire up work/release creation form
+- actually wire up work/release POST form
 next/high-level:
-- release, container, creator lookups (by external id)
-    => creator obj to have ORCID column
 - crossref import script:
+    => profile both script and API server
     => creator/container caching
     => edit group
+- database index/schema
 - ORCID and ISSN import scripts
 - client export:
     => one json-nl file per entity type
@@ -19,24 +16,24 @@ next/high-level:
 - naive API-based import scripts for: journals (norwegian), orcid, crossref
 - switch to marshmallow in create APIs (at least for revs)
+api:
+- PUT for mid-edit revisions
+- use marshmallow in POST for all entities
+- consider refactoring into method-method (classes)
+
 model:
 - 'parent rev' for revisions (vs. container parent)
-- helpers to deal with edits and edit groups (?)
-
-api
-- expose edit_group and editor
-- work merge helper
+- "submit" status for editgroups?
 tests
-- api gets: creator, container, editgroup
+- full object fields actually getting passed e2e (for rich_app)
 - implicit editor.active_edit_group behavior
 - modify existing release via edit mechanism (and commit)
-- merge two releases
+- redirect a release to another (merge)
 - update (via edit) a redirect release
-- merge two works (combining releases)
 - api: try to reuse an accepted edit group
-- api: try to modify an accepted edit
-- api: multiple edits, same entity
+- api: try to modify an accepted release
+- api: multiple edits, same entity, same editgroup
 review
 - hydrate in files for releases... nested good enough?
@@ -51,19 +48,13 @@ views
 - oldest edits/edit-groups
 later:
-- switch extra_json to just be a column
-- extra_json uniqueness
-- extra_json marshmallow fixes
-- "hydrate" files (and maybe container/authors/refs) in release
-- transclude primary_release in work
-- crossref json import script/benchmark
-    => maybe both "raw" and string-dedupe?
-- public IDs are UUID (sqlite hack?)
+- switch extra_json to just be columns
+- public IDs are UUID (sqlite hack, or just require postgres)
 ## High-Level Priorities
-- manual editing of containers and works/releases
 - bulk loading of releases, files, containers, creators
+- manual editing of containers and releases
 - accurate auto-matching matching of containers (eg, via ISSN)
 - full database dump and reload
@@ -76,3 +67,4 @@ later:
     - UUID switch
     - JSONB/extra_json experiments
     - SQL query examples/experiments
+
diff --git a/notes/golang.txt b/notes/golang.txt
new file mode 100644
index 00000000..8527711e
--- /dev/null
+++ b/notes/golang.txt
@@ -0,0 +1,17 @@
+
+- pq: basic postgres driver and ORM (similar to sqlalchemy?)
+- sqlx: small extensions to builtin sql; row to struct mapping
+
+
+code generation from SQL schema:
+- https://github.com/xo/xo
+- https://github.com/volatiletech/sqlboiler
+- kallax
+
+database migrations:
+- goose
+- https://github.com/mattes/migrate
+
+maybe also:
+- https://github.com/oklog/ulid
+  like a UUID, but base32 and "sortable" (timestamp + random)
diff --git a/notes/speed.txt b/notes/speed.txt
new file mode 100644
index 00000000..69be3253
--- /dev/null
+++ b/notes/speed.txt
@@ -0,0 +1,44 @@
+
+## Early Prototyping
+
+### 2018-04-23
+
+- fatcat as marshmallow+sqlalchemy+flask, with API client
+- no refs, contibs, files, release contribs, containers, etc
+- no extra_json
+- sqlite
+- laptop
+- editgroup every 250 edits
+
+
+    /data/crossref/crossref-works.2018-01-21.badsample_5k.json
+
+    real    3m42.912s
+    user    0m20.448s
+    sys     0m2.852s
+
+    ~22 lines per second
+    12.5 hours per million
+    ~52 days for crossref (100 million)
+
+target:
+    crossref (100 million) loaded in 48 hours
+    579 lines per second
+    this test in under 10 seconds
+    ... but could be in parallel
+
+same except postgres, via:
+
+    docker run -p 5432:5432 postgres:latest
+    ./run.py --init-db --database-uri postgres://postgres@localhost:5432
+    ./run.py --database-uri postgres://postgres@localhost:5432
+
+    API processing using 60-100% of a core. postgres 12% of a core;
+    docker-proxy similar (!). overall 70 of system CPU idle.
+
+    real    2m27.771s
+    user    0m22.860s
+    sys     0m2.852s
+
+no profiling yet; need to look at database ops. probably don't even have any
+indices!
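
The throughput figures in notes/speed.txt can be re-derived from the two `real` timings. The sketch below is not part of the commit. It assumes the `badsample_5k` file holds exactly 5,000 lines (inferred from the file name only), and the helper names `parse_real`, `throughput`, and `days_for_crossref` are hypothetical.

```python
# Wall-clock timings are copied from notes/speed.txt; everything else is derived.
SAMPLE_LINES = 5_000            # assumption: inferred from the "badsample_5k" file name
CROSSREF_TOTAL = 100_000_000    # "crossref (100 million)" in the notes
TARGET_HOURS = 48               # "loaded in 48 hours"


def parse_real(real: str) -> float:
    """Convert a `time`-style value such as '3m42.912s' into seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def throughput(real: str, lines: int = SAMPLE_LINES) -> float:
    """Lines processed per wall-clock second."""
    return lines / parse_real(real)


def days_for_crossref(lines_per_sec: float) -> float:
    """Days needed to load the full crossref set at a constant rate."""
    return CROSSREF_TOTAL / lines_per_sec / 86_400


sqlite_rate = throughput("3m42.912s")     # sqlite run
postgres_rate = throughput("2m27.771s")   # postgres-in-docker run
target_rate = CROSSREF_TOTAL / (TARGET_HOURS * 3600)

for label, rate in [("sqlite", sqlite_rate), ("postgres", postgres_rate)]:
    print(f"{label}: {rate:.1f} lines/s, "
          f"{1_000_000 / rate / 3600:.1f} h per million, "
          f"{days_for_crossref(rate):.0f} days for crossref")
print(f"target: {target_rate:.0f} lines/s, "
      f"sample in {SAMPLE_LINES / target_rate:.1f} s")
```

Under that 5,000-line assumption, the postgres run works out to roughly 34 lines per second, which still leaves a single process well over an order of magnitude short of the 48-hour target.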
