 TODO             | 42
 notes/golang.txt | 17
 notes/speed.txt  | 44
 3 files changed, 78 insertions(+), 25 deletions(-)
diff --git a/TODO b/TODO
index 90ef967e..290fc5ab 100644
--- a/TODO
+++ b/TODO
@@ -1,16 +1,13 @@
routes/views:
-- sources and account page as fake links (#)
-- "current editgroup" redirect
-- per-editor history
-- actually wire up work/release creation form
+- actually wire up work/release POST form
next/high-level:
-- release, container, creator lookups (by external id)
- => creator obj to have ORCID column
- crossref import script:
+ => profile both script and API server
=> creator/container caching
=> edit group
+- database index/schema
- ORCID and ISSN import scripts
- client export:
=> one json-nl file per entity type
@@ -19,24 +16,24 @@ next/high-level:
- naive API-based import scripts for: journals (norwegian), orcid, crossref
- switch to marshmallow in create APIs (at least for revs)
+api:
+- PUT for mid-edit revisions
+- use marshmallow in POST for all entities
+- consider refactoring into method-method (classes)
+
model:
- 'parent rev' for revisions (vs. container parent)
-- helpers to deal with edits and edit groups (?)
-
-api
-- expose edit_group and editor
-- work merge helper
+- "submit" status for editgroups?
tests
-- api gets: creator, container, editgroup
+- full object fields actually getting passed e2e (for rich_app)
- implicit editor.active_edit_group behavior
- modify existing release via edit mechanism (and commit)
-- merge two releases
+- redirect a release to another (merge)
- update (via edit) a redirect release
-- merge two works (combining releases)
- api: try to reuse an accepted edit group
-- api: try to modify an accepted edit
-- api: multiple edits, same entity
+- api: try to modify an accepted release
+- api: multiple edits, same entity, same editgroup
review
- hydrate in files for releases... nested good enough?
@@ -51,19 +48,13 @@ views
- oldest edits/edit-groups
later:
-- switch extra_json to just be a column
-- extra_json uniqueness
-- extra_json marshmallow fixes
-- "hydrate" files (and maybe container/authors/refs) in release
-- transclude primary_release in work
-- crossref json import script/benchmark
- => maybe both "raw" and string-dedupe?
-- public IDs are UUID (sqlite hack?)
+- switch extra_json to just be columns
+- public IDs are UUID (sqlite hack, or just require postgres)
## High-Level Priorities
-- manual editing of containers and works/releases
- bulk loading of releases, files, containers, creators
+- manual editing of containers and releases
- accurate auto-matching of containers (eg, via ISSN)
- full database dump and reload
@@ -76,3 +67,4 @@ later:
- UUID switch
- JSONB/extra_json experiments
- SQL query examples/experiments
+
diff --git a/notes/golang.txt b/notes/golang.txt
new file mode 100644
index 00000000..8527711e
--- /dev/null
+++ b/notes/golang.txt
@@ -0,0 +1,17 @@
+
+- pq: basic postgres driver and ORM (similar to sqlalchemy?)
+- sqlx: small extensions to builtin sql; row to struct mapping
+
+
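
(editor's sketch: what sqlx's "row to struct mapping" looks like in practice,
with lib/pq as the driver underneath -- pq itself is driver-only, no ORM layer.
The editor table and Editor struct here are hypothetical, not the actual fatcat
schema; the connection URI mirrors the postgres test setup in notes/speed.txt.)

    package main

    import (
        "fmt"
        "log"

        "github.com/jmoiron/sqlx"
        _ "github.com/lib/pq" // registers the "postgres" database/sql driver
    )

    // Editor is a hypothetical struct; `db` tags map columns to fields.
    type Editor struct {
        ID       int64  `db:"id"`
        Username string `db:"username"`
    }

    func main() {
        db, err := sqlx.Connect("postgres",
            "postgres://postgres@localhost:5432?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        var editors []Editor
        // Select runs the query and scans each row into the slice.
        if err := db.Select(&editors, "SELECT id, username FROM editor"); err != nil {
            log.Fatal(err)
        }
        fmt.Println(editors)
    }

(db.Get does the same for a single row; this is the "small extensions" layer,
not an ORM.)
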
+code generation from SQL schema:
+- https://github.com/xo/xo
+- https://github.com/volatiletech/sqlboiler
+- kallax
+
+database migrations:
+- goose
+- https://github.com/mattes/migrate
+
+maybe also:
+- https://github.com/oklog/ulid
+ like a UUID, but base32 and "sortable" (timestamp + random)
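
(editor's sketch: generating a ULID, following the oklog/ulid README. A ULID is
a 48-bit millisecond timestamp plus 80 bits of entropy, Crockford base32
encoded, so IDs sort lexicographically by creation time.)

    package main

    import (
        "fmt"
        "math/rand"
        "time"

        "github.com/oklog/ulid"
    )

    func main() {
        t := time.Now()
        // Monotonic entropy keeps IDs ordered even within one millisecond.
        entropy := ulid.Monotonic(rand.New(rand.NewSource(t.UnixNano())), 0)
        id := ulid.MustNew(ulid.Timestamp(t), entropy)
        fmt.Println(id) // 26 chars of base32, e.g. 01ARZ3NDEKTSV4RRFFQ69G5FAV
    }
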
diff --git a/notes/speed.txt b/notes/speed.txt
new file mode 100644
index 00000000..69be3253
--- /dev/null
+++ b/notes/speed.txt
@@ -0,0 +1,44 @@
+
+## Early Prototyping
+
+### 2018-04-23
+
+- fatcat as marshmallow+sqlalchemy+flask, with API client
+- no refs, contribs, files, release contribs, containers, etc
+- no extra_json
+- sqlite
+- laptop
+- editgroup every 250 edits
+
+
+ /data/crossref/crossref-works.2018-01-21.badsample_5k.json
+
+ real 3m42.912s
+ user 0m20.448s
+ sys 0m2.852s
+
+ ~22 lines per second
+ 12.5 hours per million
+ ~52 days for crossref (100 million)
+
+target:
+ crossref (100 million) loaded in 48 hours
+ 579 lines per second
+ this test in under 10 seconds
+ ... but could be in parallel
+
+same except postgres, via:
+
+ docker run -p 5432:5432 postgres:latest
+ ./run.py --init-db --database-uri postgres://postgres@localhost:5432
+ ./run.py --database-uri postgres://postgres@localhost:5432
+
+ API process using 60-100% of a core. postgres 12% of a core;
+ docker-proxy similar (!). overall ~70% of system CPU idle.
+
+ real 2m27.771s
+ user 0m22.860s
+ sys 0m2.852s
+
+no profiling yet; need to look at database ops. probably don't even have any
+indices!