backup notes and TODO

author: Bryan Newbold <bnewbold@robocracy.org> 2018-04-24 16:12:05 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-04-24 16:12:05 -0700
commit: 71a4210f1e27545cadc216301b4529912fc57591 (patch)
tree: 57663c26e32273adf867d263f39cfc2d8595e330
parent: ecfc0bae97919a88b22145415cb54e3cc170eec2 (diff)
download: fatcat-71a4210f1e27545cadc216301b4529912fc57591.tar.gz
fatcat-71a4210f1e27545cadc216301b4529912fc57591.zip
3 files changed, 78 insertions, 25 deletions
diff --git a/TODO b/TODO
index 90ef967e..290fc5ab 100644
--- a/TODO
+++ b/TODO
@@ -1,16 +1,13 @@
 
 routes/views:
-- sources and account page as fake links (#)
-- "current editgroup" redirect
-- per-editor history
-- actually wire up work/release creation form
+- actually wire up work/release POST form
 
 next/high-level:
-- release, container, creator lookups (by external id)
-    => creator obj to have ORCID column
 - crossref import script:
+    => profile both script and API server
     => creator/container caching
     => edit group
+- database index/schema
 - ORCID and ISSN import scripts
 - client export:
     => one json-nl file per entity type
@@ -19,24 +16,24 @@ next/high-level:
 - naive API-based import scripts for: journals (norwegian), orcid, crossref
 - switch to marshmallow in create APIs (at least for revs)
 
+api:
+- PUT for mid-edit revisions
+- use marshmallow in POST for all entities
+- consider refactoring into method-method (classes)
+
 model:
 - 'parent rev' for revisions (vs. container parent)
-- helpers to deal with edits and edit groups (?)
-
-api
-- expose edit_group and editor
-- work merge helper
+- "submit" status for editgroups?
 
 tests
-- api gets: creator, container, editgroup
+- full object fields actually getting passed e2e (for rich_app)
 - implicit editor.active_edit_group behavior
 - modify existing release via edit mechanism (and commit)
-- merge two releases
+- redirect a release to another (merge)
 - update (via edit) a redirect release
-- merge two works (combining releases)
 - api: try to reuse an accepted edit group
-- api: try to modify an accepted edit
-- api: multiple edits, same entity
+- api: try to modify an accepted release
+- api: multiple edits, same entity, same editgroup
 
 review
 - hydrate in files for releases... nested good enough?
@@ -51,19 +48,13 @@ views
 - oldest edits/edit-groups
 
 later:
-- switch extra_json to just be a column
-- extra_json uniqueness
-- extra_json marshmallow fixes
-- "hydrate" files (and maybe container/authors/refs) in release
-- transclude primary_release in work
-- crossref json import script/benchmark
-    => maybe both "raw" and string-dedupe?
-- public IDs are UUID (sqlite hack?)
+- switch extra_json to just be columns
+- public IDs are UUID (sqlite hack, or just require postgres)
 
 ## High-Level Priorities
 
-- manual editing of containers and works/releases
 - bulk loading of releases, files, containers, creators
+- manual editing of containers and releases
 - accurate auto-matching of containers (eg, via ISSN)
 - full database dump and reload
 
@@ -76,3 +67,4 @@ later:
     - UUID switch
     - JSONB/extra_json experiments
     - SQL query examples/experiments
+
diff --git a/notes/golang.txt b/notes/golang.txt
new file mode 100644
index 00000000..8527711e
--- /dev/null
+++ b/notes/golang.txt
@@ -0,0 +1,17 @@
+
+- pq: basic postgres driver for database/sql (a driver only, not an ORM like sqlalchemy)
+- sqlx: small extensions to builtin sql; row to struct mapping
+
+
+code generation from SQL schema:
+- https://github.com/xo/xo
+- https://github.com/volatiletech/sqlboiler
+- kallax
+
+database migrations:
+- goose
+- https://github.com/mattes/migrate
+
+maybe also:
+- https://github.com/oklog/ulid
+  like a UUID, but base32 and "sortable" (timestamp + random)
diff --git a/notes/speed.txt b/notes/speed.txt
new file mode 100644
index 00000000..69be3253
--- /dev/null
+++ b/notes/speed.txt
@@ -0,0 +1,44 @@
+
+## Early Prototyping
+
+### 2018-04-23
+
+- fatcat as marshmallow+sqlalchemy+flask, with API client
+- no refs, contribs, files, release contribs, containers, etc
+- no extra_json
+- sqlite
+- laptop
+- editgroup every 250 edits
+
+
+    /data/crossref/crossref-works.2018-01-21.badsample_5k.json
+
+    real    3m42.912s
+    user    0m20.448s
+    sys     0m2.852s
+
+    ~22 lines per second
+    12.5 hours per million
+    ~52 days for crossref (100 million)
+
+target:
+    crossref (100 million) loaded in 48 hours
+    579 lines per second
+    this test in under 10 seconds
+    ... but could be in parallel
+
+same except postgres, via:
+
+    docker run -p 5432:5432 postgres:latest
+    ./run.py --init-db --database-uri postgres://postgres@localhost:5432
+    ./run.py --database-uri postgres://postgres@localhost:5432
+
+    API processing using 60-100% of a core. postgres 12% of a core;
+    docker-proxy similar (!). overall 70% of system CPU idle.
+
+    real    2m27.771s
+    user    0m22.860s
+    sys     0m2.852s
+
+no profiling yet; need to look at database ops. probably don't even have any
+indices!
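
The throughput figures in notes/speed.txt can be re-derived from the `real` timings. The sketch below is not part of the commit. It assumes the "5k" in the sample filename means exactly 5000 input lines, and the helper names (`parse_real`, `throughput`) are hypothetical. The postgres rate it prints is derived from the recorded timing and is not stated in the notes.

```python
# Re-derive the import throughput numbers from `time` output.
# Assumption: the badsample_5k file holds exactly 5000 lines.
SAMPLE_LINES = 5000
CROSSREF_TOTAL = 100_000_000  # "crossref (100 million)" from the notes


def parse_real(real: str) -> float:
    """Convert a `time`-style string such as '3m42.912s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def throughput(lines: int, real: str) -> dict:
    """Return the rate and extrapolated load times for one timed run."""
    rate = lines / parse_real(real)
    return {
        "lines_per_sec": rate,
        "hours_per_million": 1_000_000 / rate / 3600,
        "days_for_crossref": CROSSREF_TOTAL / rate / 86400,
    }


sqlite = throughput(SAMPLE_LINES, "3m42.912s")
postgres = throughput(SAMPLE_LINES, "2m27.771s")

# Target from the notes: all of crossref loaded in 48 hours.
target_rate = CROSSREF_TOTAL / (48 * 3600)
target_sample_secs = SAMPLE_LINES / target_rate

for name, run in (("sqlite", sqlite), ("postgres", postgres)):
    print(
        f"{name}: {run['lines_per_sec']:.1f} lines/sec, "
        f"{run['hours_per_million']:.1f} hours per million, "
        f"{run['days_for_crossref']:.0f} days for crossref"
    )
print(f"target: {target_rate:.0f} lines/sec, sample in {target_sample_secs:.1f}s")
```

Under these assumptions the sqlite run reproduces the ~22 lines per second and ~52 days recorded above, and the 48-hour target works out to the 579 lines per second the notes give, for a single, non-parallel importer.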