From 5e2fecf4b81a878ec4321cdd85d6a594e94c1eb2 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Sat, 30 Jun 2018 23:33:03 -0700
Subject: update TODO/notes again

---
 TODO                    |  8 +++++---
 notes/test_works.txt    | 28 ++++++++++++++++++++++++++++
 python/README_import.md | 13 +++++++++++++
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/TODO b/TODO
index a188b88e..591c91b8 100644
--- a/TODO
+++ b/TODO
@@ -15,12 +15,14 @@ name ref: https://www.w3.org/International/questions/qa-personal-names
 
 - full database dump and reload (import/export)
 - manual editing of containers and releases (web interface)
-x bulk loading of releases, files, containers, creators
-x accurate auto-matching matching of containers (eg, via ISSN)
+
+## Web UI
+
+- changelog more like a https://semantic-ui.com/views/feed.html ?
+- instead of grid, maybe https://semantic-ui.com/elements/rail.html
 
 ## Performance
 
-x have release creation auto-create works if one isn't specified
 - write pure-rust "benchmark" scripts that hit, eg, lookups and batch
   endpoints. run these with auto_explain on, then look in logs on dev machine
 - batch inserts automerge: create editgroup and changelog, mark all edits as
diff --git a/notes/test_works.txt b/notes/test_works.txt
index bc6ea64a..286b4d3a 100644
--- a/notes/test_works.txt
+++ b/notes/test_works.txt
@@ -1,7 +1,35 @@
+## Found because Famous
+
 Many co-authors (group):
 
     "Precision measurement of the top-quark mass in lepton+jets final states"
     https://arxiv.org/abs/1405.1756
 
+## Found in Testing Imports
+
+Two releases, same work (actually same release?):
+
+    Freiheit für Nutzer, nicht für Software
+    10.14361/transcript.9783839420362.366
+    10.14361/9783839428351-056
+
+    May also have link via crossref metadata?
+
+Fun ellen examples:
+
+    Just-in-time databases and the World-Wide Web
+    10.1145/288627.288638
+
+    Two different versions of PDF found, same URL
+
+Actual ORCID match:
+
+    10.1002/cfg.158
+    0000-0002-4447-5978
+
+Fulltext via CORE publisher-connector:
+
+    10.1186/s12889-016-2706-9
+
diff --git a/python/README_import.md b/python/README_import.md
index f43d9424..ae9764e6 100644
--- a/python/README_import.md
+++ b/python/README_import.md
@@ -99,3 +99,16 @@ From compressed:
 ## Manifest
 
     time ./client.py import-manifest /srv/datasets/idents_files_urls.sqlite
+
+    [...]
+    Finished a batch; row 284518671 of 9669646 (2942.39%). Total inserted: 6606900
+    Finished a batch; row 284518771 of 9669646 (2942.39%). Total inserted: 6606950
+    Finished a batch; row 284518845 of 9669646 (2942.39%). Total inserted: 6607000
+    Finished a batch; row 284518923 of 9669646 (2942.39%). Total inserted: 6607050
+    Done! Inserted 6607075
+
+    real    1590m36.626s
+    user    339m40.928s
+    sys     19m3.576s
+
+Really sped up once not contending with Crossref import, so don't run these two at the same time.
--
cgit v1.2.3