author     Bryan Newbold <bnewbold@robocracy.org>  2018-06-30 23:33:03 -0700
committer  Bryan Newbold <bnewbold@robocracy.org>  2018-06-30 23:33:03 -0700
commit     5e2fecf4b81a878ec4321cdd85d6a594e94c1eb2
tree       ba0b0affbff2f1c41e22a9ec0a548e8e80b06bb3
parent     224e4929af6fa84401944c757d9902a5f83da7a3
update TODO/notes again
-rw-r--r--  TODO                       8
-rw-r--r--  notes/test_works.txt      28
-rw-r--r--  python/README_import.md   13
3 files changed, 46 insertions, 3 deletions
diff --git a/TODO b/TODO
index a188b88e..591c91b8 100644
--- a/TODO
+++ b/TODO
@@ -15,12 +15,14 @@ name ref: https://www.w3.org/International/questions/qa-personal-names
- full database dump and reload (import/export)
- manual editing of containers and releases (web interface)
-x bulk loading of releases, files, containers, creators
-x accurate auto-matching matching of containers (eg, via ISSN)
+
+## Web UI
+
+- changelog more like a https://semantic-ui.com/views/feed.html ?
+- instead of grid, maybe https://semantic-ui.com/elements/rail.html
## Performance
-x have release creation auto-create works if one isn't specified
- write pure-rust "benchmark" scripts that hit, eg, lookups and batch
endpoints. run these with auto_explain on, then look in logs on dev machine
- batch inserts automerge: create editgroup and changelog, mark all edits as
diff --git a/notes/test_works.txt b/notes/test_works.txt
index bc6ea64a..286b4d3a 100644
--- a/notes/test_works.txt
+++ b/notes/test_works.txt
@@ -1,7 +1,35 @@
+## Found because Famous
+
Many co-authors (group):
"Precision measurement of the top-quark mass in lepton+jets final states"
https://arxiv.org/abs/1405.1756
+## Found in Testing Imports
+
+Two releases, same work (actually same release?):
+
+ Freiheit für Nutzer, nicht für Software
+ 10.14361/transcript.9783839420362.366
+ 10.14361/9783839428351-056
+
+ May also have link via crossref metadata?
+
+Fun ellen examples:
+
+ Just-in-time databases and the World-Wide Web
+ 10.1145/288627.288638
+
+ Two different versions of PDF found, same URL
+
+Actual ORCID match:
+
+ 10.1002/cfg.158
+ 0000-0002-4447-5978
+
+Fulltext via CORE publisher-connector:
+
+ 10.1186/s12889-016-2706-9
+
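The "Actual ORCID match" note above relies on the ORCID iD being well-formed. ORCID iDs end in an ISO 7064 MOD 11-2 check character, so a cheap sanity check can run before any matching. The sketch below is a hypothetical helper (`orcid_checksum_ok` is not a fatcat function). It assumes the hyphenated 16-character form shown in the note and follows the check-digit algorithm as ORCID documents it.

```python
import re

# Hypothetical helper (not fatcat code): validate an ORCID iD's final check
# character using ISO 7064 MOD 11-2, per ORCID's published algorithm.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Return True if `orcid` is well-formed and its check character is correct."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11  # a result of 10 is written as "X"
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


print(orcid_checksum_ok("0000-0002-4447-5978"))  # True
print(orcid_checksum_ok("0000-0002-4447-5977"))  # False
```

A checksum pass only shows the identifier is syntactically valid. It says nothing about whether the iD actually belongs to the author on 10.1002/cfg.158.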
diff --git a/python/README_import.md b/python/README_import.md
index f43d9424..ae9764e6 100644
--- a/python/README_import.md
+++ b/python/README_import.md
@@ -99,3 +99,16 @@ From compressed:
## Manifest
time ./client.py import-manifest /srv/datasets/idents_files_urls.sqlite
+
+ [...]
+ Finished a batch; row 284518671 of 9669646 (2942.39%). Total inserted: 6606900
+ Finished a batch; row 284518771 of 9669646 (2942.39%). Total inserted: 6606950
+ Finished a batch; row 284518845 of 9669646 (2942.39%). Total inserted: 6607000
+ Finished a batch; row 284518923 of 9669646 (2942.39%). Total inserted: 6607050
+ Done! Inserted 6607075
+
+ real 1590m36.626s
+ user 339m40.928s
+ sys 19m3.576s
+
+Really sped up once not contending with Crossref import, so don't run these two at the same time.
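The `time` output above can be turned into an average insert rate. The sketch below is hypothetical (it is not part of `client.py`) and assumes bash-style `real NmS.SSSs` formatting. It uses only the figures printed by the manifest run. The progress percentages in that log are not meaningful (the row number exceeds the stated total), so the rate is computed from the final "Inserted" count instead.

```python
import re

# Hypothetical helper (not part of client.py): parse the "real" line printed
# by bash's `time` (e.g. "real 1590m36.626s") into seconds.
TIME_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s$")


def real_seconds(line: str) -> float:
    m = TIME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized time line: {line!r}")
    minutes, seconds = m.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the import-manifest run above.
inserted = 6607075
elapsed = real_seconds("real 1590m36.626s")

print(f"{elapsed / 3600:.1f} hours")         # 26.5 hours
print(f"{inserted / elapsed:.1f} rows/sec")  # 69.2 rows/sec
```

The `user` + `sys` totals (about 359 minutes) are far below `real`. This suggests the client spent most of the run waiting rather than computing. That fits the note about contention with the concurrent Crossref import, though the timing alone doesn't prove the cause.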