From 4ce751f000285bc97adef27bab0873ae2690859e Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 22 Mar 2018 21:31:05 -0700
Subject: bunch of unstructured notes

---
 README.md                 |  2 +-
 next_thoughts.txt         | 19 +++++++++++++++++++
 notes/bot_tools.txt       | 17 +++++++++++++++++
 notes/initial_sources.txt |  9 ++++++++-
 notes/test_cases.txt      |  7 +++++++
 plan.txt                  |  3 +++
 6 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 next_thoughts.txt
 create mode 100644 notes/bot_tools.txt
 create mode 100644 notes/test_cases.txt

diff --git a/README.md b/README.md
index 184b6f26..5bea2290 100644
--- a/README.md
+++ b/README.md
@@ -20,4 +20,4 @@ Use `pipenv` (which you can install with `pip`).
 
 Run tests:
 
-    pipenv run nosetests3 backend/ webface/
+    pipenv run nosetests3 fatcat
diff --git a/next_thoughts.txt b/next_thoughts.txt
new file mode 100644
index 00000000..0e89249a
--- /dev/null
+++ b/next_thoughts.txt
@@ -0,0 +1,19 @@
+Should probably just UUID all the (public) ids.
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+    https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+    https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/initial_sources.txt b/notes/initial_sources.txt
index a68fb982..cc22019d 100644
--- a/notes/initial_sources.txt
+++ b/notes/initial_sources.txt
@@ -9,11 +9,18 @@ then merge in:
 
     dblp
     CORE
-    oaDOI
+    MSAG dump
+    VIAF
     archive.org paper/url manifest
     semantic scholar
+    oaDOI
 
 and later:
 
+    wikidata
     opencitations
     openlibrary
+
+national libraries:
+
+    http://www.dnb.de/EN/Service/DigitaleDienste/LinkedData/linkeddata_node.html
diff --git a/notes/test_cases.txt b/notes/test_cases.txt
new file mode 100644
index 00000000..bc6ea64a
--- /dev/null
+++ b/notes/test_cases.txt
@@ -0,0 +1,7 @@
+
+Many co-authors (group):
+
+    "Precision measurement of the top-quark mass in lepton+jets final states"
+    https://arxiv.org/abs/1405.1756
+
+
diff --git a/plan.txt b/plan.txt
index 9e8d957b..33b40663 100644
--- a/plan.txt
+++ b/plan.txt
@@ -1,4 +1,7 @@
 
+Avoiding ORM and splitting into two apps seems to be like making water flow up
+hill. Going to just make this a generic flask-sqlalchemy thing for now.
+
 - backend test setup: generate temporary database, insert rows (?)
 
 backend/api:
-- 
cgit v1.2.3
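
Sketch of the merge idea from next_thoughts.txt (point all identifiers at the
same revision instead of keeping redirects, and treat deletion as pointing at
nothing). This is only a rough in-memory illustration, not the fatcat schema:
the dicts stand in for tables, and every name here (`revisions`, `idents`,
`create_entity`, `merge`, `update`, `delete`) is hypothetical.

```python
import uuid

# Hypothetical stand-ins for database tables (assumption, not the real schema).
revisions = {}  # revision id -> entity data
idents = {}     # public UUID string -> revision id, or None when "deleted"


def create_entity(data):
    """Store a first revision and hand back a fresh public UUID for it."""
    rev_id = len(revisions) + 1
    revisions[rev_id] = data
    ident = str(uuid.uuid4())
    idents[ident] = rev_id
    return ident


def lookup(ident):
    """Resolve a public id in one step; there is no redirect chain to follow."""
    rev_id = idents[ident]
    return None if rev_id is None else revisions[rev_id]


def _repoint(old_rev, new_rev):
    # Every identifier that shares old_rev moves together ("update them all").
    for ident, rev_id in idents.items():
        if rev_id == old_rev:
            idents[ident] = new_rev


def merge(keep, duplicate):
    """Merge by repointing the duplicate's identifiers at keep's revision."""
    target, old = idents[keep], idents[duplicate]
    if target is None or old is None:
        raise ValueError("cannot merge a deleted entity")
    _repoint(old, target)


def update(ident, data):
    """Write a new revision and move all identifiers sharing the old one."""
    old = idents[ident]
    if old is None:
        raise ValueError("cannot update a deleted entity")
    new_rev = len(revisions) + 1
    revisions[new_rev] = data
    _repoint(old, new_rev)


def delete(ident):
    """'Deletion' is just pointing the identifier at a null entity."""
    idents[ident] = None
```

The linear scan in `_repoint` is where the "could scale badly" worry in the
notes shows up: every merged identifier has to be kept and rewritten on each
later update.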