-rw-r--r--  README.md                  2
-rw-r--r--  next_thoughts.txt         19
-rw-r--r--  notes/bot_tools.txt       17
-rw-r--r--  notes/initial_sources.txt  9
-rw-r--r--  notes/test_cases.txt       7
-rw-r--r--  plan.txt                   3
6 files changed, 55 insertions, 2 deletions
@@ -20,4 +20,4 @@ Use `pipenv` (which you can install with `pip`).
 
 Run tests:
 
-    pipenv run nosetests3 backend/ webface/
+    pipenv run nosetests3 fatcat
diff --git a/next_thoughts.txt b/next_thoughts.txt
new file mode 100644
index 00000000..0e89249a
--- /dev/null
+++ b/next_thoughts.txt
@@ -0,0 +1,19 @@
+Should probably just UUID all the (public) ids.
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+    https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+    https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/initial_sources.txt b/notes/initial_sources.txt
index a68fb982..cc22019d 100644
--- a/notes/initial_sources.txt
+++ b/notes/initial_sources.txt
@@ -9,11 +9,18 @@ then merge in:
 
     dblp
     CORE
-    oaDOI
+    MSAG dump
+    VIAF
     archive.org paper/url manifest
     semantic scholar
+    oaDOI
 
 and later:
 
+    wikidata
     opencitations
     openlibrary
+
+national libraries:
+
+    http://www.dnb.de/EN/Service/DigitaleDienste/LinkedData/linkeddata_node.html
diff --git a/notes/test_cases.txt b/notes/test_cases.txt
new file mode 100644
index 00000000..bc6ea64a
--- /dev/null
+++ b/notes/test_cases.txt
@@ -0,0 +1,7 @@
+
+Many co-authors (group):
+
+    "Precision measurement of the top-quark mass in lepton+jets final states"
+    https://arxiv.org/abs/1405.1756
+
+
@@ -1,4 +1,7 @@
 
+Avoiding ORM and splitting into two apps seems to be like making water flow up
+hill. Going to just make this a generic flask-sqlalchemy thing for now.
+
 - backend test setup: generate temporary database, insert rows (?)
 
 backend/api:
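
The identifier scheme floated in next_thoughts.txt (UUID public ids that point straight at a revision, merges that repoint every identifier instead of chaining redirects, and "deletion" as a null pointer) could look roughly like the sketch below. This is only an illustration under assumptions: the `EntityStore` class, its method names, and the in-memory dicts are hypothetical and do not come from the repository, which plans to use flask-sqlalchemy.

```python
import uuid


class EntityStore:
    """Toy in-memory model (hypothetical, not the project's schema)."""

    def __init__(self):
        self.revisions = {}  # revision id -> content dict (kept forever)
        self.idents = {}     # public UUID string -> revision id, or None if deleted
        self._next_rev = 1

    def _add_revision(self, content):
        rev_id = self._next_rev
        self._next_rev += 1
        self.revisions[rev_id] = dict(content)
        return rev_id

    def _repoint(self, old_rev, new_rev):
        # Update every identifier that shares old_rev, so a lookup never
        # has to recurse through a chain of redirects.
        if old_rev is None:
            raise ValueError("identifier is deleted")
        for ident, rev in self.idents.items():
            if rev == old_rev:
                self.idents[ident] = new_rev

    def create(self, content):
        ident = str(uuid.uuid4())
        self.idents[ident] = self._add_revision(content)
        return ident

    def lookup(self, ident):
        rev_id = self.idents[ident]
        return None if rev_id is None else self.revisions[rev_id]

    def update(self, ident, content):
        old_rev = self.idents[ident]
        if old_rev is None:
            raise ValueError("identifier is deleted")
        self._repoint(old_rev, self._add_revision(content))

    def merge(self, keep, duplicate):
        # No redirect row: the duplicate's identifiers point at keep's revision.
        target = self.idents[keep]
        if target is None:
            raise ValueError("cannot merge into a deleted identifier")
        self._repoint(self.idents[duplicate], target)

    def delete(self, ident):
        # "Deletion" is just pointing at a null entity (an assumption here:
        # all identifiers merged with this one are deleted together).
        self._repoint(self.idents[ident], None)
```

The cost the notes mention shows up in `_repoint`: every update touches all identifiers that were ever merged together, which is where the scheme could scale badly if large aggregations get merged.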
