author | Bryan Newbold <bnewbold@robocracy.org> | 2018-03-22 21:31:05 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2018-03-22 21:31:05 -0700 |
commit | 4ce751f000285bc97adef27bab0873ae2690859e (patch) | |
tree | 2b0650d49294b0bedaf20978045df01c1e97b567 | |
parent | daf21f0b80e1783ed1eb777a7b6a9c5618c069d7 (diff) | |
bunch of unstructured notes
-rw-r--r-- | README.md | 2 |
-rw-r--r-- | next_thoughts.txt | 19 |
-rw-r--r-- | notes/bot_tools.txt | 17 |
-rw-r--r-- | notes/initial_sources.txt | 9 |
-rw-r--r-- | notes/test_cases.txt | 7 |
-rw-r--r-- | plan.txt | 3 |
6 files changed, 55 insertions, 2 deletions
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -20,4 +20,4 @@ Use `pipenv` (which you can install with `pip`).
 
 Run tests:
 
-    pipenv run nosetests3 backend/ webface/
+    pipenv run nosetests3 fatcat
diff --git a/next_thoughts.txt b/next_thoughts.txt
new file mode 100644
index 00000000..0e89249a
--- /dev/null
+++ b/next_thoughts.txt
@@ -0,0 +1,19 @@
+Should probably just UUID all the (public) ids.
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+    https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+    https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/initial_sources.txt b/notes/initial_sources.txt
index a68fb982..cc22019d 100644
--- a/notes/initial_sources.txt
+++ b/notes/initial_sources.txt
@@ -9,11 +9,18 @@ then merge in:
 
     dblp
     CORE
-    oaDOI
+    MSAG dump
+    VIAF
     archive.org paper/url manifest
     semantic scholar
+    oaDOI
 
 and later:
 
+    wikidata
     opencitations
     openlibrary
+
+national libraries:
+
+    http://www.dnb.de/EN/Service/DigitaleDienste/LinkedData/linkeddata_node.html
diff --git a/notes/test_cases.txt b/notes/test_cases.txt
new file mode 100644
index 00000000..bc6ea64a
--- /dev/null
+++ b/notes/test_cases.txt
@@ -0,0 +1,7 @@
+
+Many co-authors (group):
+
+    "Precision measurement of the top-quark mass in lepton+jets final states"
+    https://arxiv.org/abs/1405.1756
+
+
diff --git a/plan.txt b/plan.txt
--- a/plan.txt
+++ b/plan.txt
@@ -1,4 +1,7 @@
+Avoiding ORM and splitting into two apps seems to be like making water flow up
+hill. Going to just make this a generic flask-sqlalchemy thing for now.
+
 
 - backend test setup: generate temporary database, insert rows (?)
 
 backend/api:
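
The merge idea in next_thoughts.txt (point every identifier at the same revision instead of keeping redirect chains, and treat deletion as pointing at a null entity) might look roughly like the sketch below. This is not code from the commit: the table names `work_rev` and `work_ident`, the helper functions, and the use of the stdlib `sqlite3` module (rather than flask-sqlalchemy) are all assumptions made to keep the example self-contained.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE work_rev (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- One row per public UUID; rev_id is the mutable pointer.
        -- A NULL rev_id stands in for "deleted".
        CREATE TABLE work_ident (
            public_id TEXT PRIMARY KEY,
            rev_id INTEGER REFERENCES work_rev(id)
        );
    """)
    return db


def create(db, title):
    """Insert a revision and return a fresh public UUID pointing at it."""
    cur = db.execute("INSERT INTO work_rev (title) VALUES (?)", (title,))
    public_id = str(uuid.uuid4())
    db.execute("INSERT INTO work_ident (public_id, rev_id) VALUES (?, ?)",
               (public_id, cur.lastrowid))
    return public_id


def _rev_of(db, public_id):
    row = db.execute("SELECT rev_id FROM work_ident WHERE public_id = ?",
                     (public_id,)).fetchone()
    return row[0] if row else None


def _repoint_all(db, old_rev, new_rev):
    """Move every identifier that currently shares old_rev to new_rev."""
    if old_rev is not None:
        db.execute("UPDATE work_ident SET rev_id = ? WHERE rev_id = ?",
                   (new_rev, old_rev))


def merge(db, keep_id, drop_id):
    """Merge without a redirect row: all of drop_id's siblings follow along."""
    _repoint_all(db, _rev_of(db, drop_id), _rev_of(db, keep_id))


def delete(db, public_id):
    """'Deletion' is just pointing the identifiers at NULL."""
    _repoint_all(db, _rev_of(db, public_id), None)


def lookup(db, public_id):
    """Single join, no recursion; returns None for deleted or unknown ids."""
    row = db.execute(
        "SELECT r.title FROM work_ident i JOIN work_rev r ON r.id = i.rev_id "
        "WHERE i.public_id = ?", (public_id,)).fetchone()
    return row[0] if row else None
```

Because a merge rewrites every identifier that shared the dropped revision, a lookup never has to follow a chain, which is the "don't need to recurse" property from the notes. The cost is the one the notes already flag: the identifier rows have to be kept forever, and a merge touches as many rows as there are identifiers on the dropped side.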