author: Bryan Newbold <bnewbold@robocracy.org> 2018-03-22 21:31:05 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-03-22 21:31:05 -0700
commit: 4ce751f000285bc97adef27bab0873ae2690859e
tree: 2b0650d49294b0bedaf20978045df01c1e97b567
parent: daf21f0b80e1783ed1eb777a7b6a9c5618c069d7
bunch of unstructured notes
-rw-r--r-- README.md | 2
-rw-r--r-- next_thoughts.txt | 19
-rw-r--r-- notes/bot_tools.txt | 17
-rw-r--r-- notes/initial_sources.txt | 9
-rw-r--r-- notes/test_cases.txt | 7
-rw-r--r-- plan.txt | 3
6 files changed, 55 insertions, 2 deletions
diff --git a/README.md b/README.md
index 184b6f26..5bea2290 100644
--- a/README.md
+++ b/README.md
@@ -20,4 +20,4 @@ Use `pipenv` (which you can install with `pip`).
Run tests:
- pipenv run nosetests3 backend/ webface/
+ pipenv run nosetests3 fatcat
diff --git a/next_thoughts.txt b/next_thoughts.txt
new file mode 100644
index 00000000..0e89249a
--- /dev/null
+++ b/next_thoughts.txt
@@ -0,0 +1,19 @@
+Should probably just UUID all the (public) ids.
+
+Instead of having a separate id pointer table, could have an extra "mutable"
+public ID column (unique, indexed) on entity rows. Backend would ensure the
+right thing happens. Changelog tables (or special redirect/deletion tables)
+would record changes and be "fallen through" to.
+
+Instead of having merge redirects, could just point all identifiers to the same
+revision (and update them all in the future). Don't need to recurse! Need to
+keep this forever though, could scale badly if "aggregations" get merged.
+
+Redirections of redirections should probably simply be disallowed.
+
+"Deletion" is really just pointing to a special or null entity.
+
+Trade-off: easy querying for common case (wanting "active" rows) vs. robust
+handling of redirects (likely to be pretty common). Also, having UUID handling
+across more than one table.
+
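
The merge and deletion ideas in next_thoughts.txt can be made concrete with a small sketch. This is not code from the commit: the table names (`revision`, `ident`), the column names, and every function below are hypothetical, and it uses the standard-library `sqlite3` module rather than the project's eventual flask-sqlalchemy setup. It assumes each public identifier is a UUID string holding a mutable pointer to a revision row, so a merge repoints identifiers instead of creating redirects, and a deletion sets the pointer to NULL.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with hypothetical table names."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE revision (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- One row per public identifier; rev_id is the mutable pointer.
        CREATE TABLE ident (
            public_id TEXT PRIMARY KEY,               -- UUID string
            rev_id INTEGER REFERENCES revision(id)    -- NULL means "deleted"
        );
    """)
    return db


def _current_rev(db, public_id):
    # Assumes the identifier exists; returns None if it is "deleted".
    row = db.execute(
        "SELECT rev_id FROM ident WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0]


def create(db, title):
    """Insert a revision and a fresh UUID identifier pointing at it."""
    rev_id = db.execute(
        "INSERT INTO revision (title) VALUES (?)", (title,)
    ).lastrowid
    public_id = str(uuid.uuid4())
    db.execute("INSERT INTO ident VALUES (?, ?)", (public_id, rev_id))
    return public_id


def lookup(db, public_id):
    """Resolve an identifier in one join; no redirect chain to recurse."""
    row = db.execute(
        "SELECT revision.title FROM ident "
        "JOIN revision ON revision.id = ident.rev_id "
        "WHERE ident.public_id = ?",
        (public_id,),
    ).fetchone()
    return row[0] if row else None


def merge(db, keep_id, dup_id):
    """Point every identifier on dup_id's revision at keep_id's revision."""
    db.execute(
        "UPDATE ident SET rev_id = ? WHERE rev_id = ?",
        (_current_rev(db, keep_id), _current_rev(db, dup_id)),
    )


def update(db, public_id, title):
    """Add a new revision and repoint all identifiers sharing the old one."""
    old_rev = _current_rev(db, public_id)
    new_rev = db.execute(
        "INSERT INTO revision (title) VALUES (?)", (title,)
    ).lastrowid
    db.execute("UPDATE ident SET rev_id = ? WHERE rev_id = ?", (new_rev, old_rev))


def delete(db, public_id):
    """'Deletion' only nulls this identifier's pointer; revisions stay."""
    db.execute("UPDATE ident SET rev_id = NULL WHERE public_id = ?", (public_id,))
```

After `merge(db, a, b)`, both identifiers resolve to the same title, and a later `update` through either one is visible through both. The cost the notes mention shows up in `update`: every identifier that was ever merged into the entity has to be rewritten on each change.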
diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt
new file mode 100644
index 00000000..cf465bde
--- /dev/null
+++ b/notes/bot_tools.txt
@@ -0,0 +1,17 @@
+
+Could be helpful for writing bots for import:
+
+metafacture: large/popular java framework for pipelines and munging library
+metadata.
+
+ https://github.com/metafacture/metafacture-core/wiki
+
+catmandu: large/popular set of perl libraries for munging bibliographic
+metadata, including a DSL ("Fix"). Can also push/pull to backends.
+
+miku/siskin: luigi and higher-level tool for running regular tasks.
+
+ https://github.com/miku/span
+
+miku/span: golang lower-level tools for parsing and normalizing specific
+formats (including KBART, DOAJ).
diff --git a/notes/initial_sources.txt b/notes/initial_sources.txt
index a68fb982..cc22019d 100644
--- a/notes/initial_sources.txt
+++ b/notes/initial_sources.txt
@@ -9,11 +9,18 @@ then merge in:
dblp
CORE
- oaDOI
+ MSAG dump
+ VIAF
archive.org paper/url manifest
semantic scholar
+ oaDOI
and later:
+ wikidata
opencitations
openlibrary
+
+national libraries:
+
+ http://www.dnb.de/EN/Service/DigitaleDienste/LinkedData/linkeddata_node.html
diff --git a/notes/test_cases.txt b/notes/test_cases.txt
new file mode 100644
index 00000000..bc6ea64a
--- /dev/null
+++ b/notes/test_cases.txt
@@ -0,0 +1,7 @@
+
+Many co-authors (group):
+
+ "Precision measurement of the top-quark mass in lepton+jets final states"
+ https://arxiv.org/abs/1405.1756
+
+
diff --git a/plan.txt b/plan.txt
index 9e8d957b..33b40663 100644
--- a/plan.txt
+++ b/plan.txt
@@ -1,4 +1,7 @@
+Avoiding an ORM and splitting into two apps seems like making water flow
+uphill. Going to just make this a generic flask-sqlalchemy thing for now.
+
- backend test setup: generate temporary database, insert rows (?)
backend/api: