author Bryan Newbold <bnewbold@archive.org> 2020-04-03 15:14:46 -0700
committer Bryan Newbold <bnewbold@archive.org> 2020-04-03 15:14:46 -0700
commit 2bdda2dbf8204d0dd36a4b5b7460ff89bfcc3b5c
tree 546ebabc8865b1b4c6ab36aa1788a23943f66079
parent 4ff398fbd04c57444680904d1059d916bd2d2fcc
update plan file
-rw-r--r-- plan.txt 15
1 files changed, 14 insertions, 1 deletions
diff --git a/plan.txt b/plan.txt
index 80c934d..0f97305 100644
--- a/plan.txt
+++ b/plan.txt
@@ -4,6 +4,19 @@ layout:
- python code/libs in sub-directory
- single-file flask with all routes, call helper routines
+prototype pipeline:
+- CORD-19 dataset
+- enrich script fetches fatcat metadata, outputs combined .json
+- download + derive manually
+- transform script (based on download) creates ES documents as JSON
+
+pipeline:
+- .json files with basic metadata from each source
+ => CORD-19
+ => fatcat ES queries
+ => manual addition
+- enrich script takes all the above, does fatcat lookups, de-dupes by release ident, dumps json with tags and extra metadata
+
design:
- elasticschema schema
- i18n URL schema
@@ -33,7 +46,7 @@ papers:
- WHO reports and recommendations
- "hammer and the dance" blog-post
- korean, chinese, singaporean reports
-
+- http://subject.med.wanfangdata.com.cn/Channel/7?mark=34
tools?
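
The "de-dupes by release ident" step in the pipeline notes above could look roughly like the sketch below. This is not code from the repository: the function name and the field names (`release_id`, `tags`) are assumptions made for illustration, and passing through records with no ident unchanged is also an assumed policy.

```python
from typing import Dict, Iterable, List


def dedupe_by_release_ident(records: Iterable[dict]) -> List[dict]:
    """Collapse records that share a release ident, merging their tags.

    Assumes each record is a dict with an optional "release_id" key and
    an optional "tags" list (hypothetical field names).
    """
    merged: Dict[str, dict] = {}
    unmatched: List[dict] = []
    for rec in records:
        ident = rec.get("release_id")
        if ident is None:
            # No ident to de-dupe on; keep the record as-is (assumed policy).
            unmatched.append(rec)
            continue
        if ident not in merged:
            # First record for this ident wins; copy the tag list so the
            # input record is not mutated later.
            merged[ident] = {**rec, "tags": list(rec.get("tags", []))}
        else:
            # Later records only contribute tags not already present.
            for tag in rec.get("tags", []):
                if tag not in merged[ident]["tags"]:
                    merged[ident]["tags"].append(tag)
    return list(merged.values()) + unmatched


if __name__ == "__main__":
    sample = [
        {"release_id": "aaa", "tags": ["cord19"]},
        {"release_id": "aaa", "tags": ["manual"]},
        {"release_id": "bbb", "tags": ["fatcat-es"]},
    ]
    for row in dedupe_by_release_ident(sample):
        print(row)
```

Keeping the first record per ident and only merging tags keeps the sketch short; a real enrich script would presumably also reconcile the "extra metadata" mentioned in the plan.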