author | Bryan Newbold <bnewbold@archive.org> | 2020-04-03 15:14:46 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-04-03 15:14:46 -0700 |
commit | 2bdda2dbf8204d0dd36a4b5b7460ff89bfcc3b5c | |
tree | 546ebabc8865b1b4c6ab36aa1788a23943f66079 | |
parent | 4ff398fbd04c57444680904d1059d916bd2d2fcc | |
update plan file
-rw-r--r-- | plan.txt | 15 |
1 files changed, 14 insertions, 1 deletions
@@ -4,6 +4,19 @@ layout:
 - python code/libs in sub-directory
 - single-file flask with all routes, call helper routines
 
+prototype pipeline:
+- CORD-19 dataset
+- enrich script fetches fatcat metadata, outputs combined .json
+- download + derive manually
+- transform script (based on download) creates ES documents as JSON
+
+pipeline:
+- .json files with basic metadata from each source
+ => CORD-19
+ => fatcat ES queries
+ => manual addition
+- enrich script takes all the above, does fatcat lookups, de-dupes by release ident, dumps json with tags and extra metadata
+
 design:
 - elasticschema schema
 - i18n URL schema
@@ -33,7 +46,7 @@ papers:
 - WHO reports and recommendations
 - "hammer and the dance" blog-post
 - korean, chinese, singaporean reports
-
+- http://subject.med.wanfangdata.com.cn/Channel/7?mark=34
 tools?
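
The "pipeline" entry added to plan.txt describes an enrich step that takes per-source .json records, does fatcat lookups, de-dupes by release ident, and dumps JSON with tags and extra metadata. A minimal sketch of that merge logic follows. It is not the repository's actual script: the names `enrich`, `dump_json_lines`, `lookup_release`, and the output keys `release_ident`, `tags`, and `extra` are all hypothetical, and the fatcat lookup is stood in for by a caller-supplied function rather than a real API call.

```python
import json


def enrich(sources, lookup_release):
    """Merge per-source records, de-duping by release ident.

    sources: dict mapping a source tag (e.g. "cord19") to a list of dicts.
    lookup_release: hypothetical stand-in for the fatcat lookup; takes a
        record and returns a release ident string, or None if no match.
    """
    merged = {}      # release ident -> combined row
    unmatched = []   # records with no release ident are kept as-is
    for tag, records in sources.items():
        for record in records:
            ident = lookup_release(record)
            if ident is None:
                unmatched.append({"tags": [tag], "extra": record})
                continue
            row = merged.setdefault(
                ident, {"release_ident": ident, "tags": [], "extra": {}}
            )
            if tag not in row["tags"]:
                row["tags"].append(tag)
            # On conflicting keys, the source seen first wins.
            for key, value in record.items():
                row["extra"].setdefault(key, value)
    return list(merged.values()) + unmatched


def dump_json_lines(rows):
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```

Called with records from several sources and a lookup function, `enrich` returns one row per release ident carrying the tag of every source that mentioned it, followed by any records the lookup could not match. Keeping unmatched records rather than dropping them is a design choice of this sketch, not something the plan specifies.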