layout:
- pipenv, python3.7, flask, elasticsearch-dsl, semantic-ui
- python code/libs in sub-directory
- single-file flask with all routes, call helper routines
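sketch of the single-file flask layout (a minimal sketch, not the actual app):
the route names and the `search_papers` helper are hypothetical, and the helper
is defined inline here only so the example is self-contained; under the layout
above it would live in the python sub-directory.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def search_papers(query):
    """Hypothetical helper routine; a real one would query Elasticsearch."""
    # Placeholder result so the sketch runs without a search backend.
    return {"query": query, "hits": []}


@app.route("/")
def home():
    # Assumed landing route; the real page content is not specified.
    return "covid19 search prototype"


@app.route("/search")
def search():
    # The route only parses the request; the helper does the actual work.
    query = request.args.get("q", "")
    return jsonify(search_papers(query))
```

keeping routes thin like this means the helpers can be tested without a running web server.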

prototype pipeline:
- CORD-19 dataset
- enrich script fetches fatcat metadata, outputs combined .json
- download + derive manually
- transform script (based on download) creates ES documents as JSON
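sketch of the transform step (assumptions throughout): input is taken to be one
JSON object per line, and every field name (`fatcat_release`, `ident`, `doi`,
`title`, `source`) is a placeholder for illustration, not the real enrich output
or ES schema.

```python
import json


def transform(record):
    """Map one enriched record to a flat search document.

    All field names here are assumptions for illustration.
    """
    release = record.get("fatcat_release") or {}
    return {
        # Prefer fatcat metadata, fall back to the dataset's own fields.
        "title": release.get("title") or record.get("title"),
        "release_ident": release.get("ident"),
        "doi": release.get("doi") or record.get("doi"),
        "source": "cord19",
    }


def transform_lines(lines):
    """Yield one JSON-encoded document per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(transform(json.loads(line)), sort_keys=True)
```

usage would be something like `for doc in transform_lines(open(path)): print(doc)`, with the output fed to whatever indexing step gets chosen.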

pipeline:
- .json files with basic metadata from each source
    => CORD-19
    => fatcat ES queries
    => manual addition
- enrich script takes all the above, does fatcat lookups, de-dupes by release ident, dumps json with tags and extra metadata
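sketch of the enrich/de-dupe step (a rough outline under stated assumptions):
`lookup_release` is a hypothetical stand-in for the fatcat lookup, the source
name doubles as the tag, and keeping unmatched records at the end of the output
is a guess rather than a decision recorded in these notes.

```python
import json


def lookup_release(record):
    """Hypothetical stand-in for a fatcat lookup.

    Returns a release ident, or None if nothing matches.
    """
    return record.get("release_ident")


def enrich(sources):
    """Merge records from named sources, de-duplicating by release ident.

    `sources` maps a source name (used as a tag) to a list of records.
    """
    merged = {}
    unmatched = []
    for tag, records in sources.items():
        for record in records:
            ident = lookup_release(record)
            if ident is None:
                # No release found: keep the record, but it cannot be de-duped.
                unmatched.append(dict(record, tags=[tag]))
                continue
            if ident in merged:
                # Same release seen before: keep the first copy, add the tag.
                if tag not in merged[ident]["tags"]:
                    merged[ident]["tags"].append(tag)
            else:
                merged[ident] = dict(record, release_ident=ident, tags=[tag])
    return list(merged.values()) + unmatched


def dump(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

the first record seen for a release wins, so the order of sources decides whose metadata is kept; later sources only contribute tags.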

design:
- elasticsearch schema
- i18n URL schema
- single-page? multi-page?
- tags/indicators for quality

infra:
- register dns: covid19.qa.fatcat.wiki, covid19.fatcat.wiki

examples:
- jupyter notebook
- observable hq

implement:
- download GROBID as well as PDFs

topics:
- Favipiravir
- Chloroquine

tasks/research:
- tracking down every single paper from WHO etc
- finding interesting older papers

papers:
- Imperial College paper
- WHO reports and recommendations
- "hammer and the dance" blog-post
- Korean, Chinese, Singaporean reports
- http://subject.med.wanfangdata.com.cn/Channel/7?mark=34


tools?
- vega-lite