layout:
- pipenv, python3.7, flask, elasticsearch-dsl, semantic-ui
- python code/libs in sub-directory
- single-file flask app with all routes, calling helper routines (sketched below)
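  A minimal sketch of that single-file layout, assuming a local ES index named "covid19_fatcat" and illustrative field/template names (not the real schema):

    from flask import Flask, request, render_template
    import elasticsearch
    from elasticsearch_dsl import Search

    app = Flask(__name__)
    es_client = elasticsearch.Elasticsearch("http://localhost:9200")

    def do_search(query, limit=25):
        # helper routine: fulltext query against the (assumed) index
        s = Search(using=es_client, index="covid19_fatcat")
        s = s.query("query_string", query=query, default_operator="AND")
        return s[:limit].execute()

    @app.route("/search")
    def search():
        resp = do_search(request.args.get("q", "*"))
        return render_template("search.html", results=resp)

    if __name__ == "__main__":
        app.run(debug=True)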
prototype pipeline:
- CORD-19 dataset
- enrich script fetches fatcat metadata, outputs combined .json
- download + derive manually
- transform script (based on download) creates ES documents as JSON
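  A hedged sketch of that transform step, assuming the combined enriched file is one JSON object per line; input and output field names are illustrative, not the actual schema:

    import json
    import sys

    def transform(row):
        # map one enriched record to one ES document (assumed field names)
        release = row.get("fatcat_release") or {}
        return {
            "title": row.get("title") or release.get("title"),
            "doi": row.get("doi"),
            "release_id": release.get("ident"),
            "abstract": row.get("abstract"),
        }

    def run(infile=sys.stdin, outfile=sys.stdout):
        for line in infile:
            if line.strip():
                print(json.dumps(transform(json.loads(line))), file=outfile)

    if __name__ == "__main__":
        run()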
pipeline:
- .json files with basic metadata from each source
=> CORD-19
=> fatcat ES queries
=> manual addition
- enrich script takes all the above, does fatcat lookups, de-dupes by release ident, dumps json with tags and extra metadata
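  A sketch of that enrich/de-dupe pass; the fatcat lookup call and merge fields are assumptions about how the real script works, not its actual code:

    import json
    import sys
    import requests

    def lookup_release(doi):
        # fatcat release lookup by DOI; returns None on miss
        resp = requests.get(
            "https://api.fatcat.wiki/v0/release/lookup",
            params={"doi": doi},
        )
        return resp.json() if resp.status_code == 200 else None

    def enrich(rows):
        by_ident = {}
        for row in rows:
            release = lookup_release(row["doi"]) if row.get("doi") else None
            # de-dupe by release ident when we have one, else fall back to DOI/title
            key = release["ident"] if release else (row.get("doi") or row.get("title"))
            merged = by_ident.setdefault(key, {"tags": [], "sources": []})
            merged.update(row)
            merged["fatcat_release"] = release
            merged["sources"].append(row.get("source", "unknown"))
        return by_ident.values()

    if __name__ == "__main__":
        rows = [json.loads(l) for l in sys.stdin if l.strip()]
        for doc in enrich(rows):
            print(json.dumps(doc))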
design:
- elasticsearch schema (sketch below)
- i18n URL schema
- single-page? multi-page?
- tags/indicators for quality
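  One way the schema could look with elasticsearch-dsl (already in the stack above); the index name and all fields are assumptions:

    from elasticsearch_dsl import Document, Date, Keyword, Text, Boolean

    class CovidRelease(Document):
        title = Text()
        abstract = Text()
        doi = Keyword()
        release_ident = Keyword()
        release_date = Date()
        lang = Keyword()            # ties into the i18n question above
        tags = Keyword(multi=True)  # quality indicators, e.g. "peer-reviewed"
        fulltext_available = Boolean()

        class Index:
            name = "covid19_fatcat"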
infra:
- register dns: covid19.qa.fatcat.wiki, covid19.fatcat.wiki
examples:
- jupyter notebook
- observable hq
implement:
- download GROBID as well as PDFs
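  A hedged sketch of that download step; the "pdf_url" / "grobid_url" fields and the output layout are placeholders, not known names:

    import os
    import requests

    def download(url, path):
        if os.path.exists(path):
            return  # already fetched
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)

    def fetch_fulltext(doc, outdir="fulltext"):
        # fetch both the PDF and the GROBID TEI-XML for one record
        os.makedirs(outdir, exist_ok=True)
        sha1 = doc["sha1"]
        if doc.get("pdf_url"):
            download(doc["pdf_url"], os.path.join(outdir, sha1 + ".pdf"))
        if doc.get("grobid_url"):
            download(doc["grobid_url"], os.path.join(outdir, sha1 + ".tei.xml"))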
topics:
- Favipiravir
- Chloroquine
tasks/research:
- tracking down every single paper from WHO etc
- finding interesting older papers
papers:
- imperial college paper
- WHO reports and recommendations
- "hammer and the dance" blog-post
- korean, chinese, singaporean reports
- http://subject.med.wanfangdata.com.cn/Channel/7?mark=34
tools?
- vega-lite
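  If vega-lite is used, charts could be generated from Python with altair; a minimal sketch with made-up illustration data, not project data:

    import altair as alt
    import pandas as pd

    df = pd.DataFrame({
        "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
        "papers": [12, 19, 31],
    })

    chart = alt.Chart(df).mark_line().encode(x="date:T", y="papers:Q")
    chart.save("papers_per_day.json")  # writes a vega-lite spec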