-rw-r--r--  .gitignore  21
-rw-r--r--  plan.txt    40
-rw-r--r--  rfc.md      38
3 files changed, 99 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/plan.txt b/plan.txt
new file mode 100644
index 0000000..80c934d
--- /dev/null
+++ b/plan.txt
@@ -0,0 +1,40 @@
+
+layout:
+- pipenv, python3.7, flask, elasticsearch-dsl, semantic-ui
+- python code/libs in sub-directory
+- single-file flask with all routes, call helper routines
+
+design:
+- elasticschema schema
+- i18n URL schema
+- single-page? multi-page?
+- tags/indicators for quality
+
+infra:
+- register dns: covid19.qa.fatcat.wiki, covid19.fatcat.wiki
+
+examples:
+- jupyter notebook
+- observable hq
+
+implement:
+- download GROBID as well as PDFs
+
+topics:
+- Favipiravir
+- Chloroquine
+
+tasks/research:
+- tracking down every single paper from WHO etc
+- finding interesting older papers
+
+papers:
+- imperial college paper
+- WHO reports and recommendations
+- "hammer and the dance" blog-post
+- korean, chinese, singaporean reports
+
+
+
+tools?
+- vega-lite
diff --git a/rfc.md b/rfc.md
new file mode 100644
--- /dev/null
+++ b/rfc.md
@@ -0,0 +1,38 @@
+
+Research index and searchable discovery tool of papers and datasets related to
+COVID-19.
+
+Features:
+- fulltext search over papers
+- direct download PDFs
+- find content by search queries + lists of identifiers
+
+## Design
+
+Web interface build on elasticsearch. Guessing on the order of 100k entities.
+
+Batch back-end system aggregates papers of interest, fetches metadata from
+fatcat, fetches fulltext+GROBID, indexes into elasticsearch. Run periodically
+(eg, daily, hourly)
+
+Some light quality tooling to find bad metadata; do cleanups in fatcat itself.
+
+
+## Thoughts / Brainstorm
+
+Tagging? Eg, by type of flu, why paper included
+
+Clearly indicate publication status (pre-prints).
+
+Auto-translation to multiple languages. Translation/i18n of user interface.
+
+Dashboards/graphs of stats?
+
+Faceted search.
+
+
+## Also
+
+Find historical papers of interest, eg the Spanish Flu, feature in blog posts.
+
+Manually add interesting/valuable greylit like notable blog posts, WHO reports.
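
Sketch (not part of this commit): the "Design" section of rfc.md describes a batch back-end that aggregates papers, fetches metadata and fulltext, and indexes the result. A minimal Python outline of that loop might look like the following. Everything here is an assumption for illustration: `PaperDoc`, `run_batch`, the `title` and `preprint` metadata keys, and the plain dict standing in for the elasticsearch index are hypothetical, and the two fetch callables are stand-ins for the fatcat and GROBID lookups rather than real client calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class PaperDoc:
    """One indexed entity. All field names are hypothetical."""
    ident: str
    title: str
    fulltext: Optional[str] = None
    is_preprint: bool = False


def run_batch(
    idents: Iterable[str],
    fetch_metadata: Callable[[str], Optional[dict]],
    fetch_fulltext: Callable[[str], Optional[str]],
    index: Dict[str, PaperDoc],
) -> List[str]:
    """Enrich and index each paper; return identifiers skipped for bad metadata."""
    skipped = []
    for ident in idents:
        meta = fetch_metadata(ident)  # stand-in for a fatcat metadata lookup
        if meta is None or not meta.get("title"):
            # Missing or empty metadata is reported for cleanup upstream,
            # not patched locally.
            skipped.append(ident)
            continue
        index[ident] = PaperDoc(
            ident=ident,
            title=meta["title"],
            fulltext=fetch_fulltext(ident),  # stand-in for GROBID-extracted text
            is_preprint=bool(meta.get("preprint", False)),
        )
    return skipped
```

A periodic run (daily or hourly, per the RFC) would call `run_batch` with the current list of identifiers. The returned list of skipped identifiers is one possible input for the "light quality tooling" mentioned in the RFC, since it names the records whose metadata needs cleanup in fatcat itself.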