author Bryan Newbold <bnewbold@archive.org> 2020-03-23 20:43:37 -0700
committer Bryan Newbold <bnewbold@archive.org> 2020-03-23 20:43:37 -0700
commit 7663e25d5b9cf740fa1ecca3b716a1efc1329f37
tree c4674d2e79f14b3781f55ebdf691096e29f9b544
init repo with early notes
-rw-r--r-- .gitignore 21
-rw-r--r-- plan.txt 40
-rw-r--r-- rfc.md 38
3 files changed, 99 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/plan.txt b/plan.txt
new file mode 100644
index 0000000..80c934d
--- /dev/null
+++ b/plan.txt
@@ -0,0 +1,40 @@
+
+layout:
+- pipenv, python3.7, flask, elasticsearch-dsl, semantic-ui
+- python code/libs in sub-directory
+- single-file flask with all routes, call helper routines
+
+design:
+- elasticsearch schema
+- i18n URL schema
+- single-page? multi-page?
+- tags/indicators for quality
+
+infra:
+- register dns: covid19.qa.fatcat.wiki, covid19.fatcat.wiki
+
+examples:
+- jupyter notebook
+- observable hq
+
+implement:
+- download GROBID as well as PDFs
+
+topics:
+- Favipiravir
+- Chloroquine
+
+tasks/research:
+- tracking down every single paper from WHO etc
+- finding interesting older papers
+
+papers:
+- imperial college paper
+- WHO reports and recommendations
+- "hammer and the dance" blog-post
+- korean, chinese, singaporean reports
+
+
+
+tools?
+- vega-lite
diff --git a/rfc.md b/rfc.md
new file mode 100644
index 0000000..6a5c516
--- /dev/null
+++ b/rfc.md
@@ -0,0 +1,38 @@
+
+Research index and searchable discovery tool of papers and datasets related to
+COVID-19.
+
+Features:
+- fulltext search over papers
+- direct download PDFs
+- find content by search queries + lists of identifiers
+
+## Design
+
+Web interface built on elasticsearch. Guessing on the order of 100k entities.
+
+Batch back-end system aggregates papers of interest, fetches metadata from
+fatcat, fetches fulltext+GROBID, indexes into elasticsearch. Run periodically
+(eg, daily, hourly)
+
+Some light quality tooling to find bad metadata; do cleanups in fatcat itself.
+
+
+## Thoughts / Brainstorm
+
+Tagging? Eg, by type of flu, why paper included
+
+Clearly indicate publication status (pre-prints).
+
+Auto-translation to multiple languages. Translation/i18n of user interface.
+
+Dashboards/graphs of stats?
+
+Faceted search.
+
+
+## Also
+
+Find historical papers of interest, eg the Spanish Flu, feature in blog posts.
+
+Manually add interesting/valuable greylit like notable blog posts, WHO reports.
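
The rfc.md notes in this commit call for fulltext search, lookup by lists of identifiers, and faceted search on top of elasticsearch. As a rough illustration only, and not part of the commit, the following sketch builds one Elasticsearch request body that combines all three. It uses plain dicts rather than elasticsearch-dsl so that it runs without a cluster. Every field name (`title`, `abstract`, `fulltext`, `doi`, `release_stage`) and the example DOI are assumptions, since the notes do not define an index schema.

```python
import json
from typing import Dict, List, Optional


def build_search_body(
    query: Optional[str] = None,
    identifiers: Optional[Dict[str, List[str]]] = None,
    facet_fields: Optional[List[str]] = None,
    size: int = 25,
) -> dict:
    """Build an Elasticsearch search request body as a plain dict.

    All field names used here are hypothetical; the notes in this
    commit do not specify an index schema.
    """
    must = []
    filters = []

    if query:
        # Free-text search across the assumed text fields, weighting
        # title and abstract matches above fulltext matches.
        must.append({
            "multi_match": {
                "query": query,
                "fields": ["title^3", "abstract^2", "fulltext"],
            }
        })

    for field, values in sorted((identifiers or {}).items()):
        if values:
            # Restrict results to an explicit list of identifiers.
            filters.append({"terms": {field: values}})

    body = {
        "size": size,
        "query": {
            "bool": {
                # With no text query, match everything that passes the filters.
                "must": must or [{"match_all": {}}],
                "filter": filters,
            }
        },
    }

    if facet_fields:
        # One terms aggregation per facet gives the bucket counts
        # needed for a faceted-search sidebar.
        body["aggs"] = {
            name: {"terms": {"field": name, "size": 10}}
            for name in facet_fields
        }
    return body


if __name__ == "__main__":
    example = build_search_body(
        query="chloroquine",
        identifiers={"doi": ["10.1234/example"]},  # placeholder DOI
        facet_fields=["release_stage"],            # assumed keyword field
    )
    print(json.dumps(example, indent=2))
```

The resulting dict could be passed as the body of a search request by whichever client the project ends up using. Keeping the identifier lists in the `filter` clause means they narrow the result set without affecting relevance scores.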