Diffstat (limited to 'notes/rfc.md')
-rw-r--r-- notes/rfc.md | 38
1 file changed, 38 insertions, 0 deletions
diff --git a/notes/rfc.md b/notes/rfc.md
new file mode 100644
index 0000000..6a5c516
--- /dev/null
+++ b/notes/rfc.md
@@ -0,0 +1,38 @@
+
+Research index and searchable discovery tool for papers and datasets related to
+COVID-19.
+
+Features:
+- fulltext search over papers
+- direct PDF downloads
+- find content by search queries and lists of identifiers
+
+## Design
+
+Web interface built on elasticsearch. Rough guess: on the order of 100k entities.
+
+A batch back-end system aggregates papers of interest, fetches metadata from
+fatcat, fetches fulltext and GROBID output, and indexes into elasticsearch.
+Runs periodically (eg, daily or hourly).
+
+Some light quality tooling to find bad metadata; do cleanups in fatcat itself.
+
+
+## Thoughts / Brainstorm
+
+Tagging? Eg, by type of flu, or by why a paper was included.
+
+Clearly indicate publication status (pre-prints).
+
+Auto-translation to multiple languages. Translation/i18n of user interface.
+
+Dashboards/graphs of stats?
+
+Faceted search.
+
+
+## Also
+
+Find historical papers of interest (eg, on the Spanish Flu) and feature them in blog posts.
+
+Manually add interesting/valuable greylit, like notable blog posts and WHO reports.
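
The batch flow in the Design section of the added notes (aggregate identifiers, fetch metadata, fetch fulltext, flag bad metadata, index) could be sketched roughly as below. This is only an illustration under stated assumptions: `PaperDoc`, `build_docs`, `to_bulk_actions`, the field names, and the `covid-papers` index name are all hypothetical, and the two fetchers are passed in as plain callables rather than guessing at the fatcat or GROBID APIs. The emitted dicts use the `_index`/`_id`/`_source` action shape that elasticsearch bulk helpers commonly accept.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional


@dataclass
class PaperDoc:
    """One search-index document (hypothetical schema, not from the notes)."""
    ident: str
    title: Optional[str] = None
    fulltext: Optional[str] = None
    # Light quality flags, so bad metadata can be found and fixed upstream.
    problems: List[str] = field(default_factory=list)


def build_docs(
    idents: Iterable[str],
    fetch_metadata: Callable[[str], Optional[Dict]],
    fetch_fulltext: Callable[[str], Optional[str]],
) -> List[PaperDoc]:
    """Aggregate identifiers, enrich each one, and record quality problems.

    The fetchers are injected so this sketch makes no assumption about
    how metadata or extracted fulltext is actually retrieved.
    """
    docs: List[PaperDoc] = []
    seen = set()
    for ident in idents:
        if ident in seen:  # skip duplicates from overlapping source lists
            continue
        seen.add(ident)
        meta = fetch_metadata(ident) or {}
        doc = PaperDoc(
            ident=ident,
            title=meta.get("title"),
            fulltext=fetch_fulltext(ident),
        )
        if not doc.title:
            doc.problems.append("missing-title")
        if not doc.fulltext:
            doc.problems.append("missing-fulltext")
        docs.append(doc)
    return docs


def to_bulk_actions(
    docs: Iterable[PaperDoc], index_name: str = "covid-papers"
) -> Iterator[Dict]:
    """Yield one bulk-index action dict per document."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_id": doc.ident,
            "_source": {"title": doc.title, "fulltext": doc.fulltext},
        }
```

A periodic job (daily or hourly, as the notes suggest) would call `build_docs` with real fetchers, hand the result of `to_bulk_actions` to a bulk indexer, and report any documents whose `problems` list is non-empty so the metadata can be cleaned up in fatcat itself.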