Diffstat (limited to 'notes/rfc.md')
-rw-r--r-- | notes/rfc.md | 38
1 files changed, 38 insertions, 0 deletions
diff --git a/notes/rfc.md b/notes/rfc.md
new file mode 100644
index 0000000..6a5c516
--- /dev/null
+++ b/notes/rfc.md
@@ -0,0 +1,38 @@
+
+Research index and searchable discovery tool for papers and datasets related to
+COVID-19.
+
+Features:
+- fulltext search over papers
+- direct PDF downloads
+- find content by search queries and by lists of identifiers
+
+## Design
+
+Web interface built on elasticsearch. Rough guess at scale: on the order of 100k entities.
+
+Batch back-end system aggregates papers of interest, fetches metadata from
+fatcat, fetches fulltext (PDF plus GROBID extraction), and indexes into elasticsearch.
+Runs periodically (e.g., daily or hourly).
+
+Some light quality tooling to find bad metadata; cleanups are done in fatcat itself.
+
+
+## Thoughts / Brainstorm
+
+Tagging? E.g., by type of flu, or by why a paper was included.
+
+Clearly indicate publication status (e.g., pre-prints).
+
+Auto-translation to multiple languages. Translation/i18n of the user interface.
+
+Dashboards/graphs of stats?
+
+Faceted search.
+
+
+## Also
+
+Find historical papers of interest, e.g., on the Spanish Flu, to feature in blog posts.
+
+Manually add interesting/valuable grey literature, such as notable blog posts and WHO reports.
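The batch back-end in the Design section (take papers of interest, pull metadata from fatcat, attach GROBID fulltext, index into elasticsearch) and the search features (query or identifier list, faceting) might be sketched roughly as follows. This is a sketch under stated assumptions, not the RFC's implementation. The fetch functions are hypothetical injected stand-ins for the real fatcat and GROBID calls. The index name `covid19_papers` and every document field name are illustrative. The release fields (`ident`, `release_stage`, `release_year`, `ext_ids`) are modeled loosely on fatcat release entities and should be checked against the actual API.

```python
"""Sketch of the RFC's batch back-end and search side.

Assumptions (not specified in the RFC): fetch functions are injected
callables standing in for the real fatcat API and GROBID calls; the index
name and all document field names are illustrative only.
"""
import json
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-ins: look up a fatcat-style release dict by ident, and
# return GROBID-extracted fulltext for it (None if no PDF was processed).
FetchRelease = Callable[[str], dict]
FetchFulltext = Callable[[str], Optional[str]]


def to_es_doc(release: dict, fulltext: Optional[str]) -> dict:
    """Flatten one release plus its fulltext into a single search document."""
    ext_ids = release.get("ext_ids") or {}
    stage = release.get("release_stage")
    return {
        "ident": release["ident"],
        "title": release.get("title"),
        "year": release.get("release_year"),
        "doi": ext_ids.get("doi"),
        "pmid": ext_ids.get("pmid"),
        # Keep publication status explicit so the UI can flag unpublished work.
        "release_stage": stage,
        "is_published": stage == "published",
        "fulltext": fulltext,
        "has_fulltext": fulltext is not None,
    }


def bulk_ndjson(docs: Iterable[dict], index: str) -> str:
    """Encode documents as an elasticsearch _bulk request body (NDJSON)."""
    out = []
    for doc in docs:
        # Using the fatcat ident as _id means a periodic re-run overwrites
        # existing documents instead of creating duplicates.
        out.append(json.dumps({"index": {"_index": index, "_id": doc["ident"]}}))
        out.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in out)


def run_batch(
    idents: Iterable[str],
    fetch_release: FetchRelease,
    fetch_fulltext: FetchFulltext,
    index: str = "covid19_papers",  # hypothetical index name
) -> str:
    """For already-selected papers: fetch metadata and fulltext, build payload."""
    docs = [to_es_doc(fetch_release(i), fetch_fulltext(i)) for i in idents]
    return bulk_ndjson(docs, index)


def search_body(query: str, dois: Optional[List[str]] = None, size: int = 25) -> dict:
    """Query DSL for fulltext search, an optional DOI list, and facet counts."""
    must = [{"simple_query_string": {"query": query, "fields": ["title^2", "fulltext"]}}]
    # Assumes "doi" is mapped as a keyword field so exact `terms` matching works.
    filters = [{"terms": {"doi": dois}}] if dois else []
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        # Facet counts for the UI.
        "aggs": {
            "year": {"terms": {"field": "year"}},
            "is_published": {"terms": {"field": "is_published"}},
        },
    }


if __name__ == "__main__":
    # Illustrative fake data; the ident and DOI are made up.
    releases = {"abc123": {"ident": "abc123", "title": "Example", "release_year": 2020,
                           "release_stage": "submitted", "ext_ids": {"doi": "10.1234/x"}}}
    print(run_batch(["abc123"], releases.__getitem__, lambda ident: None), end="")
    print(json.dumps(search_body("coronavirus transmission", dois=["10.1234/x"])))
```

Keying documents on the fatcat ident keeps the periodic runs idempotent, and injecting the fetchers lets the pipeline logic be exercised without network access.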