README, about page, sources page

author: Bryan Newbold <bnewbold@archive.org> 2020-04-03 16:38:59 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2020-04-03 16:38:59 -0700
commit: cf2bfc9382fe1c934f2e11562c5c95b86fac5114 (patch)
tree: 50b1b3150696ed08a3af3b80ece9fbd81718e24e /fatcat_covid19
parent: fb767adb9472ff85b46b5a383f3986950b12dd27 (diff)
download: fatcat-covid19-cf2bfc9382fe1c934f2e11562c5c95b86fac5114.tar.gz
fatcat-covid19-cf2bfc9382fe1c934f2e11562c5c95b86fac5114.zip
2 files changed, 116 insertions, 3 deletions
diff --git a/fatcat_covid19/templates/about_en.html b/fatcat_covid19/templates/about_en.html
index 8db4a6f..95a272d 100644
--- a/fatcat_covid19/templates/about_en.html
+++ b/fatcat_covid19/templates/about_en.html
@@ -6,6 +6,63 @@
 
 <h1>About Fatcat COVID-19 Paper Search</h1>
 
-TODO
+<p>
+This is a prototype full text search index of papers, reports, datasets, and
+other research resources related to the COVID-19 crisis, including public
+health responses to influenza pandemics more generally. The curation of content
+to be included is based on efforts like the "CORD-19" dataset and efforts by
+authorities such as the WHO and NIH Pubmed. Metadata and content come from the
+existing open <a href="https://fatcat.wiki">fatcat</a> catalog of research
+outputs.
+See <a href="{{ url_for("search.page_sources") }}">"Sources"</a> for details.
+
+<p>
+It is hoped that with additional care and development this resource may be
+useful to anybody keeping up with research in this area, and particularly folks
+working on systematic reviews, bibliometrics, or metaresearch. However, at the time
+of writing, this is at best a technology demonstration, not a robust piece of
+knowledge infrastructure.
+
+<p>
+We encourage folks to consider the following more authoritative and
+well-supported tools for research discovery:
+
+<ul>
+  <li><a href="https://pubmed.gov">Pubmed</a> for biomedical research in
+  general, and the subject-specific <a href="https://www.ncbi.nlm.nih.gov/research/coronavirus/">LitCovid</a>
+  index for COVID-19.
+  <li><a href="https://www.semanticscholar.org/">Semantic Scholar</a>
+  <li><a href="https://scholar.google.com">Google Scholar</a>
+</ul>
+
+<p>
+Feedback and queries can be directed to <b><a href="mailto:webservices@archive.org">webservices@archive.org</a></b>.
+
+<h2>Service Disclaimers</h2>
+
+<p>
+This is not a production-supported service of the Internet Archive. The website
+and search API may become unavailable due to resource load, operator
+availability, etc. If you would like to depend on this service, please contact
+us.
+
+<p>
+Some content available in this index may not be "perpetually accessible" after
+the COVID-19 crisis ends, due to temporary content licenses. The service itself
+(covid19.fatcat.wiki) may also not be operated after the crisis, though all of
+the source code and upstream metadata should be "perpetually accessible".
+
+<h2>Additional Resources</h2>
+
+<p>
+Source code is available on GitHub, and bugs can be reported there as issues:
+<a href="https://github.com/bnewbold/covid19-fatcat-wiki">https://github.com/bnewbold/covid19-fatcat-wiki</a>
+
+<p>An Elasticsearch API is available; see the above repo README for details.
+
+<p>
+Bulk exports of metadata and derived content are available on the Internet
+Archive at:
+<a href="https://archive.org/details/fatcat_covid19">https://archive.org/details/fatcat_covid19</a>
 
 {% endblock %}
diff --git a/fatcat_covid19/templates/sources_en.html b/fatcat_covid19/templates/sources_en.html
index d46ac77..bca32a7 100644
--- a/fatcat_covid19/templates/sources_en.html
+++ b/fatcat_covid19/templates/sources_en.html
@@ -4,8 +4,64 @@
 
 {% block body %}
 
-<h1>{{ _("Sources of Content and Metadata") }}</h1>
+<h2>Curated COVID-19 Sources</h2>
 
-TODO
+Works are tagged with the source of their inclusion in this COVID-19 corpus:
+
+<ul>
+  <li><a href="https://pages.semanticscholar.org/coronavirus-research">Allen Institute for AI CORD-19 corpus</a>
+  <li><a href="https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov">WHO Database of publications on coronavirus disease (COVID-19)</a>
+  <li><a href="http://subject.med.wanfangdata.com.cn/Channel/7">Wanfang corpus of Chinese COVID-19 papers</a>
+  <li><a href="http://en.gzbd.cnki.net/GZBT/brief/Default.aspx">CNKI corpus of Chinese COVID-19 papers</a>
+  <li><a href="https://fatcat.wiki">Fatcat</a> (based on keyword queries against the full catalog)
+</ul>
+
+To clarify use of the CORD-19 corpus in particular, the corpus is used only to
+identify papers for inclusion in this index (eg, by DOI or PMCID).
+Bibliographic metadata and content are then fetched from the existing Fatcat
+catalog of open metadata, and full-text content is indexed from copies found on
+the public web, repositories, and publisher websites.
+
+<h2>Disclaimers</h2>
+
+<p>
+The fatcat catalog is intended to be a "universal" preservation and access
+archive, not a narrowly curated collection of only the highest quality research
+content. This means that not all content has undergone peer-review, and some
+may have been uploaded to services like academic social networks (eg,
+researchgate) or institutional repositories with absolutely no human editorial
+review or filtering.
+
+<p>
+The catalog intends to capture metadata such as publication stage (draft,
+published, retracted), venue, and medium (journal article, web post,
+encyclopedia entry, frontmatter) to help filter through this content. But in
+some cases this metadata is incomplete or may be inaccurate. For example,
+pre-print PDF files may be incorrectly associated with the final published
+version of a work, or vice versa.
+
+
+<h2>Sources of Metadata</h2>
+
+The source of all bibliographic information is recorded in edit history
+metadata, which allows the provenance of all records to be reconstructed. A few
+major sources are worth highlighting here:
+
+<ul>
+ <li>Release metadata from <b>Crossref</b>, via their public
+ <a href="https://github.com/CrossRef/rest-api-doc">REST API</a>
+ <li>Release metadata and linked full-text content from NIH <b>Pubmed</b> and <b><a href="https://arxiv.org">arXiv.org</a></b>
+ <li>Release metadata and linked public domain full-text content from the <b>JSTOR</b> Early Journal Content collection
+ <li>Creator names and de-duplication from <b>ORCID</b>, via their annual public data releases
+ <li>Journal title metadata from <b>DOAJ</b>, <b>ISSN ROAD</b>, and <b>SHERPA/RoMEO</b>
+ <li>Full-text URL lists from <b><a href="https://core.ac.uk">CORE</a></b>,
+ <b><a href="http://unpaywall.org">Unpaywall</a></b>,
+ <b><a href="https://www.semanticscholar.org">Semantic Scholar</a></b>,
+ <b><a href="https://citeseerx.ist.psu.edu">CiteseerX</a></b>,
+ and <b><a href="https://www.microsoft.com/en-us/research/project/academic">Microsoft Academic Graph</a></b>.
+ <li><a href="https://guide.fatcat.wiki/sources.html">The Fatcat Guide</a> lists more major sources
+</ul>
+
+Many thanks for the hard work of all these projects, institutions, and individuals!
 
 {% endblock %}