From f5a883642dd114ac2c29c72348bed05616189aa2 Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@archive.org>
Date: Mon, 11 May 2020 19:12:13 -0700
Subject: start sketching proposals

---
 proposals/overview.md | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 proposals/overview.md

diff --git a/proposals/overview.md b/proposals/overview.md
new file mode 100644
index 0000000..fa8148c
--- /dev/null
+++ b/proposals/overview.md
@@ -0,0 +1,38 @@
+
+
+There can be multiple releases for each work:
+
+- required: most canonical published version ("version of record", what would be cited)
+    => or, most updated?
+- optional: most openly accessible version
+- optional: updated version
+    => errata, corrected version, or retraction
+- optional: fulltext indexed version
+    => might not be the most updated, or not accessible
+
+
+## Initial Plan
+
+Index all fatcat works in catalog.
+
+Always link to a born-digital copy if one is accessible.
+
+Always link to a SIM microfilm copy if one is available.
+
+Use best available fulltext for search. If structured, like TEI-XML, index the
+body text separately from abstracts and references.
+
+
+## Other Ideas
+
+Do fulltext indexing at the granularity of pages, or some other segments of
+text within articles (paragraphs, chapters, sections).
+
+Fatcat already has all of Crossref, PubMed, arXiv, and several other
+authoritative metadata sources. But today we are missing a good chunk of
+content, particularly from institutional repositories and CS conferences (which
+don't use identifiers). We also don't have good affiliation or citation count
+coverage, and abstract coverage is mixed/poor.
+
+Could use the Microsoft Academic Graph (MAG) metadata corpus (or similar) to
+bootstrap with better metadata coverage.
-- 
cgit v1.2.3