From da8911b029f06023d5d8f8aad3cc845583e6d708 Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@robocracy.org>
Date: Thu, 20 Sep 2018 12:53:23 -0700
Subject: copy some notes to guide

---
 guide/src/overview.md | 101 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 100 insertions(+), 1 deletion(-)

(limited to 'guide/src/overview.md')

diff --git a/guide/src/overview.md b/guide/src/overview.md
index bc08ce1e..8e6279ed 100644
--- a/guide/src/overview.md
+++ b/guide/src/overview.md
@@ -1,3 +1,102 @@
 # Fatcat Overview
 
-For now, see the [RFC](https://fatcat.wiki).
+fatcat is an open bibliographic catalog of written works.  The scope of works
+is somewhat flexible, with a focus on published research outputs like journal
+articles, pre-prints, and conference proceedings. Records are collaboratively
+editable, versioned, available in bulk form, and include URL-agnostic
+file-level metadata.
+
+fatcat is currently used internally at the Internet Archive, but interested
+folks are welcome to contribute to design and development.
+
+## Goals and Ecosystem Niche
+
+At the Internet Archive, fatcat has two primary use cases:
+
+- Track the "completeness" of our holdings against all known published works.
+  In particular, allow us to monitor and prioritize further collection work.
+- Be a public-facing catalog and access mechanism for our open access holdings.
+
+In the larger ecosystem, fatcat could also provide:
+
+- A work-level (as opposed to title-level) archival dashboard: what fraction of
+  all published works are preserved in archives? KBART, CLOCKSS, Portico, and
+  other preservation efforts don't provide granular metadata.
+- A collaborative, independent, non-commercial, fully-open, field-agnostic,
+  "completeness"-oriented catalog of scholarly metadata
+- Unified (centralized) foundation for discovery and access across repositories
+  and archives: discovery projects can focus on user experience instead of
+  building their own catalog from scratch
+- Research corpus for meta-science, with an emphasis on availability and
+  reproducibility (metadata corpus itself is open access, and file-level hashes
+  control for content drift)
+- Foundational infrastructure for distributed digital preservation
+- On-ramp for non-traditional digital works ("grey literature") into the
+  scholarly web
+
+## Scope
+
+The goal is to capture the "scholarly web": the graph of written works that
+cite other works. Any work that is both cited more than once and cites more
+than one other work in the catalog is very likely to be in scope. "Leaf nodes"
+and small islands of intra-cited works may or may not be in scope.
+
+fatcat would not include any fulltext content itself, even for cleanly licensed
+(open access) works, but would have "strong" (verified) links to fulltext
+content, and would include file-level metadata (like hashes and fingerprints)
+to help discovery and identify content from any source. File-level URLs with
+context ("repository", "author-homepage", "web-archive") should make fatcat
+more useful for both humans and machines to quickly access fulltext content of
+a given mimetype than existing redirect or landing page systems. So another
+factor in deciding scope is whether a work has "digital fixity" and can be
+contained in a single immutable file.
+
+## References and Previous Work
+
+The closest overall analog of fatcat is [MusicBrainz][mb], a collaboratively
+edited music database. [Open Library][ol] is a very similar existing service,
+which exclusively contains book metadata.
+
+[Wikidata][wd] seems to be the most successful and actively edited/developed
+open bibliographic database at this time (early 2018), including the
+[WikiCite][wikicite] conference and related Wikimedia/Wikipedia projects.
+Wikidata is a general purpose semantic database of entities, facts, and
+relationships; bibliographic metadata has become a large fraction of all
+content in recent years. The focus there seems to be linking knowledge
+(statements) to specific sources unambiguously. The potential advantages of
+fatcat would be a focus on a specific scope (not a general-purpose database
+of entities) and a goal of completeness (capturing as many works and
+relationships as rapidly as possible). However, it might be better to just
+pitch in to the Wikidata efforts.
+
+The technical design of fatcat is loosely inspired by the git
+branch/tag/commit/tree architecture, and specifically inspired by Oliver
+Charles' "New Edit System" [blog posts][nes-blog] from 2012.
+
+There are many proprietary, for-profit bibliographic databases,
+including Web of Science, Google Scholar, Microsoft Academic Graph, aminer,
+Scopus, and Dimensions. There are excellent field-limited databases like dblp,
+MEDLINE, and Semantic Scholar. There are some large general-purpose databases
+that are not directly user-editable, including the OpenCitations Corpus, CORE,
+BASE, and CrossRef. I don't know of any large (more than 60 million works),
+open (bulk-downloadable with permissive or no license), field-agnostic,
+user-editable corpus of scholarly publication bibliographic metadata.
+
+[nes-blog]: https://ocharles.org.uk/blog/posts/2012-07-10-nes-does-it-better-1.html
+[mb]: https://musicbrainz.org
+[ol]: https://openlibrary.org
+[wd]: https://wikidata.org
+[wikicite]: https://meta.wikimedia.org/wiki/WikiCite_2017
+
+## Further Reading
+
+"From ISIS to CouchDB: Databases and Data Models for Bibliographic Records" by Luciano G. Ramalho. code4lib, 2013. <https://journal.code4lib.org/articles/4893>
+
+"Representing bibliographic data in JSON". GitHub README file, 2017. <https://github.com/rdmpage/bibliographic-metadata-json>
+
+"Citation Style Language", <https://citationstyles.org/>
+
+"Functional Requirements for Bibliographic Records", Wikipedia article, <https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records>
+
+OpenCitations and I4OC <http://opencitations.net/>, <https://i4oc.org/>
+
-- 
cgit v1.2.3