Diffstat (limited to 'guide/src/goals.md')
-rw-r--r-- guide/src/goals.md | 48
1 file changed, 24 insertions, 24 deletions
diff --git a/guide/src/goals.md b/guide/src/goals.md
index e7ef1512..9bb64b62 100644
--- a/guide/src/goals.md
+++ b/guide/src/goals.md
@@ -1,14 +1,14 @@
## Project Goals and Ecosystem Niche
-The Internet Archive has two primary use cases for fatcat:
+The Internet Archive has two primary use cases for Fatcat:
- Tracking the "completeness" of our holdings against all known published
works. In particular, allow us to monitor progress, identify gaps, and
prioritize further collection work.
- Be a public-facing catalog and access mechanism for our open access holdings.
-In the larger ecosystem, fatcat could also provide:
+In the larger ecosystem, Fatcat could also provide:
- A work-level (as opposed to title-level) archival dashboard: what fraction of
all published works are preserved in archives? [KBART](), [CLOCKSS](),
@@ -22,8 +22,8 @@ In the larger ecosystem, fatcat could also provide:
reproducibility (metadata corpus itself is open access, and file-level hashes
control for content drift)
- Foundational infrastructure for distributed digital preservation
-- On-ramp for non-traditional digital works ("grey literature") into the
- scholarly web
+- On-ramp for non-traditional digital works (web-native and "grey literature")
+ into the scholarly web
[KBART]: https://thekeepers.org/
[CLOCKSS]: https://clockss.org
@@ -35,22 +35,22 @@ What types of works should be included in the catalog?
The goal is to capture the "scholarly web": the graph of written works that
cite other works. Any work that is both cited more than once and cites more
-than one other work in the catalog is very likely to be in scope. "Leaf nodes"
-and small islands of intra-cited works may or may not be in scope.
-
-Fatcat does not include any fulltext content itself, even for cleanly licensed
-(open access) works, but does have "strong" (verified) links to fulltext
-content, and includes file-level metadata (like hashes and fingerprints)
-to help discovery and identify content from any source. File-level URLs with
-context ("repository", "author-homepage", "web-archive") should make fatcat
-more useful for both humans and machines to quickly access fulltext content of
-a given mimetype than existing redirect or landing page systems. So another
-factor in deciding scope is whether a work has "digital fixity" and can be
-contained in a single immutable file.
+than one other work in the catalog is likely to be in scope. "Leaf nodes" and
+small islands of intra-cited works may or may not be in scope.
+
+Fatcat does not include any fulltext content itself, even for clearly licensed
+open access works, but does have verified hyperlinks to fulltext content, and
+includes file-level metadata (hashes and fingerprints) to help identify content
+from any source. File-level URLs with context ("repository", "publisher",
+"webarchive") should make Fatcat more useful for both humans and machines to
+quickly access fulltext content of a given mimetype than existing redirect or
+landing page systems. So another factor in deciding scope is whether a work has
+"digital fixity" and can be contained in immutable files or can be captured by
+web archives.
## References and Previous Work
-The closest overall analog of fatcat is [MusicBrainz][mb], a collaboratively
+The closest overall analog of Fatcat is [MusicBrainz][mb], a collaboratively
edited music database. [Open Library][ol] is a very similar existing service,
which exclusively contains book metadata.
@@ -60,23 +60,23 @@ open bibliographic database at this time (early 2018), including the
Wikidata is a general purpose semantic database of entities, facts, and
relationships; bibliographic metadata has become a large fraction of all
content in recent years. The focus there seems to be linking knowledge
-(statements) to specific sources unambiguously. Potential advantages fatcat has
+(statements) to specific sources unambiguously. Potential advantages Fatcat has
are a focus on a specific scope (not a general-purpose database of entities)
and a goal of completeness (capturing as many works and relationships as
rapidly as possible). With so much overlap, the two efforts might merge in the
future.
-The technical design of fatcat is loosely inspired by the git
+The technical design of Fatcat is loosely inspired by the git
branch/tag/commit/tree architecture, and specifically inspired by Oliver
Charles' "New Edit System" [blog posts][nes-blog] from 2012.
-There are a whole bunch of proprietary, for-profit bibliographic databases,
+There are a number of proprietary, for-profit bibliographic databases,
including Web of Science, Google Scholar, Microsoft Academic Graph, aminer,
Scopus, and Dimensions. There are excellent field-limited databases like dblp,
-MEDLINE, and Semantic Scholar. There are some large general-purpose databases
-that are not directly user-editable, including the OpenCitation corpus, CORE,
-BASE, and CrossRef. We do not know of any large (more than 60 million works),
-open (bulk-downloadable with permissive or no license), field agnostic,
+MEDLINE, and Semantic Scholar. Large, general-purpose databases also exist that
+are not directly user-editable, including the OpenCitation corpus, CORE, BASE,
+and CrossRef. We do not know of any large (more than 60 million works), open
+(bulk-downloadable with permissive or no license), field agnostic,
user-editable corpus of scholarly publication bibliographic metadata.
[nes-blog]: https://ocharles.org.uk/blog/posts/2012-07-10-nes-does-it-better-1.html