path: root/guide/src/scope.md
author: Bryan Newbold <bnewbold@robocracy.org> 2018-09-20 12:53:23 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2018-09-20 12:53:23 -0700
commit: da8911b029f06023d5d8f8aad3cc845583e6d708 (patch)
tree: 62c6c92fb8e40a1708e156b83fe309edb392bee5 /guide/src/scope.md
parent: f10bcb49d17234dc52c8b67a7b7fd1796ab6f435 (diff)
copy some notes to guide
Diffstat (limited to 'guide/src/scope.md')
-rw-r--r-- guide/src/scope.md | 64
1 files changed, 64 insertions, 0 deletions
diff --git a/guide/src/scope.md b/guide/src/scope.md
new file mode 100644
index 00000000..d5e74156
--- /dev/null
+++ b/guide/src/scope.md
@@ -0,0 +1,64 @@
+# Scope
+
+The goal is to capture the "scholarly web": the graph of written works that
+cite other works. Any work that is both cited more than once and cites more
+than one other work in the catalog is very likely to be in scope. "Leaf nodes"
+and small islands of intra-cited works may or may not be in scope.
+
+Overall focus is on written works, with some exceptions. The expected core
+focus (for which we would pursue "completeness") is:
+
+ journal articles
+ academic books
+ conference proceedings
+ technical memos
+ dissertations
+ monographs
+ well-researched blog posts
+ web pages (that have citations)
+ "white papers"
+
+Possibly in scope:
+
+ reports
+ magazine articles
+ essays
+ notable mailing list postings
+ government documents
+ presentations (slides, video)
+ datasets
+ well-researched wiki pages
+ patents
+
+Probably not:
+
+ court cases and legal documents
+ newspaper articles
+ social media
+ manuals
+ datasheets
+ courses
+ published poetry
+
+Definitely not:
+
+ audio recordings
+ TV show episodes
+ musical scores
+ advertisements
+
+Author, citation, and work disambiguation would be core tasks. Linking
+pre-prints to final publication is in scope.
+
+I'm much less interested in altmetrics, funding, and grant relationships than
+most existing databases in this space.
+
+fatcat would not include any fulltext content itself, even for cleanly licensed
+(open access) works, but would have "strong" (verified) links to fulltext
+content, and would include file-level metadata (like hashes and fingerprints)
+to aid discovery and to identify content from any source. File-level URLs with
+context ("repository", "author-homepage", "web-archive") should let both humans
+and machines reach fulltext content of a given mimetype more quickly through
+fatcat than through existing redirect or landing page systems. Another factor
+in deciding scope is therefore whether a work has "digital fixity" and can be
+contained in a single immutable file.