author Bryan Newbold <bnewbold@robocracy.org> 2020-10-02 00:12:57 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2020-10-02 00:12:57 -0700
commit c5fdad74622350e7445a962f96975348a964018d
tree 596cffd9b2af9421e81da0d3348924188b76b5b2 /guide/src/contributing.md
parent 95bcef522ba3cdb32fc60078caec38855506c814
update 'contributing' page in guide
Diffstat (limited to 'guide/src/contributing.md')
1 files changed, 137 insertions, 0 deletions
diff --git a/guide/src/contributing.md b/guide/src/contributing.md
new file mode 100644
index 00000000..2b43f15a
--- /dev/null
+++ b/guide/src/contributing.md
@@ -0,0 +1,137 @@
+# Contributing
+Our aspiration is for this to be an open, collaborative project, with
+individuals and organizations of all sizes able to participate. There is not
+much structure or documentation on how volunteers can get started or be most
+helpful, but perhaps we can work together on that as well!
+The best place to organize and coordinate right now is the
+[gitter chatroom](https://gitter.im/internetarchive/fatcat). Gitter is
+described as "for developers", but we use it for everybody, and you don't need
+an invitation.
+Want to help out? Below are a few example roles you could play.
+#### Anybody: Find Bugs, Suggest Improvements
+The user sign-up and editing workflow on fatcat.wiki is currently pretty poor.
+How could this experience be improved and better documented? Specific ideas,
+suggestions and diagrams would be very helpful. You don't need to know how to
+program or about web technologies to contribute; hand drawings and example text
+can be sufficient.
+#### Community Organizer: Partner and Volunteer Organizing
+Are you passionate about Open Access and want to help build a community around
+preservation and universal access to knowledge? We could use help structuring
+an editing community, and communicating with partner projects like Wikidata to
+ensure we are not duplicating efforts.
+A good example of a project to organize would be improving journal-level
+metadata in Wikidata, including journal homepages, and linking to fatcat
+"container" entities.
+#### Research Librarian: Identify Missing Content
+If you have an interest in a specific scholarly field, you could give us
+feedback on how well fatcat is doing at preserving at-risk open access
+content. We know we have a lot of work to do, but both specific examples of
+missing publications and broader patterns and gaps are helpful
+to know about. We already know about some missing content, but there are surely
+entire categories of in-scope content that we do not even know are missing!
+#### Metadata Librarian: Schema Improvements
+Are you an experienced wrangler of BibFrame, MARC, BibTeX, RDF, OAI-PMH, and
+Citation Style Language? Our data model and entity schemas are bespoke (sorry!)
+and designed to evolve over time. There might be related efforts and new
+controlled vocabularies we could adopt or align with, or small changes to the
+schema might enable new use cases. It could be as simple as identifying and
+prioritizing new external identifiers (PIDs) to allow. Let us know what we got
+right and what needs improvement!
+#### Power Editor: Better Interfaces
+Are you super experienced with data entry, editing, and corrections? Do you
+have ideas on how our interface could be improved, or what kinds of new
+interfaces and tools could be built to support effective editing? Our open API
+allows third-party interfaces to make edits on individuals' behalf, meaning new
+tools can be built for specific patterns of editing or user contribution.
+#### Data Scientist: Wrangling and Visualization
+We have hundreds of gigabytes of metadata to transform and normalize before
+importing, and already have a rich open dataset with millions of linked
+entities. Our Elasticsearch analytics database has an open read-only endpoint
+(<https://search.fatcat.wiki>), which is used to power our [coverage
+interface](https://fatcat.wiki/coverage/search). What other interactive
+visualizations could be built? What tools should we be using to wrangle
+bibliographic metadata better and faster?
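As a starting point for exploring the read-only endpoint mentioned above, here is a minimal sketch using Elasticsearch's URI search (`_search` with the `q` and `size` parameters) and only the Python standard library. The index name `fatcat_release` is an assumption, since this guide does not name the available indices, and the helper names `build_search_url` and `fetch_hits` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

SEARCH_HOST = "https://search.fatcat.wiki"  # endpoint named in the guide text
INDEX = "fatcat_release"  # assumption: the guide does not name the indices


def build_search_url(query: str, size: int = 5) -> str:
    """Build an Elasticsearch URI-search URL from a query string and hit count."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    return f"{SEARCH_HOST}/{INDEX}/_search?{params}"


def fetch_hits(query: str, size: int = 5) -> list:
    """Run the search and return the `_source` of each hit (needs network access)."""
    with urllib.request.urlopen(build_search_url(query, size), timeout=30) as resp:
        body = json.load(resp)
    return [hit["_source"] for hit in body["hits"]["hits"]]


if __name__ == "__main__":
    # Only prints the URL; call fetch_hits() to actually query the endpoint.
    print(build_search_url("open access preservation"))
```

Building the URL separately from fetching makes it easy to inspect or paste the query into a browser before scripting anything larger.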
+#### Author: Verify Metadata
+Do you publish research documents, and want to ensure they are accessible to the
+broadest audience today and in the future? As with many academic search engines,
+you can add papers and link an author profile to specific publications. Unlike
+others, you can also ensure uploaded pre-prints and other open versions of your
+research are found and linked using the "save paper now" feature, and you can
+correct any errors made by publishers and bots.
+#### Translation and Accessibility Advocate
+Some of our web interfaces have existing internationalization infrastructure,
+and translations can be
+[contributed directly](https://hosted.weblate.org/projects/internetarchive/).
+Other projects need help getting translation infrastructure in place, and all
+of our projects could use review and recommendations for improvement by experts
+in web accessibility. For example, if you use a screen reader, feedback on
+which parts of our services are most difficult to use is very helpful.
+#### Software Developer: Bot Wrangling
+Fatcat is structured such that all changes to the catalog go through an open
+API. This includes human edits through the web interface, but the large
+majority of edits are made by bots. You could write a new bot to help...
+- review human edits (from the "reviewable" queue) to "lint" for typos, missing
+ fields, or other problems, and then leave an annotation
+- harvest, transform, and import metadata from additional subject- and
+ region-specific sources
+- find and clean up patterns of poor or incorrect metadata already in the
+ catalog
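To make the first bot idea above more concrete, the "lint" step could be sketched roughly as below. This is only an illustration: the field names in `REQUIRED_FIELDS` and the function `lint_release` are hypothetical, the record is a plain dictionary rather than a real API object, and fetching the "reviewable" queue or posting an annotation is left out because this guide does not describe those API calls.

```python
from typing import Dict, List

# Hypothetical field names for illustration; check the real entity schema.
REQUIRED_FIELDS = ("title", "release_type", "release_year")


def lint_release(release: Dict[str, object]) -> List[str]:
    """Return human-readable problems found in one release-like record."""
    problems = []
    # Flag fields that are absent or empty.
    for field in REQUIRED_FIELDS:
        if release.get(field) in (None, "", []):
            problems.append(f"missing field: {field}")
    # Flag simple whitespace slips in the title.
    title = release.get("title")
    if isinstance(title, str) and title:
        if title != title.strip():
            problems.append("title has leading or trailing whitespace")
        if "  " in title:
            problems.append("title contains doubled spaces")
    return problems
```

A bot built around a function like this would join the returned problems into the text of an annotation, and would leave no annotation when the list is empty.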
+#### SQL Expert: Database Scaling
+We have a large (500+ GByte) PostgreSQL database backing the catalog. This is
+working great so far, but we have concerns about how the catalog will scale
+further, especially if bots start making multiple updates per entity. You could
+review our SQL schema and recommend improvements, or give feedback and advice
+on how to switch to a distributed primary datastore.
+#### Financial Supporter
+Short on time? As a US 501(c)(3) non-profit, the Internet Archive always
+appreciates and makes good use of [donations](https://archive.org/donate/).
+## Software Contributions
+Bugs and patches can be filed on GitHub at: <https://github.com/internetarchive/fatcat>
+When considering making a non-trivial contribution, it can save review time and
+duplicated work to post an issue with your intentions and plan. New code and
+features must include unit tests before being merged, though we can help with
+writing them.