Want the "scholarly web": the graph of works that cite other works. Certainly
in scope: every work that is cited more than once, and every work that both
cites and is cited. "Leaf nodes" and small islands might not be in scope.
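
A rough sketch of the "certainly in scope" rule, assuming citations are
available as (citing, cited) pairs of work identifiers. The function name, the
identifiers, and the choice to ignore self-citations are all assumptions made
for illustration, not part of any existing tooling:

```python
from collections import defaultdict


def in_scope_works(citations):
    """Return the works that are "certainly" in scope.

    `citations` is an iterable of (citing_id, cited_id) pairs.
    A work qualifies if it is cited by more than one distinct work,
    or if it is cited at least once and also cites something itself.
    """
    cited_by = defaultdict(set)  # work -> distinct works that cite it
    cites = defaultdict(set)     # work -> distinct works it cites

    for citing, cited in citations:
        if citing == cited:
            continue  # assumption: self-citations do not count
        cited_by[cited].add(citing)
        cites[citing].add(cited)

    core = set()
    for work, citers in cited_by.items():
        # `work in cites` is true only if the work cites at least one other work
        if len(citers) > 1 or work in cites:
            core.add(work)
    return core


# Hypothetical example: B is cited twice; D is cited once and cites nothing;
# E -> F is a small island.
edges = [("A", "B"), ("C", "B"), ("B", "D"), ("E", "F")]
core = in_scope_works(edges)
```

Here `core` contains only `B`. `A`, `C`, and `E` only cite (leaf nodes), and
`D` and `F` are each cited once without citing anything, so whether they belong
is the open question noted above.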

Focusing on written works, with some exceptions. Expect core media (going for
completeness) to be:

    journal articles
    books
    proceedings
    technical memos
    reports
    dissertations

Probably in scope:

    magazine articles
    published poetry
    essays
    government documents
    conference presentations (slides, video)

Probably not:

    patents
    court cases and legal documents
    manuals
    datasheets
    courses

Definitely not:

    audio recordings
    TV show episodes
    musical scores
    advertisements

Potential add-on services:

    course syllabi