From 79db40766a9ef467f38a1ac7cc344a938f25a599 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 16 Nov 2022 20:26:52 -0800
Subject: guide: add 'for publishers' and 'for authors' sections

---
 guide/src/SUMMARY.md    |   4 ++
 guide/src/authors.md    |  86 +++++++++++++++++++++++++++++++++++++++++
 guide/src/publishers.md | 100 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 190 insertions(+)
 create mode 100644 guide/src/authors.md
 create mode 100644 guide/src/publishers.md

diff --git a/guide/src/SUMMARY.md b/guide/src/SUMMARY.md
index 7f41d25b..37c20b1f 100644
--- a/guide/src/SUMMARY.md
+++ b/guide/src/SUMMARY.md
@@ -29,6 +29,10 @@
 - [Code of Conduct](./code_of_conduct.md)
 - [Privacy](./privacy_policy.md)
 
+[For Publishers](./publishers.md)
+
+[For Authors](./authors.md)
+
 [Further Reading](./bibliography.md)
 
 [About This Guide](./guide.md)
diff --git a/guide/src/authors.md b/guide/src/authors.md
new file mode 100644
index 00000000..3072db9a
--- /dev/null
+++ b/guide/src/authors.md
@@ -0,0 +1,86 @@
+For Authors
+===============
+
+This page addresses common questions and concerns from individual authors of
+works indexed in Fatcat, as well as the Internet Archive Scholar service built
+on top of it.
+
+For help in exceptional cases, contact Internet Archive through our usual
+support channels.
+
+
+## Updating Works
+
+A frequent request from authors is to remove outdated versions of works.
+
+The philosophy of the catalog is to go beyond "the version of record" and
+instead collect "the record of versions". This means that drafts, manuscripts,
+working papers, and other alternative versions of works can be fully included
+and differentiated using metadata in the catalog. Even in the case of
+retractions, expressions of concern, or other serious issues with earlier
+versions, it is valuable to keep out-of-date versions in the catalog. Corrected
+or updated versions will generally be preferred and linked to publicly, for
+example on scholar.archive.org.
+Outright removing content reduces context and
+can result in additional confusion for readers and librarians.
+
+Because of this, it is strongly preferred to add new updated content instead of
+requesting the removal of old out-of-date content. Depending on the situation,
+this could involve creating a new post-publication `release` entity with the
+date of update, with status `updated` or `retracted`; or a new pre-publication
+`release`; or crawling an updated PDF and adding it to an existing `release`
+entity.
+
+
+## Correcting Metadata
+
+Sometimes the bibliographic metadata in fatcat is incorrect, incomplete, or out
+of date. This is a particularly sensitive subject when it comes to representing
+information about individuals. While we aspire to automate metadata updates
+and improvements as much as possible, often a human touch is best.
+
+Any person can contribute to the catalog directly by creating an account and
+submitting changes for review. This includes, but is not limited to, authors or
+a person acting on their behalf submitting corrections. The [editing
+quickstart](./quickstart.md) is a good place to start. Please remember that
+corrections are considered part of the public record of the catalog and will be
+preserved even if a contributor later deletes their account. Editor *usernames*
+can be changed at any time.
+
+Fatcat is in some sense a non-authoritative catalog, which means that it is
+usually best if corrections are made in "upstream" sources before (or at the
+same time as) they are corrected in fatcat. For example, update metadata in
+publisher databases, repositories, or ORCiD in addition to fatcat.
+
+
+### Name Changes
+
+The preferred workflow for author name changes depends on the author's
+sensitivity to having prior names accessible and searchable.
+
+If "also known as" behavior is desirable, contributor names on the release
+record should remain unchanged (matching what the publication at the time
+indicated), and a linked `creator` entity should include the
+currently-preferred name for display.
+
+If "also known as" is not acceptable, and the work has already been updated in
+authoritative publication catalogs, then the contributor name can be updated on
+`release` records as well.
+
+See also the [`creator` style guide](./entity_creator.md).
+
+
+### Author Relation Completeness
+
+`creator` records are not always generated when importing `release` records;
+the current practice is to create and/or link them if there is ORCiD metadata
+linking specific authors to a published work.
+
+This means that author/work metadata is often very incomplete or non-existent.
+At this time we recommend using other services like dblp.org or openalex.org
+for more complete (but possibly less accurate) author/work metadata.
+
+
+## Resolving Publication Disputes
+
+Authorship and publication ethics disputes should generally be resolved with
+the original publisher first, then updated in fatcat.
diff --git a/guide/src/publishers.md b/guide/src/publishers.md
new file mode 100644
index 00000000..1d567afa
--- /dev/null
+++ b/guide/src/publishers.md
@@ -0,0 +1,100 @@
+For Publishers
+===================
+
+This page addresses common questions and concerns from publishers of research
+works indexed in Fatcat, as well as the Internet Archive Scholar service built
+on top of it. The [for authors](./authors.md) page has some information on
+updates and metadata corrections that are also relevant to publishers.
+
+For help in exceptional cases, contact Internet Archive through our usual
+support channels.
+
+
+## Metadata Indexing
+
+Many publishers will find that metadata records are already included in fatcat
+if they register persistent identifiers for their research works.
+This pipeline
+is based on our automated harvesting of DOI, PubMed, dblp, DOAJ, and other
+metadata catalogs. This process can take some time (e.g., days from
+registration), does not (yet) cover all persistent identifiers, and will only
+cover those works which get identifiers.
+
+For publishers who find that they are not getting indexed in fatcat, our
+primary advice is to register ISSNs for venues (journals, repositories,
+conferences, etc), and to register DOIs for all current and back-catalog works.
+DOIs are the most common and integrated identifier in the scholarly ecosystem,
+and will result in automatic indexing in many other aggregators in addition to
+fatcat/scholar. There may be funding or resources available for smaller
+publishers to cover the cost of DOI registration, and ISSN registration is
+usually no-cost or affordable through national institutions.
+
+We *do not* recommend that journal or conference publishers use general-purpose
+repositories like Zenodo to obtain no-cost DOIs for journal articles. These
+platforms are a great place for pre-publication versions, datasets, software,
+and other artifacts, but not for primary publication-version works (in our
+opinion).
+
+If DOI registration is not possible, one good alternative is to get included in
+the Directory of Open Access Journals and deposit article metadata there. This
+process may take some time, but is a good basic indicator of publication
+quality. DOAJ article metadata is periodically harvested and indexed in fatcat,
+after a de-duplication process.
+
+Fatcat does not yet support OAI-PMH as an identifier and mechanism for
+automated journal ingest, but we likely will in the future. This would
+particularly help publishers using Open Journal Systems (OJS). Fatcat also
+does not yet support crawling journal sites and extracting bibliographic
+metadata from HTML tags.
+
+Lastly, publishers could use the fatcat catalog web interface or API to push
+metadata records about their works programmatically. We don't know of any
+publishers actually doing this today.
+
+
+## Improving Automatic Preservation
+
+In alignment with its mission, Internet Archive makes basic automated attempts
+to capture and preserve all open access research publications on the public
+web, at no cost. This effort comes with no guarantees around completeness,
+timeliness, or support communications.
+
+Preservation coverage can be monitored through the journal-specific dashboards
+or via the coverage search interface.
+
+There are a few technical things publishers can do to increase their
+preservation coverage, in addition to the metadata indexing tips above:
+
+- use the `citation_pdf_url` HTML meta tag, when appropriate, to link directly
+  from article landing pages to PDF URLs
+- use simple HTML to represent landing pages and article content, and do not
+  require JavaScript to render page content or links
+- ensure that hosting server `robots.txt` rules are not preventing or overly
+  restricting automated crawling
+- use simple, accessible PDF access links. Do not use time-limited or
+  IP-limited URLs, require specific referrer headers, or use cookies to
+  authenticate access to OA PDFs
+- minimize the number of HTTP redirects and HTML hops between DOI and fulltext
+  content
+- avoid paywalls, loginwalls, geofencing, and anti-bot measures, all of which
+  are antithetical to open crawling and indexing
+
+Publishers are also free to submit "Save Paper Now" requests, or edit the
+catalog itself either manually or in bulk through the API. If an individual
+work persistently fails to ingest, try running a "Save Page Now" request first
+from web.archive.org and verify that the content is available through Wayback
+replay, then submit the "Save Paper Now" request again.
+
+
+## Official Preservation
+
+Internet Archive is developing preservation services for scholarly content on
+the web. Contact us at webservices@archive.org for details.
+
+Existing web archiving services offered to universities, national libraries,
+and other institutions may already be appropriate for some publications. Check
+if your affiliated institutions already have an
+[Archive-It](https://archive-it.org) account or other existing relationship
+with Internet Archive.
+
+Small publishers using Open Journal Systems (OJS) should be aware of the PKP
+preservation project.
-- 
cgit v1.2.3
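
Two of the crawler-friendliness tips in the `publishers.md` part of this patch (the `citation_pdf_url` meta tag and the `robots.txt` rules) lend themselves to a quick local check. The following is a minimal sketch using only the Python standard library. It is not part of the commit and does not describe how Internet Archive's crawlers actually work; the function names, the `examplebot` user agent, and the `journal.example.org` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _CitationPdfCollector(HTMLParser):
    """Collects the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        content = attr_map.get("content")
        if name == "citation_pdf_url" and content:
            self.pdf_urls.append(content)


def find_citation_pdf_urls(landing_page_html):
    """Return all PDF URLs advertised by a landing page's meta tags."""
    collector = _CitationPdfCollector()
    collector.feed(landing_page_html)
    collector.close()
    return collector.pdf_urls


def robots_allows(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical landing page and robots.txt, for illustration only.
    sample_html = (
        '<html><head>'
        '<meta name="citation_title" content="An Example Article">'
        '<meta name="citation_pdf_url" '
        'content="https://journal.example.org/article/1.pdf">'
        '</head><body></body></html>'
    )
    sample_robots = "User-agent: *\nDisallow: /private/\n"

    for pdf_url in find_citation_pdf_urls(sample_html):
        allowed = robots_allows(sample_robots, "examplebot", pdf_url)
        print(pdf_url, "crawlable" if allowed else "blocked by robots.txt")
```

The sketch works on strings rather than fetching anything, so it can be run against saved copies of a landing page and `robots.txt`. It does not cover the other tips in the list, such as redirects, cookies, or JavaScript-rendered links.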