From b6d228b7171252c8f9f70194c09aba0ed0c55567 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 10 Nov 2021 12:33:36 -0800
Subject: update crawlability docs

---
 proposals/2021-04-02_crawlability.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/proposals/2021-04-02_crawlability.md b/proposals/2021-04-02_crawlability.md
index 6b9ef66c..ee9f3c5b 100644
--- a/proposals/2021-04-02_crawlability.md
+++ b/proposals/2021-04-02_crawlability.md
@@ -1,9 +1,16 @@
 
-status: wip
+status: not-implemented
 
 Crawlability Improvements
 --------------------------
 
+NOTE: After some back and forth on this topic, we have decided for now to focus
+on having scholar.archive.org indexed, not fatcat.wiki. This proposal document
+document is being kept as documentation of that decision.
+
+
+## Original Intro
+
 We are interested in making the fatcat corpus more crawlable/indexable by
 aggregators and academic search enginges. For example, CiteseerX, Google
 Scholar, or Microsoft Academic (when themselves get used by other projects).
@@ -13,6 +20,7 @@ Some open questions:
 - is the web.archive.org iframe for PDFs ok, or should we redirect to PDFs with
   `id_` in the datetime?
 
+
 ## Redirect URLs and `citation_pdf_url`
 
 We suspect that some crawlers do not like that fatcat.wiki landing pages have
-- 
cgit v1.2.3