author | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-10 12:33:36 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2021-11-10 17:08:07 -0800 |
commit | b6d228b7171252c8f9f70194c09aba0ed0c55567 | |
tree | 2e3e73b531b29858556ed3b51d3034d99288c212 /proposals | |
parent | 0a36276cc201ca7d4b3d2f491648c71255de21e3 | |
update crawlability docs
Diffstat (limited to 'proposals')
-rw-r--r-- | proposals/2021-04-02_crawlability.md | 10 |
1 files changed, 9 insertions, 1 deletions
```diff
diff --git a/proposals/2021-04-02_crawlability.md b/proposals/2021-04-02_crawlability.md
index 6b9ef66c..ee9f3c5b 100644
--- a/proposals/2021-04-02_crawlability.md
+++ b/proposals/2021-04-02_crawlability.md
@@ -1,9 +1,16 @@
 
-status: wip
+status: not-implemented
 
 Crawlability Improvements
 --------------------------
 
+NOTE: After some back and forth on this topic, we have decided for now to focus
+on having scholar.archive.org indexed, not fatcat.wiki. This proposal document
+document is being kept as documentation of that decision.
+
+
+## Original Intro
+
 We are interested in making the fatcat corpus more crawlable/indexable by
 aggregators and academic search enginges. For example, CiteseerX, Google
 Scholar, or Microsoft Academic (when themselves get used by other projects).
@@ -13,6 +20,7 @@ Some open questions:
 - is the web.archive.org iframe for PDFs ok, or should we redirect to PDFs with
   `id_` in the datetime?
 
+
 ## Redirect URLs and `citation_pdf_url`
 
 We suspect that some crawlers do not like that fatcat.wiki landing pages have
```
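The open question in the proposal (keep the web.archive.org replay page for PDFs, or point at a URL with `id_` in the datetime) can be made concrete with a small sketch. It assumes the usual Wayback Machine URL layout, `https://web.archive.org/web/<timestamp>/<url>`, where appending `id_` to the timestamp asks for the archived bytes as captured rather than the replay page with the archive's toolbar wrapper. The helpers `wayback_url` and `citation_pdf_meta` are hypothetical illustrations, not part of fatcat.

```python
import html
import re


def wayback_url(timestamp: str, original_url: str, raw: bool = False) -> str:
    """Build a web.archive.org URL for a capture (hypothetical helper).

    Assumes the Wayback convention of a 1-14 digit timestamp (YYYYMMDDhhmmss,
    possibly truncated). With raw=True the `id_` modifier is appended to the
    timestamp, requesting the original capture without the replay wrapper.
    """
    if not re.fullmatch(r"\d{1,14}", timestamp):
        raise ValueError(f"bad wayback timestamp: {timestamp!r}")
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original_url}"


def citation_pdf_meta(pdf_url: str) -> str:
    """Render a `citation_pdf_url` meta tag (hypothetical helper).

    The URL is HTML-escaped so characters like '&' are safe inside the
    attribute value.
    """
    return f'<meta name="citation_pdf_url" content="{html.escape(pdf_url, quote=True)}">'


if __name__ == "__main__":
    pdf = "https://example.org/paper.pdf"  # placeholder URL, not real data
    print(wayback_url("20210402000000", pdf))            # replay page
    print(wayback_url("20210402000000", pdf, raw=True))  # raw capture
    print(citation_pdf_meta(wayback_url("20210402000000", pdf, raw=True)))
```

Pointing `citation_pdf_url` at the `id_` form is one way to hand crawlers a direct PDF response instead of an HTML wrapper page; whether that is preferable is exactly what the proposal leaves open.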