# Hello friends! # If you are considering large or automated crawling, you may want to look at # our catalog API (https://api.fatcat.wiki) or bulk database snapshots instead. # by default, can crawl anything on this domain. HTTP 429 ("backoff") status # codes are used for rate-limiting instead of any crawl delay specified here. # Up to a handful concurrent requests should be fine. User-Agent: * Allow: / # crawling search result pages is expensive, so we do specify a long crawl delay for those User-agent: * Allow: /search Crawl-delay: 5 Sitemap: https://scholar.archive.org/sitemap.xml Sitemap: https://scholar.archive.org/sitemap-index-works.xml # same info as sitemap-index-works.xml plus following citation_pdf_url #Sitemap: https://scholar.archive.org/sitemap-index-access.xml