From 6b1f87c12f7d40a3016910b214579a368c747df4 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 29 Apr 2021 10:03:47 -0700
Subject: sitemap generation

---
 extra/sitemap/README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 extra/sitemap/README.md

diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 0000000..242378a
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,21 @@
+
+## HOWTO: Update
+
+Requires [fatcat-cli](https://gitlab.com/bnewbold/fatcat-cli) and `jq`
+installed. Run these commands on a production machine.
+
+    cd /srv/fatcat_scholar/sitemap
+    export DATE=`date --iso-8601`
+    /srv/fatcat_scholar/src/extra/sitemap/work_urls_query.sh $DATE
+    rm *.txt.gz
+    /srv/fatcat/src/extra/sitemap/generate_sitemap_indices.py
+
+## Background
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50k
+lines / 50 MByte for XML sitemap files. Google Scholar has indicated a smaller
+20k URL / 5 MB limit.
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
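
Note on the limits in the Background section: the sketch below shows one way a list of URLs could be split so that each text sitemap file stays under the 20k URL / 5 MB figure quoted for Google Scholar. It is only an illustration under stated assumptions. The function name `chunk_urls` is hypothetical, "5 MB" is read as 5,000,000 bytes, and nothing here is taken from `work_urls_query.sh` or `generate_sitemap_indices.py`, which may work differently.

```python
from typing import Iterable, Iterator, List

# Limits quoted in the Background section for Google Scholar.
# Assumption: "5 MB" means 5 * 1000 * 1000 bytes of uncompressed text.
MAX_URLS = 20_000
MAX_BYTES = 5 * 1000 * 1000


def chunk_urls(
    urls: Iterable[str],
    max_urls: int = MAX_URLS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[List[str]]:
    """Yield lists of URLs, each sized to fit in one text sitemap file."""
    chunk: List[str] = []
    size = 0
    for url in urls:
        # Each URL occupies its own line, so count one byte for the newline.
        line_bytes = len(url.encode("utf-8")) + 1
        # Start a new chunk if adding this URL would exceed either limit.
        if chunk and (len(chunk) >= max_urls or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(url)
        size += line_bytes
    # Emit whatever is left over after the last full chunk.
    if chunk:
        yield chunk
```

For example, passing 45,000 short URLs yields three chunks of 20,000, 20,000, and 5,000 URLs, since the URL count is reached long before the byte limit. A single URL longer than `max_bytes` would still be emitted in a chunk of its own; this sketch does not try to reject it.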