Diffstat (limited to 'extra/sitemap/README.md')
-rw-r--r-- | extra/sitemap/README.md | 21 |
1 files changed, 21 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 0000000..242378a
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,21 @@
+
+## HOWTO: Update
+
+Requires [fatcat-cli](https://gitlab.com/bnewbold/fatcat-cli) and `jq`
+installed. Run these commands on a production machine.
+
+    cd /srv/fatcat_scholar/sitemap
+    export DATE=`date --iso-8601`
+    /srv/fatcat_scholar/src/extra/sitemap/work_urls_query.sh $DATE
+    rm *.txt.gz
+    /srv/fatcat/src/extra/sitemap/generate_sitemap_indices.py
+
+## Background
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50K
+lines / 50 MByte for XML site map files. Google Scholar has indicated a smaller
+20k URL / 5 MB limit.
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
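
Note on the Background section of the README: the 20k URL / 5 MB limit implies that a long URL list has to be split into several sitemap files. The sketch below shows one way that split could work. It is not taken from `work_urls_query.sh` or `generate_sitemap_indices.py`, whose contents are not shown in this diff. The function name `chunk_urls`, the constants, and the `example.org` URLs are hypothetical, and "5 MB" is assumed to mean 5,000,000 bytes.

```python
from typing import Iterable, Iterator, List

# Limits quoted in the README's Background section (Google Scholar figures).
MAX_URLS = 20_000
MAX_BYTES = 5 * 1000 * 1000  # assumption: "5 MB" read as decimal megabytes


def chunk_urls(
    urls: Iterable[str],
    max_urls: int = MAX_URLS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[List[str]]:
    """Yield lists of URLs, each small enough for one text sitemap file."""
    chunk: List[str] = []
    size = 0
    for url in urls:
        # One URL per line in a text sitemap, so count the trailing newline.
        line_bytes = len(url.encode("utf-8")) + 1
        # Start a new chunk if adding this URL would break either limit.
        if chunk and (len(chunk) >= max_urls or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(url)
        size += line_bytes
    if chunk:
        yield chunk


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    demo = [f"https://example.org/work/{i}" for i in range(45_000)]
    print([len(c) for c in chunk_urls(demo)])  # prints [20000, 20000, 5000]
```

Each yielded list could then be written to its own gzip-compressed text file. Checking both limits in the same loop means a list of unusually long URLs is cut on size before it reaches the 20k count.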