author | Bryan Newbold <bnewbold@archive.org> | 2021-04-29 10:03:47 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-04-29 10:03:47 -0700 |
commit | 6b1f87c12f7d40a3016910b214579a368c747df4 (patch) | |
tree | d52e555663a3fe395fe0024098735adaf8e10494 /extra/sitemap/README.md | |
parent | 4b152e02d1a0d0d7a9a391ed211ecd6f304d6962 (diff) | |
sitemap generation
Diffstat (limited to 'extra/sitemap/README.md')
-rw-r--r-- | extra/sitemap/README.md | 21 |
1 files changed, 21 insertions, 0 deletions
diff --git a/extra/sitemap/README.md b/extra/sitemap/README.md
new file mode 100644
index 0000000..242378a
--- /dev/null
+++ b/extra/sitemap/README.md
@@ -0,0 +1,21 @@
+
+## HOWTO: Update
+
+Requires [fatcat-cli](https://gitlab.com/bnewbold/fatcat-cli) and `jq`
+installed. Run these commands on a production machine.
+
+    cd /srv/fatcat_scholar/sitemap
+    export DATE=`date --iso-8601`
+    /srv/fatcat_scholar/src/extra/sitemap/work_urls_query.sh $DATE
+    rm *.txt.gz
+    /srv/fatcat/src/extra/sitemap/generate_sitemap_indices.py
+
+## Background
+
+Google has a limit of 50k lines / 10 MByte for text sitemap files, and 50K
+lines / 50 MByte for XML site map files. Google Scholar has indicated a smaller
+20k URL / 5 MB limit.
+
+## Resources
+
+Google sitemap verifier: https://support.google.com/webmasters/answer/7451001
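
The Google Scholar limits quoted in the README's Background section (20k URLs / 5 MB) could be checked on a generated file before it is published. The following is a minimal sketch, not code from the fatcat-scholar repository: it assumes text sitemaps with one URL per line, optionally gzip-compressed, and the names `sitemap_stats` and `within_limits` are hypothetical. Reading "5 MB" as 5 * 1024 * 1024 uncompressed bytes is also an assumption.

```python
import gzip
from pathlib import Path

# Limits quoted in the README's Background section for Google Scholar.
# ASSUMPTION: "5 MB" is taken to mean 5 * 1024 * 1024 uncompressed bytes.
MAX_URLS = 20_000
MAX_BYTES = 5 * 1024 * 1024


def sitemap_stats(path):
    """Return (url_count, uncompressed_bytes) for a text sitemap file.

    Hypothetical helper: files ending in .gz are read through gzip,
    anything else is read as plain bytes.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as f:
        data = f.read()
    # Count non-empty lines; each is assumed to hold exactly one URL.
    url_count = sum(1 for line in data.splitlines() if line.strip())
    return url_count, len(data)


def within_limits(path, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """True if the file stays within both the URL-count and size limits."""
    url_count, size = sitemap_stats(path)
    return url_count <= max_urls and size <= max_bytes
```

Such a check would most naturally run after the URL lists are generated and before the index files are built, for example by calling `within_limits` on each `*.txt.gz` file in the sitemap directory.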